Brad H Story
- Professor, Speech/Language and Hearing
- Associate Dean, Faculty Affairs
- Professor, Cognitive Science - GIDP
- Professor, Second Language Acquisition / Teaching - GIDP
- Professor, Biomedical Engineering
- Member of the Graduate Faculty
- Professor, Applied Intercultural Arts Research - GIDP
- (520) 626-9528
- Speech And Hearing Sciences, Rm. 513
- Tucson, AZ 85721
- bstory@arizona.edu
Biography
Brad Story, PhD, is Professor in the Department of Speech, Language, and Hearing Sciences and Associate Dean of Faculty Affairs in the College of Science at the University of Arizona. Dr. Story received his BS in Applied Physics from the University of Northern Iowa in 1987 and his PhD in Speech and Hearing Sciences from the University of Iowa in 1995. From 1987 to 1991, he worked in industry as an engineer, developing computational models and instrumentation systems for designing and measuring the performance of mufflers. Dr. Story’s research publications concern the mechanics, aerodynamics, and acoustics of speech production, as well as the perception of speech sounds. Dr. Story has served multiple terms as Associate Editor for the Journal of the Acoustical Society of America, is a fellow of the Acoustical Society of America (ASA), and received the ASA’s Rossing Prize in Acoustics Education in 2016. He was recognized by the American Speech-Language-Hearing Association in 2013 with the Willard R. Zemlin Lecture Award, and by the University of Iowa in 2018 with a Distinguished Alum Award. His research has been supported by the National Institutes of Health and the National Science Foundation.
Degrees
- Ph.D. Speech Science
- University of Iowa, Iowa City, Iowa, USA
- B.S. Applied Physics
- University of Northern Iowa, Cedar Falls, Iowa, United States
Work Experience
- University of Arizona, Tucson, Arizona (2020 - Ongoing)
- University of Arizona, Tucson, Arizona (2013 - Ongoing)
- University of Arizona, Tucson, Arizona (2006 - 2013)
- University of Arizona, Tucson, Arizona (2000 - 2006)
- WJ Gould Voice Research Center, Denver Center for the Performing Arts (1996 - 2000)
- University of Iowa, Iowa City, Iowa (1994 - 1996)
- Donaldson Co., Inc. (1987 - 1991)
Awards
- Galileo Circle Fellows Grant
- College of Science Galileo Circle, Fall 2019
- Distinguished Alum Award
- The Department of Communication Sciences and Disorders, University of Iowa, Fall 2018
- Rossing Prize in Acoustics Education
- Acoustical Society of America, Fall 2016
Interests
Research
Speech acoustics, computational modeling of speech production, speech signal processing, development of speech synthesis systems
Courses
2024-25 Courses
- Research, SLHS 900 (Spring 2025)
- Research, SLHS 900 (Fall 2024)
2023-24 Courses
- Acoustics/Perception of Speech, SLHS 565 (Fall 2023)
- Preclinical Speech Science, SLHS 566 (Fall 2023)
2022-23 Courses
- Acoustics/Perception of Speech, SLHS 565 (Fall 2022)
- Preclinical Speech Science, SLHS 566 (Fall 2022)
2021-22 Courses
- Independent Study, SLHS 699 (Spring 2022)
- Acoustics/Perception of Speech, SLHS 565 (Fall 2021)
- Preclinical Speech Science, SLHS 566 (Fall 2021)
2020-21 Courses
- Directed Research, SLHS 692 (Spring 2021)
- Acoustics/Perception of Speech, SLHS 565 (Fall 2020)
- Honors Independent Study, SLHS 399H (Fall 2020)
- Preclinical Speech Science, SLHS 566 (Fall 2020)
2019-20 Courses
- Honors Independent Study, SLHS 399H (Spring 2020)
- Acoustics/Perception of Speech, SLHS 565 (Fall 2019)
- Preclinical Speech Science, SLHS 566 (Fall 2019)
2018-19 Courses
- Hearing Science, SLHS 380 (Summer I 2019)
- Acoustics/Spch+Hear Sci, SLHS 267 (Fall 2018)
- Independent Study, SLHS 799 (Fall 2018)
- Preclinical Speech Science, SLHS 566 (Fall 2018)
2017-18 Courses
- Hearing Science, SLHS 380 (Summer I 2018)
- Independent Study, SLHS 399 (Fall 2017)
- Preclinical Speech Science, SLHS 566 (Fall 2017)
2016-17 Courses
- Hearing Science, SLHS 380 (Summer I 2017)
- Dissertation, SLHS 920 (Spring 2017)
- Preceptorship, SLHS 491 (Spring 2017)
- Speech Science, SLHS 367 (Spring 2017)
- Acoustics/Spch+Hear Sci, SLHS 565 (Fall 2016)
- Dissertation, SLHS 920 (Fall 2016)
- Independent Study, SLHS 599 (Fall 2016)
- Preclinical Speech Science, SLHS 566 (Fall 2016)
2015-16 Courses
- Dissertation, SLHS 920 (Spring 2016)
Scholarly Contributions
Chapters
- Story, B. H. (2019). History of speech synthesis. In The Routledge Handbook of Phonetics (pp. 9-32). Routledge. doi:10.4324/9780429056253-2
- Story, B. H. (2016). The vocal tract in singing. In The Handbook of Singing. Oxford University Press. doi:10.1093/oxfordhb/9780199660773.013.012
- Story, B. H. (2015). Mechanisms of Voice Production. In The Handbook of Speech Production (pp. 34-58). West Sussex, UK: John Wiley and Sons.
- Story, B. H. (2014). The Vocal Tract in Singing. In The Oxford Handbook of Singing. doi:10.1093/OXFORDHB/9780199660773.013.012
- Story, B. H., & Bunton, K. (2013). Simulation and identification of vowels based on a time-varying model of the vocal tract area function. In Vowel Inherent Spectral Change (pp. 155-174). Springer Berlin Heidelberg.
Journals/Publications
- Lester-Smith, R. A., Jebaily, C. G., & Story, B. H. (2024). The Effects of Remote Signal Transmission and Recording on Acoustical Measures of Simulated Essential Vocal Tremor: Considerations for Remote Treatment Research and Telepractice. Journal of voice : official journal of the Voice Foundation, 38(2), 325-336. Studies on medical and behavioral interventions for essential vocal tremor (EVT) have shown inconsistent effects on acoustical and perceptual outcome measures across studies and across participants. Remote acoustical and perceptual assessments might facilitate studies with larger samples of participants and repeated measures that could clarify treatment effects and identify optimal treatment candidates. Furthermore, remote acoustical and perceptual assessment might allow clinicians to monitor clients' treatment responses and optimize treatment approaches during telepractice. Thus, the purpose of this study was to evaluate the accuracy of remote signal transmission and recording for acoustical and perceptual assessment of EVT.
- Rusho, R. Z., Ahmed, A. H., Kruger, S., Alam, W., Meyer, D., Howard, D., Story, B., Jacob, M., & Lingala, S. G. (2024). Prospectively accelerated dynamic speech magnetic resonance imaging at 3 T using a self-navigated spiral-based manifold regularized scheme. NMR in biomedicine, 37(8), e5135. This work develops and evaluates a self-navigated variable density spiral (VDS)-based manifold regularization scheme to prospectively improve dynamic speech magnetic resonance imaging (MRI) at 3 T. Short readout duration spirals (1.3-ms long) were used to minimize sensitivity to off-resonance. A custom 16-channel speech coil was used for improved parallel imaging of vocal tract structures. The manifold model leveraged similarities between frames sharing similar vocal tract postures without explicit motion binning. The self-navigating capability of VDS was leveraged to learn the Laplacian structure of the manifold. Reconstruction was posed as a sensitivity-encoding-based nonlocal soft-weighted temporal regularization scheme. Our approach was compared with view-sharing, low-rank, temporal finite difference, extra dimension-based sparsity reconstruction constraints. Undersampling experiments were conducted on five volunteers performing repetitive and arbitrary speaking tasks at different speaking rates. Quantitative evaluation in terms of mean square error over moving edges was performed in a retrospective undersampling experiment on one volunteer. For prospective undersampling, blinded image quality evaluation in the categories of alias artifacts, spatial blurring, and temporal blurring was performed by three experts in voice research. Region of interest analysis at articulator boundaries was performed in both experiments to assess articulatory motion. Improved performance with manifold reconstruction constraints was observed over existing constraints.
With prospective undersampling, a spatial resolution of 2.4 × 2.4 mm/pixel and a temporal resolution of 17.4 ms/frame for single-slice imaging, and 52.2 ms/frame for concurrent three-slice imaging, were achieved. We demonstrated implicit motion binning by analyzing the mechanics of the Laplacian matrix. Manifold regularization demonstrated superior image quality scores in reducing spatial and temporal blurring compared with all other reconstruction constraints. While it exhibited faint (nonsignificant) alias artifacts that were similar to temporal finite difference, it provided statistically significant improvements compared with the other constraints. In conclusion, the self-navigated manifold regularized scheme enabled robust high spatiotemporal resolution dynamic speech MRI at 3 T.
- Story, B. H. (2024). An Approach to Explaining Formants. Perspectives of the ASHA Special Interest Groups, 9(2), 461-471. doi:10.1044/2023_persp-23-00200
- Herbst, C. T., Story, B. H., & Meyer, D. (2023). Acoustical Theory of Vowel Modification Strategies in Belting. Journal of voice : official journal of the Voice Foundation. Various authors have argued that belting is to be produced by "speech-like" sounds, with the first and second supraglottic vocal tract resonances (fR1 and fR2) at frequencies of the vowels determined by the lyrics to be sung. Acoustically, the hallmark of belting has been identified as a dominant second harmonic, possibly enhanced by first resonance tuning (fR1≈2fo). It is not clear how both these concepts - (a) phonating with "speech-like," unmodified vowels; and (b) producing a belting sound with a dominant second harmonic, typically enhanced by fR1 - can be upheld when singing across a singer's entire musical pitch range. For instance, anecdotal reports from pedagogues suggest that vowels with a low fR1, such as [i] or [u], might have to be modified considerably (by raising fR1) in order to phonate at higher pitches. These issues were systematically addressed in silico with respect to treble singing, using a linear source-filter voice production model. The dominant harmonic of the radiated spectrum was assessed in 12987 simulations, covering a parameter space of 37 fundamental frequencies (fo) across the musical pitch range from C3 to C6; 27 voice source spectral slope settings from -4 to -30 dB/octave; computed for 13 different IPA vowels. The results suggest that, for most unmodified vowels, the stereotypical belting sound characteristics with a dominant second harmonic can only be produced over a pitch range of about a musical fifth, centered at fo≈0.5fR1. In the [ɔ] and [ɑ] vowels, that range is extended to an octave, supported by a low second resonance.
Data aggregation - considering the relative prevalence of vowels in American English - suggests that, historically, belting with fR1≈2fo was derived from speech, and that songs with an extended musical pitch range likely demand considerable vowel modification. We thus argue that - on acoustical grounds - the pedagogical commandment for belting with unmodified, "speech-like" vowels cannot always be fulfilled.
- Story, B. H., & Bunton, K. (2023). The relation of velopharyngeal coupling area and vocal tract scaling to identification of stop-nasal cognates. The Journal of the Acoustical Society of America, 154(6), 3741-3759. The purpose of this study was to determine whether the threshold of velopharyngeal (VP) coupling area at which listeners switch from identifying a consonant as a stop to a nasal in North American English was different for speech produced by a model based on an adult male, an adult female, and a 4-year-old child. V1CV2 stimuli were generated with a speech production model that encodes phonetic segments as relative acoustic targets imposed on an underlying vocal tract and laryngeal structure that can be scaled according to sex and age. Each V1CV2 was synthesized with a set of VP coupling functions whose maximum area ranged from 0 to 0.1 cm². Results showed that scaling the vocal tract and vocal folds had essentially no effect on the VP coupling area at which listener identification shifted from stop to nasal. The range of coupling areas at which the crossover occurred was 0.037-0.049 cm² for the male model, 0.040-0.055 cm² for the female model, and 0.039-0.052 cm² for the 4-year-old child model, and the overall mean was 0.044 cm². Calculations of band limited peak nasalance indicated that 85% peak nasalance during the consonant was well aligned with listener responses.
- Chuang, Y. J., Hwang, S. J., Buhr, K. A., Miller, C. A., Avey, G. D., Story, B. H., & Vorperian, H. K. (2022). Anatomic development of the upper airway during the first five years of life: A three-dimensional imaging study. PloS one, 17(3), e0264981. Normative data on the growth and development of the upper airway across the sexes is needed for the diagnosis and treatment of congenital and acquired respiratory anomalies and to gain insight on developmental changes in speech acoustics and disorders with craniofacial anomalies.
- Herbst, C. T., & Story, B. H. (2022). Computer simulation of vocal tract resonance tuning strategies with respect to fundamental frequency and voice source spectral slope in singing. The Journal of the Acoustical Society of America, 152(6), 3548. A well-known concept of singing voice pedagogy is "formant tuning," where the lowest two vocal tract resonances (fR1, fR2) are systematically tuned to harmonics of the laryngeal voice source to maximize the level of radiated sound. A comprehensive evaluation of this resonance tuning concept is still needed. Here, the effect of fR1, fR2 variation was systematically evaluated in silico across the entire fundamental frequency range of classical singing for three voice source characteristics with spectral slopes of -6, -12, and -18 dB/octave. Respective vocal tract transfer functions were generated with a previously introduced low-dimensional computational model, and resultant radiated sound levels were expressed in dB(A). Two distinct strategies for optimized sound output emerged for low vs high voices. At low pitches, spectral slope was the predominant factor for sound level increase, and resonance tuning only had a marginal effect. In contrast, resonance tuning strategies became more prevalent and voice source strength played an increasingly marginal role as fundamental frequency increased to the upper limits of the soprano range. This suggests that different voice classes (e.g., low male vs high female) likely have fundamentally different strategies for optimizing sound output, which has fundamental implications for pedagogical practice.
- Ikuma, T., Story, B., McWhorter, A. J., Adkins, L., & Kunduk, M. (2022). Harmonics-to-noise ratio estimation with deterministically time-varying harmonic model for pathological voice signals. The Journal of the Acoustical Society of America, 152(3), 1783. The harmonics-to-noise ratio (HNR) and other spectral noise parameters are important in clinical objective voice assessment as they could indicate the presence of nonharmonic phenomena, which are tied to the perception of hoarseness or breathiness. Existing HNR estimators are built on the voice signals to be nearly periodic (fixed over a short period), although voice pathology could induce involuntary slow modulation to void this assumption. This paper proposes the use of a deterministically time-varying harmonic model to improve the HNR measurements. To estimate the time-varying model, a two-stage iterative least squares algorithm is proposed to reduce model overfitting. The efficacy of the proposed HNR estimator is demonstrated with synthetic signals, simulated tremor signals, and recorded acoustic signals. Results indicate that the proposed algorithm can produce consistent HNR measures as the extent and rate of tremor are varied.
- Titze, I. R., & Story, B. (2022). A Newly Constituted National Center for Voice and Speech. Journal of voice : official journal of the Voice Foundation, 36(2), 149.
- Echternach, M., Herbst, C. T., Köberlein, M., Story, B., Döllinger, M., & Gellrich, D. (2021). Are source-filter interactions detectable in classical singing during vowel glides? The Journal of the Acoustical Society of America, 149(6), 4565. In recent studies, it has been assumed that vocal tract formants (Fn) and the voice source could interact. However, there are only few studies analyzing this assumption in vivo. Here, the vowel transition /i/-/a/-/u/-/i/ of 12 professional classical singers (6 females, 6 males) when phonating on the pitch D4 [fundamental frequency (ƒo) ca. 294 Hz] were analyzed using transnasal high speed videoendoscopy (20,000 fps), electroglottography (EGG), and audio recordings. Fn data were calculated using a cepstral method. Source-filter interaction candidates (SFICs) were determined by (a) algorithmic detection of major intersections of Fn/nƒo and (b) perceptual assessment of the EGG signal. Although the open quotient showed some increase for the /i-a/ and /u-i/ transitions, there were no clear effects at the expected Fn/nƒo intersections. In contrast, ƒo adjustments and changes in the phonovibrogram occurred at perceptually derived SFICs, suggesting level-two interactions. In some cases, these were constituted by intersections between higher nƒo and Fn. The presented data partially corroborates that vowel transitions may result in level-two interactions also in professional singers. However, the lack of systematically detectable effects suggests either the absence of a strong interaction or existence of confounding factors, which may potentially counterbalance the level-two-interactions.
- Mailend, M. L., Maas, E., & Story, B. H. (2021). Apraxia of speech and the study of speech production impairments: Can we avoid further confusion? Reply to Romani (2021). Cognitive neuropsychology, 38(4), 309-317. We agree with Cristina Romani (CR) about reducing confusion and agree that the issues raised in her commentary are central to the study of apraxia of speech (AOS). However, CR critiques our approach from the perspective of basic cognitive neuropsychology. This is confusing and misleading because, contrary to CR's claim, we did not attempt to inform models of typical speech production. Instead, we relied on such models to study the impairment in the clinical category of AOS (translational cognitive neuropsychology). Thus, the approach along with the underlying assumptions is different. This response aims to clarify these assumptions, broaden the discussion regarding the methodological approach, and address CR's concerns. We argue that our approach is well-suited to meet the goals of our recent studies and is commensurate with the current state of the science of AOS. Ultimately, a plurality of approaches is needed to understand a phenomenon as complex as AOS.
- Story, B. H., & Bunton, K. (2021). Identification of voiced stop consonants produced by acoustically driven vocal tract modulations. JASA express letters, 1(8), 085203. A recently developed speech production model, in which speech segments are specified by relative acoustic events called resonance deflection patterns, was used to generate speech signals that were presented to listeners in a perceptual test. The purpose was to determine the effect of variations of the magnitude and polarity of the third resonance deflection on identification of the consonant in a VCV disyllable while the deflections of the first and second resonances were held constant. Results showed that listeners' identification changed from /d/ to /ɡ/ when the polarity of the third resonance deflection switched from positive to negative.
- Story, B. H., & Bunton, K. (2021). The relation of velopharyngeal coupling area to the identification of stop versus nasal consonants in North American English based on speech generated by acoustically driven vocal tract modulations. The Journal of the Acoustical Society of America, 150(5), 3618. The purpose of this study was to determine the threshold of velopharyngeal coupling area at which listeners switch from identifying a consonant as a stop to a nasal in North American English, based on VCV stimuli generated with a speech production model that encodes phonetic segments as relative acoustic targets. Each VCV was synthesized with a set of velopharyngeal coupling functions whose area ranged from 0 to 0.1 cm². Results show that consonants were identified by listeners as a stop when the coupling area was less than 0.035-0.057 cm², depending on place of articulation and final vowel. The smallest coupling area (0.035 cm²) at which the stop-to-nasal switch occurred was found for an alveolar consonant in the /ɑCi/ context, whereas the largest (0.057 cm²) was for a bilabial in /ɑCɑ/. For each stimulus, the balance of oral versus nasal acoustic energy was characterized by the peak nasalance during the consonant. Stimuli with peak nasalance below 40% were mostly identified by listeners as stops, whereas those above 40% were identified as nasals. This study was intended to be a precursor to further investigations using the same model but scaled to represent the developing speech production system of male and female talkers.
- Story, B. H., Mailend, M. L., Beeson, P. M., Maas, E., & Forster, K. I. (2021). Examining speech motor planning difficulties in apraxia of speech and aphasia via the sequential production of phonetically similar words. Cognitive neuropsychology, 38(1), 72-87. doi:10.1080/02643294.2020.1847059 This study investigated the underlying nature of apraxia of speech (AOS) by testing two competing hypotheses. The Reduced Buffer Capacity Hypothesis argues that people with AOS can plan speech only one syllable at a time Rogers and Storkel [1999. Planning speech one syllable at a time: The reduced buffer capacity hypothesis in apraxia of speech. Aphasiology, 13(9-11), 793-805. https://doi.org/10.1080/026870399401885]. The Program Retrieval Deficit Hypothesis states that selecting a motor programme is difficult in face of competition from other simultaneously activated programmes Mailend and Maas [2013. Speech motor programming in apraxia of speech: Evidence from a delayed picture-word interference task. American Journal of Speech-Language Pathology, 22(2), S380-S396. https://doi.org/10.1044/1058-0360(2013/12-0101)]. Speakers with AOS and aphasia, aphasia without AOS, and unimpaired controls were asked to prepare and hold a two-word utterance until a go-signal prompted a spoken response. Phonetic similarity between target words was manipulated. Speakers with AOS had longer reaction times in conditions with two similar words compared to two identical words. The Control and the Aphasia group did not show this effect. These results suggest that speakers with AOS need additional processing time to retrieve target words when multiple motor programmes are simultaneously activated.
- Bergevin, C., Narayan, C., Williams, J., Mhatre, N., Steeves, J. K., Bernstein, J. G., & Story, B. (2020). Overtone focusing in biphonic Tuvan throat singing. eLife, 9. Khoomei is a unique singing style originating from the republic of Tuva in central Asia. Singers produce two pitches simultaneously: a booming low-frequency rumble alongside a hovering high-pitched whistle-like tone. The biomechanics of this biphonation are not well-understood. Here, we use sound analysis, dynamic magnetic resonance imaging, and vocal tract modeling to demonstrate how biphonation is achieved by modulating vocal tract morphology. Tuvan singers show remarkable control in shaping their vocal tract to narrowly focus the harmonics (or overtones) emanating from their vocal cords. The biphonic sound is a combination of the fundamental pitch and a focused filter state, which is at the higher pitch (1-2 kHz) and formed by merging two formants, thereby greatly enhancing sound-production in a very narrow frequency range. Most importantly, we demonstrate that this biphonation is a phenomenon arising from linear filtering rather than from a nonlinear source.
- Story, B. H., Kadiri, S. R., Gowda, D., & Alku, P. (2020). Time-Varying Quasi-Closed-Phase Analysis for Accurate Formant Tracking in Speech Signals. IEEE Transactions on Audio, Speech, and Language Processing, 28, 1901-1914. doi:10.1109/taslp.2020.3000037 In this paper, we propose a new method for the accurate estimation and tracking of formants in speech signals using time-varying quasi-closed-phase (TVQCP) analysis. Conventional formant tracking methods typically adopt a two-stage estimate-and-track strategy wherein an initial set of formant candidates are estimated using short-time analysis (e.g., 10–50 ms), followed by a tracking stage based on dynamic programming or a linear state-space model. One of the main disadvantages of these approaches is that the tracking stage, however good it may be, cannot improve upon the formant estimation accuracy of the first stage. The proposed TVQCP method provides a single-stage formant tracking that combines the estimation and tracking stages into one. TVQCP analysis combines three approaches to improve formant estimation and tracking: (1) it uses temporally weighted quasi-closed-phase analysis to derive closed-phase estimates of the vocal tract with reduced interference from the excitation source, (2) it increases the residual sparsity by using the $L_1$ optimization and (3) it uses time-varying linear prediction analysis over long time windows (e.g., 100–200 ms) to impose a continuity constraint on the vocal tract model and hence on the formant trajectories. Formant tracking experiments with a wide variety of synthetic and natural speech signals show that the proposed TVQCP method performs better than conventional and popular formant tracking tools, such as Wavesurfer and Praat (based on dynamic programming), the KARMA algorithm (based on Kalman filtering), and DeepFormants (based on deep neural networks trained in a supervised manner).
Matlab scripts for the proposed method can be found at: https://github.com/njaygowda/ftrack
- Wagner, M., Vorperian, H. K., Story, B. H., Milenkovic, P. H., & Kent, R. D. (2020). Effects of sampling rate and type of anti-aliasing filter on linear-predictive estimates of formant frequencies in men, women, and children. The Journal of the Acoustical Society of America, 147(3), EL221. doi:10.1121/10.0000824 The purpose of this study was to assess the effect of downsampling the acoustic signal on the accuracy of linear-predictive (LPC) formant estimation. Based on speech produced by men, women, and children, the first four formant frequencies were estimated at sampling rates of 48, 16, and 10 kHz using different anti-alias filtering. With proper selection of number of LPC coefficients, anti-alias filter and between-frame averaging, results suggest that accuracy is not improved by rates substantially below 48 kHz. Any downsampling should not go below 16 kHz with a filter cut-off centered at 8 kHz.
- Mailend, M. L., Maas, E., Story, B. H., Forster, K. I., & Beeson, P. M. (2019). Speech motor planning in the context of phonetically similar words: Evidence from apraxia of speech and aphasia. Neuropsychologia.
- Narendra, N. P., Airaksinen, M., Story, B. H., & Alku, P. (2019). Estimation of the glottal source from coded telephone speech using deep neural networks. Speech Communication, 106, 95-104. doi:10.1016/j.specom.2018.12.002
- Alku, P., Murtola, T., Malinen, J., Kuortti, J., Story, B. H., Airaksinen, M., Salmi, M., Vilkman, E., & Geneid, A. (2019). OPENGLOT - An open environment for the evaluation of glottal inverse filtering. Speech Communication, 107, 38-47. doi:10.1016/j.specom.2019.01.005
- Story, B. H., & Bunton, K. (2019). A model of speech production based on the acoustic relativity of the vocal tract. The Journal of the Acoustical Society of America, 146(4), 2522. A model is described in which the effects of articulatory movements to produce speech are generated by specifying relative acoustic events along a time axis. These events consist of directional changes of the vocal tract resonance frequencies that, when associated with a temporal event function, are transformed via acoustic sensitivity functions, into time-varying modulations of the vocal tract shape. Because the time course of the events may be considerably overlapped in time, coarticulatory effects are automatically generated. Production of sentence-level speech with the model is demonstrated with audio samples and vocal tract animations.
- Willi, M. M., Warner, N. L., & Story, B. H. (2019). Prediction of listener perception of place-of-articulation in reduced speech. Journal of the Acoustical Society of America, 145(3), 1912-1912. doi:10.1121/1.5101942 Previous research on stop consonant production found that less than 60% of the stops sampled from a connected speech corpus contained a clearly defined hold duration followed by a plosive release [Crystal and House, JASA(1988)]. How listeners perceive reduced, voiced stop consonant variants is not well understood. The purpose of the current study was to investigate whether an acoustic cue called a relative formant deflection pattern was capable of predicting listeners’ perceptions of these approximant-like, voiced stop consonants variants. A new methodology motivated by a computational model of speech production was used to extract relative formant deflection patterns from excised VCV segments from a reduced speech database. Participants listened to a total of 56 excised VCV stimuli containing approximant-like, voice stop consonant variants and performed a force choice test (i.e., /b-d-g/). The agreement between the perceptions predicted by the relative formant deflection patterns and listeners’ behavioral performance was compared. The expected relative formant deflection pattern correctly predicted listeners' primary response for percent /b/ and /g/ identifications, but not for listeners’ percent /d/ identifications. The implications of these results on a possible invariant acoustic correlate for listeners’ perceptions of place-of-articulation information will be discussed.
- Williams, J., Story, B. H., Steeves, J. K., Narayan, C. R., Mhatre, N., Bernstein, J. G., & Bergevin, C. (2019). Author response: Overtone focusing in biphonic Tuvan throat singing. eLife. doi:10.7554/elife.50476.sa2
- Mokhtari, P., Story, B. H., Alku, P., & Ando, H. (2018). Estimation of the glottal flow from speech pressure signals: Evaluation of three variants of iterative adaptive inverse filtering using computational physical modelling of voice production. Speech Communication, 104, 24-38. doi:10.1016/j.specom.2018.09.005
- Story, B. H. (2018). Acoustic communication by vocal tract modulation. Journal of the Acoustical Society of America, 143(3), 1787-1787. doi:10.1121/1.5035848 Abstract: In both human and nonhuman animals, the airway system may serve as an instrument for acoustic communication. Flow-induced tissue vibrations and noise sources generate the acoustic excitation, whereas the configuration of the vocal tract provides a variable resonant filter system that transforms the excitation into a “message.” This presentation will describe the development of a vocal tract model in which the effects of articulatory movements that produce speech are generated by specifying independent acoustic events along a time axis. These events consist of directional changes in the first three resonance frequencies of an acoustically-neutral airway configuration and are transformed, via acoustic sensitivity functions, into time-varying modulations of the vocal tract shape. The duration of each event may be considerably overlapped in time with other events to produce efficient transmission of information through the effects of coarticulation. The model will be used to demonstrate construction of syllab...
- Samlan, R. A., & Story, B. H. (2017). Influence of left-right vocal fold asymmetries on voice quality in simulated paramedian vocal fold paralysis. Journal of Speech, Language, and Hearing Research.
- Story, B. H. (2017). Stories of speech science. Journal of the Acoustical Society of America, 142(4), 2616-2616. doi:10.1121/1.5014580 Abstract: A fundamental aspect of teaching, on any topic, is the continual pursuit of telling a story. Although technology and advances in teaching methods may facilitate new and exciting forms of presenting course materials, they do not, by themselves, build the context for the content of a course. Every lecture, activity, homework assignment, project, quiz, and examination can be regarded as chapters that build, over the duration of a course, a compelling and engaging story in which students take part. The aim of this talk is to encourage development of speech science courses that weave together history, theory, technology, visual and auditory experience, assessment, and, importantly, the instructor’s own research to spin a good tale. [Work supported by NIH R01-DC011275 and NSF BCS-1145011.]
- Story, B. H., & Bunton, K. E. (2017). An acoustically-driven vocal tract model for stop consonant production. Speech Communication, 87, 1-17.
- Story, B. H., & Bunton, K. E. (2017). Vowel space density as an indicator of speech performance. Journal of the Acoustical Society of America Express Letters, 141(5), EL458-EL464.
- Story, B. H., Vorperian, H., & Bunton, K. E. (2017). Vocal tract growth model for males and females using area function transformations based on anatomic measurements. Journal of the Acoustical Society of America.
- Neely, K., Story, B. H., & Bunton, K. E. (2016). A modeling study of the effects of vocal tract movement duration and magnitude on the F2 trajectory in CV words. Journal of Speech, Language, and Hearing Research, 59, 1327-1334. doi:10.1044/2016_JSLHR-S-14-0331
- Story, B. H. (2016). The role of artificial speech in understanding the acoustic characteristics of spoken communication. Journal of the Acoustical Society of America, 140(4), 3316-3316. doi:10.1121/1.4970564 Abstract: Models have long been used to understand the relation of anatomical structure and articulatory movement to the acoustics and perception of speech. Realized as speech synthesizers or artificial talkers, such models simplify and emulate the speech production system. One type of simplification is to view speech production as a set of simultaneously imposed modulations of the airway system. Specifically, the vibratory motion of the vocal folds modulates the glottal airspace, while slower movements of the tongue, jaw, lips, and velum modulate the shape of the pharyngeal and oral cavities, and coupling to the nasal system. The precise timing of these modulations produces an acoustic wave from which listeners extract phonetic and talker-specific information. The first aim of the presentation will be to review two historical models of speech production that exemplify a system in which structure is modulated with movement to produce intelligible speech. The second aim is to describe theoretical aspects of a comput...
- Story, B. H., & Bunton, K. (2016). Formant measurement in children's speech based on spectral filtering. Speech Communication, 76, 93-111. Abstract: Children's speech presents a challenging problem for formant frequency measurement. In part, this is because high fundamental frequencies, typical of children's speech production, generate widely spaced harmonic components that may undersample the spectral shape of the vocal tract transfer function. In addition, there is often a weakening of upper harmonic energy and a noise component due to glottal turbulence. The purpose of this study was to develop a formant measurement technique based on cepstral analysis that does not require modification of the cepstrum itself or transformation back to the spectral domain. Instead, a narrow-band spectrum is low-pass filtered with a cutoff point (i.e., cutoff "quefrency" in the terminology of cepstral analysis) to preserve only the spectral envelope. To test the method, speech representative of a 2-3 year-old child was simulated with an airway modulation model of speech production. The model, which includes physiologically-scaled vocal folds and vocal tract, generates sound output analogous to a microphone signal. The vocal tract resonance frequencies can be calculated independently of the output signal and thus provide test cases that allow for assessing the accuracy of the formant tracking algorithm. When applied to the simulated child-like speech, the spectral filtering approach was shown to provide a clear spectrographic representation of formant change over the time course of the signal, and to facilitate tracking formant frequencies for further analysis.
- Story, B. H., & Bunton, K. E. (2016). Arizona Child Acoustic Database Repository. Folia Phoniatrica et Logopaedica, 68(3), 107-111.
- Story, B. H., & Lester-Smith, R. A. (2016). The effects of physiological adjustments on the perceptual and acoustical characteristics of vibrato as a model of vocal tremor. The Journal of the Acoustical Society of America, 140(5), 3827. doi:10.1121/1.4967454 Abstract: The purpose of this study was to investigate the effects of physiological adjustments on listeners' perception of the magnitude of modulation of voice and to determine the characteristics of the acoustical modulations that explained listeners' judgments. This research was carried out using singers producing vibrato as a model of vocal tremor. Twenty healthy adults participated in a perceptual study involving pair-comparisons of the magnitude of "shakiness" with singers' samples, which differed by fundamental frequency, vocal quality, and vowel. Results revealed that listeners perceived a higher magnitude of voice modulation when female samples had a pressed vocal quality. Acoustical analyses were performed with voice samples to determine the features that predicted listeners' judgments. Based on regression analyses, listeners' judgments were predicted to some extent by modulation information in frequency bands across the spectrum.
- Story, B. H., & Bunton, K. (2016). Identification of stop consonants produced by an acoustically-driven model of a child-like vocal tract. Journal of the Acoustical Society of America, 140(4), 3218-3218. doi:10.1121/1.4970143 Abstract: A model of a child-like vocal tract has been developed such that the deformation patterns superimposed on a vowel substrate to generate coarticulated consonants are specified by a time-varying set of directional shifts in the first three resonance frequencies. These deflection patterns are denoted as a combination of three numbers, each of which can vary between -1 and 1; a negative value implies a downward shift in a resonance frequency, whereas a positive value results in an upward shift. For example, a “bilabial” consonant specified as [-1,-1,-1] would be transformed via calculations of acoustic sensitivity functions to a time-varying vocal tract shape that presents the expected constriction at the lips, but also modifies other parts of the vocal tract that may be necessary for producing the appropriate formant transitions into and out of the consonant. Using this model, three sets of 30 VCV utterances were generated in which the values of deflection patterns were set to produce vocal tract shapes that hy...
- Story, B. H., & Bunton, K. E. (2016). Formant measurement in children's speech based on spectral filtering. Speech Communication, 76, 93-111. doi:10.1016/j.specom.2015.11.001
- Willi, M. M., & Story, B. H. (2016). Prediction of listener perception of reduced, voice stop consonant simulations based on patterns of formant deflections. Journal of the Acoustical Society of America, 140(4), 3216-3216. doi:10.1121/1.4970133 Abstract: Previous research on stop consonants found that less than 60 percent of the stops sampled from a speech corpus contained a clearly defined period of silence or prevoicing prior to the plosive release [Crystal & House, JASA, 1988]. How listeners perceive a reduced form of stop consonants without these cues is not well understood. The purpose of this experiment was to investigate whether recasting typical formant transitions into a measure called a “relative formant deflection pattern” provides a means of predicting listeners’ perceptions of approximant-like, voiced stop consonant variants. A computational model of speech production, in which consonant constriction location was varied along the length of the vocal tract, was used to generate place continua of approximant-like, voiced stop consonants imposed on a vowel-to-vowel transition. Stimuli were presented to listeners in three conditions: 1) normal simulated speech, 2) sinewave speech in which three tones replicated the time course of the F1, F2, and ...
- Carbonell, K. M., Lester, R. A., Story, B. H., & Lotto, A. J. (2015). Discriminating simulated vocal tremor source using amplitude modulation spectra. Journal of Voice: Official Journal of the Voice Foundation, 29(2), 140-7. Abstract: Sources of vocal tremor are difficult to categorize perceptually and acoustically. This article describes a preliminary attempt to discriminate vocal tremor sources through the use of spectral measures of the amplitude envelope. The hypothesis is that different vocal tremor sources are associated with distinct patterns of acoustic amplitude modulations.
- Lester, R. A., & Story, B. H. (2015). The effects of physiological adjustments on the perceptual and acoustical characteristics of simulated laryngeal vocal tremor. The Journal of the Acoustical Society of America, 138(2), 953-63. Abstract: The purpose of this study was to determine if adjustments to the voice source [i.e., fundamental frequency (F0), degree of vocal fold adduction] or vocal tract filter (i.e., vocal tract shape for vowels) reduce the perception of simulated laryngeal vocal tremor and to determine if listener perception could be explained by characteristics of the acoustical modulations. This research was carried out using a computational model of speech production that allowed for precise control and manipulation of the glottal and vocal tract configurations. Forty-two healthy adults participated in a perceptual study involving pair-comparisons of the magnitude of "shakiness" with simulated samples of laryngeal vocal tremor. Results revealed that listeners perceived a higher magnitude of voice modulation when simulated samples had a higher mean F0, greater degree of vocal fold adduction, and vocal tract shape for /i/ vs /ɑ/. However, the effect of F0 was significant only when glottal noise was not present in the acoustic signal. Acoustical analyses were performed with the simulated samples to determine the features that affected listeners' judgments. Based on regression analyses, listeners' judgments were predicted to some extent by modulation information present in both low and high frequency bands.
- Story, B. H. (2015). Ken Stevens’ influence on the development of paradigms for speech synthesis. Journal of the Acoustical Society of America, 137(4), 2328-2328. doi:10.1121/1.4920506 Abstract: Synthetic speech has long been used as a means of understanding both speech production and speech perception, as well as for technological applications such as text-to-speech devices. Paradigms for developing speech synthesis systems have included electrical circuits, digital filters, and computational models that replicate either the structure or acoustic characteristics of the voice source and vocal tract. This presentation will focus on how Ken Stevens’ investigations of speech, spanning more than five decades, have directly influenced essentially every paradigm of speech synthesis, including formant synthesis, articulatory synthesis, and speech production modeling. [Work supported by NIH R01-DC011275 and NSF BCS-1145011.]
- Titze, I. R., Baken, R. J., Bozeman, K. W., Granqvist, S., Henrich, N., Herbst, C. T., Howard, D. M., Hunter, E. J., Kaelin, D., Kent, R. D., Kreiman, J., Kob, M., Löfqvist, A., McCoy, S., Miller, D. G., Noé, H., Scherer, R. C., Smith, J. R., Story, B. H., Švec, J. G., et al. (2015). Toward a consensus on symbolic notation of harmonics, resonances, and formants in vocalization. The Journal of the Acoustical Society of America, 137(5), 3005-7.
- Tucker, B. V., & Story, B. (2015). The relation of the temporal variation of F1, F2, and F3 to articulator movement. significance, 2, F3.
- Willi, M. M., & Story, B. H. (2015). Acoustic modeling of the perception of place information in incomplete stops. Journal of the Acoustical Society of America, 137(4), 2305-2305. doi:10.1121/1.4920420 Abstract: Previous research on stop consonant production found that less than 60% of the stops sampled from a connected speech corpus contained a clearly defined hold duration followed by a plosive release [Crystal & House, JASA 1988]. How listeners perceive the remaining portion of incomplete stop consonants is not well understood. The purpose of the current study was to investigate whether relative formant deflection patterns, a potential model of acoustic invariance proposed by Story and Bunton (2010), are capable of predicting listeners’ perceptions of acoustically continuous, voiced stop consonants lacking a canonical hold duration. Listeners were randomly presented a total of 60 voiced stop-consonant VCV stimuli, each 100 ms in duration, synthesized using a computational model of speech production. Stimuli were created using a continuum of 20 equal step constrictions along the length of the vocal tract in three vowel-to-vowel contexts [see Story & Bunton, JSLHR 2010]. Participants listened to the stimuli and performed a forced choice test (i.e., /b-d-g/). The phonetic boundaries predicted by the relative formant deflection patterns and phonetic boundaries obtained by the forced choice test were compared to determine the ability of the acoustic model to predict participants’ perceptions. The acoustic and perceptual results are reported. [Work supported by NIH R01-DC011275.]
- Airaksinen, M., Raitio, T., Story, B., & Alku, P. (2014). Quasi closed phase glottal inverse filtering analysis with weighted linear prediction. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(3), 596-607.
- Auvinen, H., Raitio, T., Airaksinen, M., Siltanen, S., Story, B. H., & Alku, P. (2014). Automatic glottal inverse filtering with the Markov chain Monte Carlo method. Computer Speech & Language, 28(5), 1139-1155.
- Carbonell, K. M., Lester, R. A., Story, B. H., & Lotto, A. J. (2014). Discriminating Simulated Vocal Tremor Source Using Amplitude Modulation Spectra. Journal of Voice.
- Lester, R. A., Story, B. H., & Lotto, A. J. (2014). Acoustical bases for the perception of simulated laryngeal vocal tremor. Journal of the Acoustical Society of America, 136(4), 2293-2293. doi:10.1121/1.4900289 Abstract: Vocal tremor involves atypical modulation of the fundamental frequency (F0) and intensity of the voice. Previous research on vocal tremor has focused on measuring the modulation rate and extent of the F0 and intensity without characterizing other modulations present in the acoustic signal (i.e., modulation of the harmonics). Characteristics of the voice source and vocal tract filter are known to affect the amplitude of the harmonics and could potentially be manipulated to reduce the perception of vocal tremor. The purpose of this study was to determine the adjustments that could be made to the voice source or vocal tract filter to alter the acoustic output and reduce the perception of modulation. This research was carried out using a computational model of speech production that allows for precise control and modulation of the glottal and vocal tract configurations. Results revealed that listeners perceived a higher magnitude of voice modulation when simulated samples had a higher mean F0, greater degree ...
- Monson, B. B., Hunter, E. J., Lotto, A. J., & Story, B. H. (2014). The perceptual significance of high-frequency energy in the human voice. Frontiers in Psychology, 5.
- Monson, B. B., Lotto, A. J., & Story, B. H. (2014). Detection of high-frequency energy level changes in speech and singing. Journal of the Acoustical Society of America, 135(1), 400-406. Abstract: Previous work has shown that human listeners are sensitive to level differences in high-frequency energy (HFE) in isolated vowel sounds produced by male singers. Results indicated that sensitivity to HFE level changes increased with overall HFE level, suggesting that listeners would be more "tuned" to HFE in vocal production exhibiting higher levels of HFE. It follows that sensitivity to HFE level changes should be higher (1) for female vocal production than for male vocal production and (2) for singing than for speech. To test this hypothesis, difference limens for HFE level changes in male and female speech and singing were obtained. Listeners showed significantly greater ability to detect level changes in singing vs speech but not in female vs male speech. Mean difference limen scores for speech and singing were about 5 dB in the 8-kHz octave (5.6-11.3 kHz) but 8-10 dB in the 16-kHz octave (11.3-22 kHz). These scores are lower (better) than those previously reported for isolated vowels and some musical instruments. © 2014 Acoustical Society of America.
- Monson, B. B., Lotto, A. J., & Story, B. H. (2014). Gender and vocal production mode discrimination using the high frequencies for speech and singing. Frontiers in Psychology, 5.
- Samlan, R. A., Story, B. H., Lotto, A. J., & Bunton, K. (2014). Acoustic and Perceptual Effects of Left-Right Laryngeal Asymmetries Based on Computational Modeling. Journal of Speech, Language, and Hearing Research, 57(5), 1619-1637.
- Story, B. H. (2014). Eerie voices: Odd combinations, extremes, and irregularities. Journal of the Acoustical Society of America, 136(4), 2272-2272. doi:10.1121/1.4900210 Abstract: The human voice can project an eerie quality when certain characteristics are present in a particular context. Some types of eerie voices may be derived from physiological scaling of the speech production system that is either humanly impossible or nearly so. By combining previous work on adult speech, and current research on speech development, the purpose of this study was to simulate vocalizations and speech based on unusual configurations of the vocal tract and vocal folds, and by imposing irregularities on movement and vibration. The resulting sound contains qualities that are human-like, but not typical, and hence may give the perceptual impression of eeriness. [Supported in part by NIH R01-DC011275.]
- Story, B. H. (2014). Structure, Movement, Sound, and Perception. Perspectives on Speech Science and Orofacial Disorders, 24, 7-20. Abstract: Models that take the form of artificial talkers and speech synthesis systems have long been used as a means of understanding both speech production and speech perception. The article begins with a brief history of two artificial speaking devices that exemplify the representation of speech production as a system of modulations. The development of a recent airway modulation model is then described that simulates the time-varying changes of the vocal tract and acoustic wave propagation. The result is a type of artificial talker that can be used to study various aspects of how sound is generated by humans and how that sound is perceived by a listener.
- Story, B. H. (2014). Structure, Movement, Sound, and Perception. SIG 5 Perspectives on Speech Science and Orofacial Disorders, 24(1), 7-20.
- Story, B. H., Monson, B. B., & Lotto, A. J. (2014). Speech spectral intensity discrimination at frequencies above 6 kHz. Journal of the Acoustical Society of America, 136(4), 2307-2307. doi:10.1121/1.4900347 Abstract: Hearing aids and other communication devices (e.g., mobile phones) have made some recent efforts to extend their bandwidths to represent higher frequencies. The impact of this expansion on speech perception is not well characterized. To assess human sensitivity to speech high-frequency energy (HFE, defined here as energy in the 8- and 16-kHz octave bands), difference limens for HFE level changes in male and female speech and singing were obtained. Listeners showed significantly greater ability to detect level changes in singing vs. speech, but not in female vs. male speech. Mean difference limen scores for speech and singing were about 5 dB in the 8-kHz octave (5.6–11.3 kHz) but 8–10 dB in the 16-kHz octave (11.3–22 kHz). These scores are lower (better) than scores previously reported for isolated vowels and some musical instruments, and similar to scores previously reported for white noise.
- Alku, P., Pohjalainen, J., Vainio, M., Laukkanen, A., & Story, B. H. (2013). Formant frequency estimation of high-pitched vowels using weighted linear prediction. Journal of the Acoustical Society of America, 134(2), 1295-1313. PMID: 23927127; Abstract: All-pole modeling is a widely used formant estimation method, but its performance is known to deteriorate for high-pitched voices. In order to address this problem, several all-pole modeling methods robust to fundamental frequency have been proposed. This study compares five such previously known methods and introduces a technique, Weighted Linear Prediction with Attenuated Main Excitation (WLP-AME). WLP-AME utilizes temporally weighted linear prediction (LP) in which the square of the prediction error is multiplied by a given parametric weighting function. The weighting downgrades the contribution of the main excitation of the vocal tract in optimizing the filter coefficients. Consequently, the resulting all-pole model is affected more by the characteristics of the vocal tract leading to less biased formant estimates. By using synthetic vowels created with a physical modeling approach, the results showed that WLP-AME yields improved formant frequencies for high-pitched sounds in comparison to the previously known methods (e.g., relative error in the first formant of the vowel [a] decreased from 11% to 3% when conventional LP was replaced with WLP-AME). Experiments conducted on natural vowels indicate that the formants detected by WLP-AME changed in a more regular manner between repetitions of different pitch than those computed by conventional LP. © 2013 Acoustical Society of America.
- Bunton, K., Story, B. H., & Titze, I. (2013). Estimation of vocal tract area functions in children based on measurement of lip termination area and inverse acoustic mapping. Proceedings of Meetings on Acoustics, 19. Abstract: Although vocal tract area functions for adult talkers can be acquired with medical imaging techniques such as Magnetic Resonance Imaging (MRI), similar information concerning children's vocal tracts during speech production is difficult to obtain. This is largely because the demanding nature of the data collection tasks is not suitable for children. The purpose of this study was to determine the feasibility of mapping formant frequencies measured from the [i, ae, a, u] vowels produced by three children (age range 4 to 6 years), to estimated vocal tract area functions. Formants were measured with a pitch-synchronous LPC approach, and the inverse mapping was based on calculations of acoustic sensitivity functions [Story, J. Acoust. Soc. Am., 119, 715-718]. In addition, the mapping was constrained by measuring the lip termination area from digital video frames collected simultaneously with the audio sample. Experimental results were augmented with speech simulations to provide some validation of the technique. © 2013 Acoustical Society of America.
- Lester, R. A., & Story, B. H. (2013). Acoustic characteristics of respiratory-induced vocal tremor. American Journal of Speech Language Pathology, 22, 205-211.
- Lester, R. A., & Story, B. H. (2013). Modulation of voice related to simulated vocal fold length change with cricothyroid and thyroarytenoid muscle activation. Proc. 10th Intl. Conf. Adv. Quan. Laryng., Voice, and Spch Res, 63-64.
- Lester, R. A., Barkmeier-Kraemer, J., & Story, B. H. (2013). Physiologic and acoustic patterns of essential vocal tremor. Journal of Voice, 27(4), 422-432. PMID: 23490130; Abstract: Objectives/Hypothesis: This article describes a case study of physiologic and acoustic patterns of essential vocal tremor (EVT). Simulations of vocal tremor were used to test hypotheses regarding measured acoustic patterns and expected physiologic sources. Study Design: This is a case study of EVT using an analysis by synthesis approach. Methods: Oscillations of vocal tract and laryngeal structures were identified using rigid videostroboscopic examination. Acoustical analyses of sustained phonation were completed using the methods previously described in the literature and custom-written MATLAB functions. Simulations of the client's vocal tremor were created using a computational model. Results: The client exhibited vocal fold length changes and oscillation within the laryngeal vestibule during sustained phonation at a comfortable pitch and loudness. Despite the involvement of vocal fold length changes, a low average extent of fundamental frequency (F0) modulation (ie, 5.3%) and high average extent of intensity modulation (ie, 23.0%) were measured. Simulations of vocal tremor involving modulation of F0 demonstrated that this source of tremor contributes to frequency-induced intensity modulation, although there was a greater extent of F0 modulation than intensity modulation. Conclusions: The greater extent of intensity than F0 modulation in one client with EVT exhibiting predominant vocal fold length changes contrasted with the lower extent of intensity than F0 modulation in simulated vocal tremor involving F0 modulation. These findings demonstrate that other potential sources of intensity modulation outside the larynx should be determined during the evaluation of clients with vocal tremor. © 2013 The Voice Foundation.
- Samlan, R. A., Story, B. H., & Bunton, K. (2013). Relation of perceived breathiness to laryngeal kinematics and acoustic measures based on computational modeling. Journal of Speech, Language, and Hearing Research, 56(4), 1209-1223. PMID: 23785184; PMCID: PMC3984008; Abstract: Purpose: In this study, the authors sought to determine (a) how specific vocal fold structural and vibratory features relate to breathy voice quality and (b) the relation of perceived breathiness to 4 acoustic correlates of breathiness. Method: A computational, kinematic model of the vocal fold medial surfaces was used to specify features of vocal fold structure and vibration in a manner consistent with breathy voice. Four model parameters were altered: vocal process separation, surface bulging, vibratory nodal point, and epilaryngeal constriction. Twelve naïve listeners rated breathiness of 364 samples relative to a reference. The degree of breathiness was then compared to (a) the underlying kinematic profile and (b) 4 acoustic measures: cepstral peak prominence (CPP), harmonics-to-noise ratio, and two measures of spectral slope. Results: Vocal process separation alone accounted for 61.4% of the variance in perceptual rating. Adding nodal point ratio and bulging to the equation increased the explained variance to 88.7%. The acoustic measure CPP accounted for 86.7% of the variance in perceived breathiness, and explained variance increased to 92.6% with the addition of one spectral slope measure. Conclusion: Breathiness ratings were best explained kinematically by the degree of vocal process separation and acoustically by CPP. © American Speech-Language-Hearing Association.
- Schleusing, O., Kinnunen, T., Story, B., & Vesin, J. (2013). Joint source-filter optimization for accurate vocal tract estimation using differential evolution. IEEE Transactions on Audio, Speech and Language Processing, 21(8), 1560-1572. Abstract: In this work, we present a joint source-filter optimization approach for separating voiced speech into vocal tract (VT) and voice source components. The presented method is pitch-synchronous and thereby exhibits a high robustness against vocal jitter, shimmer and other glottal variations while covering various voice qualities. The voice source is modeled using the Liljencrants-Fant (LF) model, which is integrated into a time-varying auto-regressive speech production model with exogenous input (ARX). The non-convex optimization problem of finding the optimal model parameters is addressed by a heuristic, evolutionary optimization method called differential evolution. The optimization method is first validated in a series of experiments with synthetic speech. Estimated glottal source and VT parameters are the criteria used for comparison with the iterative adaptive inverse filter (IAIF) method and the linear prediction (LP) method under varying conditions such as jitter, fundamental frequency (f0) as well as environmental and glottal noise. The results show that the proposed method largely reduces the bias and standard deviation of estimated VT coefficients and glottal source parameters. Furthermore, the performance of the source-filter separation is evaluated in experiments using speech generated with a physical model of speech production. The proposed method reliably estimates glottal flow waveforms and lower formant frequencies. Results obtained for higher formant frequencies indicate that research on more accurate voice source models and their interaction with the VT is necessary to improve the source-filter separation. The proposed optimization approach promises to be a useful tool for future research addressing this topic. © 2006-2012 IEEE.
- Story, B. H., & Bunton, K. (2013). Production of child-like vowels with nonlinear interaction of glottal flow and vocal tract resonances. Journal of the Acoustical Society of America, 133(5), 060303-060303. doi:10.1121/1.4806754 Abstract: Acoustically, the mechanisms of vocal sound production may be considered to exist along a continuum. At one end, the glottal flow wave is weakly coupled to the resonances of the vocal tract such that the output is a linear combination of their respective acoustic characteristics, whereas at the other end there is strong nonlinear coupling of the flow source to the vocal tract resonances. To express phonetic properties in the output, such as formants, the linear case requires that the source produce sound that is rich in harmonic or broadband energy. In contrast, the nonlinear case allows for the possibility of a harmonically-rich source signal to be generated even when the glottal area variation is so simple that it may contain only one harmonic (i.e., a sinusoid) [Titze, J. Acoust. Soc. Am. 123 (2008)]. The latter case is most likely to occur when the fundamental frequency is relatively high, such as in children’s speech. The purpose of this study was to investigate the nonlinear end of the continuum with respect to the harmonic content of the glottal flow and pressure waveforms for vowels generated with a model of a child-like speech production system. [Research supported by NIH R01-DC011275, NSF BCS-1145011.]
- Story, B. H. (2013). Phrase-level speech simulation with an airway modulation model of speech production. Computer Speech & Language, 27(4). Artificial talkers and speech synthesis systems have long been used as a means of understanding both speech production and speech perception. The development of an airway modulation model is described that simulates the time-varying changes of the glottis and vocal tract, as well as acoustic wave propagation, during speech production. The result is a type of artificial talker that can be used to study various aspects of how sound is generated by humans and how that sound is perceived by a listener. The primary components of the model are introduced and simulation of words and phrases is demonstrated.
- Story, B., Hunter, E., & Scherer, R. (2013). The academic family tree of Ingo Titze. The Journal of the Acoustical Society of America, 134(5), 4018.
- Alku, P., Pohjalainen, J., Vainio, M., Laukkanen, A., & Story, B. (2012). Improved formant frequency estimation from high-pitched vowels by downgrading the contribution of the glottal source with weighted linear prediction. 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, 2, 1610-1613. Abstract: Since performance of conventional linear prediction (LP) deteriorates in formant estimation of high-pitched voices, several all-pole modeling methods robust to F0 have been developed. This study compares five such previously known methods and proposes a new technique, Weighted Linear Prediction with Attenuated Main Excitation (WLP-AME). WLP-AME utilizes weighted linear prediction in which the square of the prediction error is multiplied by a weighting function that downgrades the contribution of the glottal source in the model optimization. Consequently, the resulting all-pole model is affected more by the vocal tract characteristics, which leads to more accurate formant estimates. By using synthetic vowels created with a physical modeling approach, the study shows that WLP-AME yields improved formant frequency estimates for high-pitched vowels in comparison to the previously known methods.
- Bunton, K., & Story, B. H. (2012). The relation of nasality and nasalance to nasal port area based on a computational model. Cleft Palate-Craniofacial Journal, 49(6), 741-749. PMID: 21970695; PMCID: PMC3638741. Abstract: Objective: The purpose of this study was to examine the relation of perceptual ratings of nasality by experienced listeners, measures of nasalance, and the size of the nasal port opening for three simulated English corner vowels, /i/, /u/, and /a/. Design: Samples were generated using a computational model that allowed for exact control of nasal port size and a direct measure of nasalance. Perceptual ratings were obtained using a paired-stimulus presentation. Participants: Five experienced listeners. Main Outcome Measures: Measures of nasalance and perceptual nasality ratings. Results: Differences in nasalance and perceptual ratings of nasality were noted among the three vowels, with values being greater for the high vowels /i/ and /u/ compared to the low vowel /a/. Listeners detected nasality for the high and low vowels simulated with nasal port areas of 0.01 and 0.15 cm², respectively. Correlations between ratings of nasality and nasalance were high for all three vowels. Conclusions: Results of the present study show a high correlation between ratings of nasality and measures of nasalance for nasal port areas ranging from 0 to 0.5 cm². The correlations were based on sustained vowel samples. The restricted speech sample limits generalization of the findings to clinical data; however, the results are a demonstration of the usefulness of modeling to understand the perceptual phenomena of nasality. © Copyright 2012 American Cleft Palate-Craniofacial Association.
- Carbonell, K. M., Story, B., Lester, R., & Lotto, A. J. (2012). Discriminating vocal tremor source from amplitude envelope modulations. The Journal of the Acoustical Society of America, 132(3), 2090.
- Monson, B. B., Hunter, E. J., & Story, B. H. (2012). Horizontal directivity of low- and high-frequency energy in speech and singing. Journal of the Acoustical Society of America, 132(1), 433-441. PMID: 22779490; PMCID: PMC3407162. Abstract: Speech and singing directivity in the horizontal plane was examined using simultaneous multi-channel full-bandwidth recordings to investigate directivity of high-frequency energy, in particular. This method allowed not only for accurate analysis of running speech using the long-term average spectrum, but also for examination of directivity of separate transient phonemes. Several vocal production factors that could affect directivity were examined. Directivity differences were not found between modes of production (speech vs singing) and only slight differences were found between genders and production levels (soft vs normal vs loud), more pronounced in the higher frequencies. Large directivity differences were found between specific voiceless fricatives, with /s,ʃ/ more directional than /f,θ/ in the 4, 8, and 16 kHz octave bands. © 2012 Acoustical Society of America.
- Monson, B. B., Lotto, A. J., & Story, B. H. (2012). Analysis of high-frequency energy in long-term average spectra of singing, speech, and voiceless fricatives. Journal of the Acoustical Society of America, 132(3), 1754-1764. PMID: 22978902; PMCID: PMC3460988. Abstract: The human singing and speech spectrum includes energy above 5 kHz. To begin an in-depth exploration of this high-frequency energy (HFE), a database of anechoic high-fidelity recordings of singers and talkers was created and analyzed. Third-octave band analysis from the long-term average spectra showed that production level (soft vs normal vs loud), production mode (singing vs speech), and phoneme (for voiceless fricatives) all significantly affected HFE characteristics. Specifically, increased production level caused an increase in absolute HFE level, but a decrease in relative HFE level. Singing exhibited higher levels of HFE than speech in the soft and normal conditions, but not in the loud condition. Third-octave band levels distinguished phoneme class of voiceless fricatives. Female HFE levels were significantly greater than male levels only above 11 kHz. This information is pertinent to various areas of acoustics, including vocal tract modeling, voice synthesis, augmentative hearing technology (hearing aids and cochlear implants), and training/therapy for singing and speech. © 2012 Acoustical Society of America.
- Monson, B. B., Story, B., & Lotto, A. (2012). Analysis of high-frequency energy in singing and speech. The Journal of the Acoustical Society of America, 131(4), 3378.
- Vitela, A. D., Lotto, A. J., & Story, B. H. (2012). “Talker normalization” effects elicited with no change in talker. The Journal of the Acoustical Society of America, 132(3), 1967.
- Bunton, K., & Story, B. H. (2011). A test of formant frequency analyses with simulated child-like vowels. The Journal of the Acoustical Society of America, 129(4), 2626.
- Monson, B. B., Lotto, A. J., & Story, B. H. (2011). Perception of high-frequency energy in singing and speech. The Journal of the Acoustical Society of America, 129(4), 2581.
- Monson, B. B., Vitela, A. D., Story, B. H., & Lotto, A. J. (2011). Perceptually relevant information in energy above 5 kHz for speech and singing. The Journal of the Acoustical Society of America, 130(4), 2569.
- Samlan, R. A., & Story, B. H. (2011). Relation of structural and vibratory kinematics of the vocal folds to two acoustic measures of breathy voice based on computational modeling. Journal of Speech, Language, and Hearing Research, 54(5), 1267-1283. PMID: 21498582; PMCID: PMC3184371. Abstract: Purpose: To relate vocal fold structure and kinematics to 2 acoustic measures: cepstral peak prominence (CPP) and the amplitude of the first harmonic relative to the second (H1-H2). Method: The authors used a computational, kinematic model of the medial surfaces of the vocal folds to specify features of vocal fold structure and vibration in a manner consistent with breathy voice. Four model parameters were altered: degree of vocal fold adduction, surface bulging, vibratory nodal point, and supraglottal constriction. CPP and H1-H2 were measured from simulated glottal area, glottal flow, and acoustic waveforms and were related to the underlying vocal fold kinematics. Results: CPP decreased with increased separation of the vocal processes, whereas the nodal point location had little effect. H1-H2 increased as a function of separation of the vocal processes in the range of 1.0 mm to 1.5 mm and decreased with separation >1.5 mm. Conclusions: CPP is generally a function of vocal process separation. H1*-H2* (see paragraph 6 of article text for an explanation of the asterisks) will increase or decrease with vocal process separation on the basis of vocal fold shape, pivot point for the rotational mode, and supraglottal vocal tract shape, limiting its utility as an indicator of breathy voice. Future work will relate the perception of breathiness to vocal fold kinematics and acoustic measures.
- Story, B. H. (2011). An overview of acoustic research in Speech Communication. The Journal of the Acoustical Society of America, 129(4), 2406.
- Story, B. H. (2011). TubeTalker: An airway modulation model of human sound production. Proceedings of the International Workshop on Performative Speech and Singing Synthesis March, 14--15.
- Story, B. H., & Bunton, K. (2011). Decomposition of vowel and consonant contributions to the time-varying vocal tract shape. The Journal of the Acoustical Society of America, 129(4), 2456.
- Titze, I. R., Worley, A. S., & Story, B. H. (2011). Source-vocal tract interaction in female operatic singing and theater belting. Journal of Singing, 67(5), 561-572.
- Barkmeier-Kraemer, J., & Story, B. (2010). Conceptual and clinical updates on vocal tremor. ASHA Leader, 15(14), 16-19.
- Bunton, K., & Story, B. H. (2010). Identification of synthetic vowels based on a time-varying model of the vocal tract area function. Journal of the Acoustical Society of America, 127(4), EL146-EL152. PMID: 20369982; PMCID: PMC2855717. Abstract: The purpose of this study was to conduct an identification experiment with synthetic vowels based on the same sets of speaker-dependent area functions as in Bunton and Story [(2009) J. Acoust. Soc. Am. 125, 19-22], but with additional time-varying characteristics that are more representative of natural speech. The results indicated that vowels synthesized using an area function model that allows for time variation of the vocal tract shape and includes natural vowel durations were more accurately identified for 7 of 11 English vowels than those based on static area functions. © 2010 Acoustical Society of America.
- Story, B. H., & Bunton, K. (2010). Relation of vocal tract shape, formant transitions, and stop consonant identification. Journal of Speech, Language, and Hearing Research, 53(6), 1514-1528. PMID: 20643794; PMCID: PMC3145491. Abstract: Purpose: The present study was designed to investigate the relation of formant transitions to place-of-articulation for stop consonants. A speech production model was used to generate simulated utterances containing voiced stop consonants, and a perceptual experiment was performed to test their identification by listeners. Method: Based on a model of the vocal tract shape, a theoretical basis for reducing highly variable formant transitions to more invariant formant deflection patterns as a function of constriction location was proposed. A speech production model was used to simulate vowel-consonant-vowel (VCV) utterances for 3 underlying vowel-vowel contexts and for which the constriction location was incrementally moved from the lips toward the velar part of the vocal tract. These simulated VCVs were presented to listeners who were asked to identify the consonant. Results: Listener responses indicated that phonetic boundaries were well aligned with points along the vocal tract length where there was a shift in the deflection polarity of either the 2nd or 3rd formant. Conclusions: This study demonstrated that regions of the vocal tract exist that, when constricted, shift the formant frequencies in a predictable direction. Based on a perceptual experiment, the boundaries of these acoustically defined regions were shown to coincide with phonetic categories for stop consonants. © American Speech-Language-Hearing Association.
- Vitela, A. D., Story, B. H., & Lotto, A. J. (2010). Predicting the effect of talker differences on perceived vowel category. The Journal of the Acoustical Society of America, 128(4), 2349.
- Alku, P., Magi, C., Yrttiaho, S., Bäckström, T., & Story, B. (2009). Closed phase covariance analysis based on constrained linear prediction for glottal inverse filtering. Journal of the Acoustical Society of America, 125(5), 3289-3305. PMID: 19425671. Abstract: Closed phase (CP) covariance analysis is a widely used glottal inverse filtering method based on the estimation of the vocal tract during the glottal CP. Since the length of the CP is typically short, the vocal tract computation with linear prediction (LP) is vulnerable to the covariance frame position. The present study proposes modification of the CP algorithm based on two issues. First, and most importantly, the computation of the vocal tract model is changed from the one used in the conventional LP into a form where a constraint is imposed on the dc gain of the inverse filter in the filter optimization. With this constraint, LP analysis is more prone to give vocal tract models that are justified by the source-filter theory; that is, they show complex conjugate roots in the formant regions rather than unrealistic resonances at low frequencies. Second, the new CP method utilizes a minimum phase inverse filter. The method was evaluated using synthetic vowels produced by physical modeling and natural speech. The results show that the algorithm improves the performance of the CP-type inverse filtering and its robustness with respect to the covariance frame position. © 2009 Acoustical Society of America.
- Samlan, R. A., Story, B. H., & Bunton, K. (2009). Kinematic modeling and acoustic measures of breathy voice. The Journal of the Acoustical Society of America, 126(4), 2221.
- Story, B. H. (2009). A possible role of nonlinear source-filter interaction in simulation of childlike speech. The Journal of the Acoustical Society of America, 125(4), 2637.
- Story, B. H. (2009). Advances in simulation of sentence-level speech production with kinematic models of the vocal tract and vocal folds. The Journal of the Acoustical Society of America, 126(4), 2205.
- Story, B. H. (2009). Erratum: "A parametric model of the vocal tract area function for vowel and consonant production" (Journal of the Acoustical Society of America (2005) 117, (3231-3254)). Journal of the Acoustical Society of America, 125(2), 1248-.
- Story, B. H., & Bunton, K. (2009). Relation of vocal tract constriction location to identification of voiced stop consonants. The Journal of the Acoustical Society of America, 125(4), 2569.
- Story, B. H. (2009). Vocal tract modes based on multiple area function sets from one speaker. The Journal of the Acoustical Society of America, 125(4). The purpose of this study was to derive vocal tract modes from a wider range of vowel area functions for a specific speaker than has been previously reported. Area functions from Story et al. [(1996). J. Acoust. Soc. Am. 100, 537-554] and Story [(2008). J. Acoust. Soc. Am. 123, 327-335] were combined in a composite set from which modes were derived with principal component analysis. Along with scaling coefficients, these modes were used to generate a [F1, F2] formant space. In comparison to formant spaces similarly generated based on the two area function sets alone, the combined version provides a wider range of both F1 and F2 values. This new set of modes may be useful for inverse mapping of formant frequencies to area functions or for modeling of vocal tract shape changes.
- Story, B. H. (2009). Vowel and consonant contributions to vocal tract shape. The Journal of the Acoustical Society of America, 126(2). The purpose of this study was to develop a method by which a vowel-consonant-vowel (VCV) utterance based on x-ray microbeam articulatory data could be separated into a vowel-to-vowel transition and a consonant superposition function. The result is a model that represents a vowel sequence as a time-dependent perturbation of the neutral vocal tract shape governed by coefficients of canonical deformation patterns. Consonants were modeled as superposition functions that can force specific portions of the vocal tract shape to be constricted or expanded, over a specific time course. The three VCVs [pa], [ta], and [ka], produced by one female speaker, were analyzed and reconstructed with the developed model. They were shown to be reasonable approximations of the original VCVs, as assessed qualitatively by visual inspection and quantitatively by calculating rms error and correlation coefficients. This establishes a method for future modeling of other speech material.
- Bunton, K., & Story, B. H. (2009). Identification of synthetic vowels based on selected vocal tract area functions. The Journal of the Acoustical Society of America, 125(1). The purpose of this study was to determine the degree to which synthetic vowel samples based on previously reported vocal tract area functions of eight speakers could be accurately identified by listeners. Vowels were synthesized with a wave-reflection type of vocal tract model coupled to a voice source. A particular vowel was generated by specifying an area function that had been derived from previous magnetic resonance imaging based measurements. The vowel samples were presented to ten listeners in a forced choice paradigm in which they were asked to identify the vowel. Results indicated that the vowels [i], [ae], and [u] were identified most accurately for all speakers. The identification errors of the other vowels were typically due to confusions with adjacent vowels.
- Lowell, S. Y., Barkmeier-Kraemer, J. M., Hoit, J. D., & Story, B. H. (2008). Respiratory and laryngeal function during spontaneous speaking in teachers with voice disorders. Journal of Speech, Language, and Hearing Research, 51(2), 333-349.
- Story, B. (2008). Quantal events generated by the structural and temporal variation of the vocal tract. The Journal of the Acoustical Society of America, 124(4), 2527.
- Story, B. H., Lowell, S. Y., Hoit, J. D., & Barkmeier-Kraemer, J. M. (2008). Erratum: Respiratory and laryngeal function during spontaneous speaking in teachers with voice disorders (Journal of Speech, Language, and Hearing Research (2008), 51(2), 333-349, doi:10.1044/1092-4388(2008/025)). Journal of Speech, Language, and Hearing Research, 51(3), 814. doi:10.1044/1092-4388(2008/058)
- Story, B. H. (2008). Comparison of magnetic resonance imaging-based vocal tract area functions obtained from the same speaker in 1994 and 2002. The Journal of the Acoustical Society of America. A new set of area functions for vowels has been obtained with magnetic resonance imaging from the same speaker as that previously reported in 1996 [Story et al., J. Acoust. Soc. Am. 100, 537-554 (1996)]. The new area functions were derived from image data collected in 2002, whereas the previously reported area functions were based on magnetic resonance images obtained in 1994. When compared, the new area function sets indicated a tendency toward a constricted pharyngeal region and expanded oral cavity relative to the previous set. Based on calculated formant frequencies and sensitivity functions, these morphological differences were shown to have the primary acoustic effect of systematically shifting the second formant (F2) downward in frequency. Multiple instances of target vocal tract shapes from a specific speaker provide additional sampling of the possible area functions that may be produced during speech production. This may be of benefit for understanding intraspeaker variability in vowel production and for further development of speech synthesizers and speech models that utilize area function information.
- Carmichel, E., Harris, F., & Story, B. (2007). Effects of binaural electronic hearing protectors on localization and response time to sounds in the horizontal plane. Noise and Health, 9(37), 83-95. PMID: 18087114. Abstract: The effects of electronic hearing protector devices (HPDs) on localization and response time (RT) to stimuli were assessed at six locations in the horizontal plane. The stimuli included a firearm loading, a telephone ringing, and 0.5-kHz and 4-kHz tonebursts presented during continuous traffic noise. Eight normally hearing adult listeners were evaluated under two conditions: (a) ears unoccluded; (b) ears occluded with one of three amplitude-sensitive sound transmission HPDs. All HPDs were found to affect localization, and performance was dependent on stimuli and location. Response time (RT) was less in the unoccluded condition than for any of the HPD conditions for the broadband stimuli. In the HPD conditions, RT to incorrect responses was significantly less than RT to correct responses for 120° and 240°, the two locations with the greatest number of errors. The RTs to incorrect responses were significantly greater than to correct responses for 60° and 300°, the two locations with the least number of errors. The HPDs assessed in this study did not preserve localization ability under most stimulus conditions.
- Pruthi, T., Espy-Wilson, C. Y., & Story, B. H. (2007). Simulation and analysis of nasalized vowels based on magnetic resonance imaging data. Journal of the Acoustical Society of America, 121(6), 3858-3873. PMID: 17552733. Abstract: In this study, vocal tract area functions for one American English speaker, recorded using magnetic resonance imaging, were used to simulate and analyze the acoustics of vowel nasalization. Computer vocal tract models and susceptance plots were used to study the three most important sources of acoustic variability involved in the production of nasalized vowels: velar coupling area, asymmetry of nasal passages, and the sinus cavities. Analysis of the susceptance plots of the pharyngeal and oral cavities, -(Bp + Bo), and the nasal cavity, Bn, helped in understanding the movement of poles and zeros with varying coupling areas. Simulations using two nasal passages clearly showed the introduction of extra pole-zero pairs due to the asymmetry between the passages. Simulations with the inclusion of maxillary and sphenoidal sinuses showed that each sinus can potentially introduce one pole-zero pair in the spectrum. Further, the right maxillary sinus introduced a pole-zero pair at the lowest frequency. The effective frequencies of these poles and zeros due to the sinuses in the sum of the oral and nasal cavity outputs change with a change in the configuration of the oral cavity, which may happen due to a change in the coupling area, or in the vowel being articulated. © 2007 Acoustical Society of America.
- Sapir, S., Spielman, J. L., Ramig, L. O., Story, B. H., & Fox, C. (2007). Effects of intensive voice treatment (the Lee Silverman Voice Treatment [LSVT]) on vowel articulation in dysarthric individuals with idiopathic Parkinson disease: Acoustic and perceptual findings. Journal of Speech, Language, and Hearing Research, 50(4), 899-912. PMID: 17675595. Abstract: Purpose: To evaluate the effects of intensive voice treatment targeting vocal loudness (the Lee Silverman Voice Treatment [LSVT]) on vowel articulation in dysarthric individuals with idiopathic Parkinson's disease (PD). Method: A group of individuals with PD receiving LSVT (n = 14) was compared to a group of individuals with PD not receiving LSVT (n = 15) and a group of age-matched healthy individuals (n = 14) on the variables vocal sound pressure level (VocSPL); various measures of the first (F1) and second (F2) formants of the vowels /i/, /u/, and /a/; vowel triangle area; and perceptual vowel ratings. The vowels were extracted from the words key, stew, and Bobby embedded in phrases. Perceptual vowel rating was performed by trained raters using a visual analog scale. Results: Only VocSPL, F2 of the vowel /u/ (F2u), and the ratio F2i/F2u significantly differed between patients and healthy individuals pretreatment. These variables, along with perceptual vowel ratings, significantly changed (improved) in the group receiving LSVT only. Conclusion: These results, along with previous findings, add further support to the generalized therapeutic impact of intensive voice treatment on orofacial functions (speech, swallowing, facial expression) and respiratory and laryngeal functions in individuals with PD. © American Speech-Language-Hearing Association.
- Sapir, S., Spielman, J. L., Ramig, L. O., Story, B. H., & Fox, C. (2007). Erratum: Effects of intensive voice treatment (the Lee Silverman Voice Treatment [LSVT]) on vowel articulation in dysarthric individuals with idiopathic parkinson disease: Acoustic and perceptual findings (Journal of Speech, Language, and Hearing Research (2007), 50, PART 4, (899-912) DOI: 10.1044/1092-4388(2007/064)). Journal of Speech, Language, and Hearing Research, 50(6), 1652-.
- Story, B. (2007). The acoustic consequences of time-dependent changes of the vocal tract shape. The Journal of the Acoustical Society of America, 121(5), 3158.
- Story, B. H. (2007). A comparison of vocal tract perturbation patterns based on statistical and acoustic considerations. Journal of the Acoustical Society of America, 122(4), EL107-EL114. PMID: 17902738; PMCID: PMC2278006. Abstract: The purpose of this study was to investigate the relation between vocal tract deformation patterns obtained from statistical analyses of a set of area functions representative of a vowel repertoire, and the acoustic properties of a neutral vocal tract shape. Acoustic sensitivity functions were calculated for a mean area function based on seven different speakers. Specific linear combinations of the sensitivity functions corresponding to the first two formant frequencies were shown to possess essentially the same amplitude variation along the vocal tract length as the statistically derived deformation patterns reported in previous studies. © 2007 Acoustical Society of America.
- Story, B. H. (2007). Acoustically-guided vocal tract modifications for singing. Journal of the Acoustical Society of America, 121(5), 3087. doi:10.1121/1.4808508. The sound quality of a specific vowel can be dramatically altered by subtle modifications of the vocal tract shape. These modifications create changes in the pattern of formant frequencies. For example, the well-known singing formant, which is a clustering of resonance frequencies, is typically the result of constricting the epilaryngeal space or expanding the lower pharyngeal space to create a large cross-sectional area discontinuity between them. Other modifications such as lip protrusion/spreading or larynx lowering/raising will also impose changes on the formant frequency pattern that may be desirable for singing or speech production. This presentation will focus on the acoustic sensitivity of the resonance frequencies to subtle perturbations of specific vowel configurations. Using calculated sensitivity functions, it will be shown how specific regions along the vocal tract can be constricted or expanded to perturb one or more of the formant frequencies. In effect, this technique provides a means of “tuning” the vocal tract shape to produce a desired frequency response. [Work supported by NIH R01-DC04789.]
- Story, B. H. (2007). Modification of emotional speech and voice quality based on changes to the vocal tract structure. Emotions in the Human Voice, 1, 123-136.
- Story, B. H. (2007). Time dependence of vocal tract modes during production of vowels and vowel sequences. Journal of the Acoustical Society of America, 121(6), 3770-3789. PMID: 17552726; PMCID: PMC2310171. Abstract: Vocal tract shaping patterns based on articulatory fleshpoint data from four speakers in the University of Wisconsin x-ray microbeam (XRMB) database [J. Westbury, UW-Madison, (1994)] were determined with a principal component analysis (PCA). Midsagittal cross-distance functions representative of approximately the front 6 cm of the oral cavity for each of 11 vowels and vowel-vowel (VV) sequences were obtained from the pellet positions and the hard palate profile for the four speakers. A PCA was independently performed on each speaker's set of cross-distance functions representing static vowels only, and again with time-dependent cross-distance functions representing vowels and VV sequences. In all cases, results indicated that the first two orthogonal components (referred to as modes) accounted for more than 97% of the variance in each speaker's set of cross-distance functions. In addition, the shape of each mode was shown to be similar across the speakers suggesting that the modes represent common patterns of vocal tract deformation. Plots of the resulting time-dependent coefficient records showed that the four speakers activated each mode similarly during production of the vowel sequences. Finally, a procedure was described for using the time-dependent mode coefficients obtained from the XRMB data as input for an area function model of the vocal tract. © 2007 Acoustical Society of America.
- Titze, I. R. (2007). Source-vocal tract interaction in singing. The Journal of the Acoustical Society of America, 121(5), 3087--3087.
- Alku, P., Story, B., & Airas, M. (2006). Estimation of the voice source from speech pressure signals: Evaluation of an inverse filtering technique using physical modelling of voice production. Folia Phoniatrica et Logopaedica, 58(2), 102-113. PMID: 16479132. Abstract: Objective: The goal of the study is to use physical modelling of voice production to assess the performance of an inverse filtering method in estimating the glottal flow from acoustic speech pressure signals. Methods: An automatic inverse filtering method is presented, and speech pressure signals are generated using physical modelling of voice production so as to obtain test vowels with a known shape of the glottal excitation waveform. The speech sounds produced consist of 4 different vowels, each with 10 different values of the fundamental frequency. Both the original glottal flows given by physical modelling and their estimates computed by inverse filtering were parametrised with two robust voice source parameters: the normalized amplitude quotient and the difference (in decibels) between the levels of the first and second harmonics. Results: The results show that for both extracted parameters the error introduced by inverse filtering was, in general, small. The effect of the distortion caused by inverse filtering on the parameter values was clearly smaller than the change in the corresponding parameters when the phonation type was altered. The distortion was largest for high-pitched vowels with the lowest value of the first formant. Conclusions: The study shows that the proposed inverse filtering technique combined with the extracted parameters constitutes a voice source analysis tool that is able to measure the voice source dynamics automatically with satisfactory accuracy. Copyright © 2006 S. Karger AG.
- Farinella, K. A., Hixon, T. J., Hoit, J. D., Story, B. H., & Jones, P. A. (2006). Listener perception of respiratory-induced voice tremor. American Journal of Speech-Language Pathology, 15(1), 72--84.
- Lowell, S. Y., & Story, B. H. (2006). Simulated effects of cricothyroid and thyroarytenoid muscle activation on adult-male vocal fold vibration. Journal of the Acoustical Society of America, 120(1), 386-397. PMID: 16875234. Abstract: Adjustments to cricothyroid and thyroarytenoid muscle activation are critical to the control of fundamental frequency and aerodynamic aspects of vocal fold vibration in humans. The aerodynamic and physical effects of these muscles are not well understood and are difficult to study in vivo. Knowledge of the contributions of these two muscles is essential to understanding both normal and disordered voice physiology. In this study, a three-mass model for voice simulation in adult males was used to produce systematic changes to cricothyroid and thyroarytenoid muscle activation levels. Predicted effects on fundamental frequency, aerodynamic quantities, and physical quantities of vocal fold vibration were assessed. Certain combinations of these muscle activations resulted in aerodynamic and physical characteristics of vibration that might increase the mechanical stress placed on the vocal fold tissue. © 2006 Acoustical Society of America.
- Mathur, S., Story, B. H., & Rodríguez, J. J. (2006). Vocal-tract modeling: Fractional elongation of segment lengths in a waveguide model with half-sample delays. IEEE Transactions on Audio, Speech, and Language Processing, 14(5), 1754--1762.
- Story, B. H. (2006). Technique for "tuning" vocal tract area functions based on acoustic sensitivity functions (L). Journal of the Acoustical Society of America, 119(2), 715-718. PMID: 16521730. Abstract: A technique for modifying vocal tract area functions is developed by using sum and difference combinations of acoustic sensitivity functions to perturb an initial vocal tract configuration. First, sensitivity functions [e.g., Fant and Pauli, Proc. Speech Comm. Sem. 74, 1975] are calculated for a given area function, at its specific formant frequencies. The sensitivity functions are then multiplied by scaling coefficients that are determined from the difference between a desired set of formant frequencies and those supported by the current area function. The scaled sensitivity functions are then summed together to generate a perturbation of the area function. This produces a new area function whose associated formant frequencies are closer to the desired values than the previous one. This process is repeated iteratively until the coefficients are equal to zero or are below a threshold value. © 2006 Acoustical Society of America.
- Story, B. H., & Bunton, K. (2006). Comparison of vocal tract shaping patterns derived from articulatory fleshpoint data and MRI-based area functions. The Journal of the Acoustical Society of America, 120(5), 3372--3373.
- Tucker, B. V., & Story, B. H. (2006). The relation of the temporal variation of F2 to articulator movement. The Journal of the Acoustical Society of America, 120(5), 3373--3373.
- Farinella, K. A., & Story, B. H. (2005). Simulation and analysis of tremor in speech production. The Journal of the Acoustical Society of America, 117(4), 2544--2544.
- Story, B. (2005). Acoustically-guided articulation patterns for vowel production. The Journal of the Acoustical Society of America, 117(4), 2619--2620.
- Story, B. H. (2005). A parametric model of the vocal tract area function for vowel and consonant simulation. Journal of the Acoustical Society of America, 117(5), 3231-3254. PMID: 15957790. Abstract: A model of the vocal-tract area function is described that consists of four tiers. The first tier is a vowel substrate defined by a system of spatial eigenmodes and a neutral area function determined from MRI-based vocal-tract data. The input parameters to the first tier are coefficient values that, when multiplied by the appropriate eigenmode and added to the neutral area function, construct a desired vowel. The second tier consists of a consonant shaping function defined along the length of the vocal tract that can be used to modify the vowel substrate such that a constriction is formed. Input parameters consist of the location, area, and range of the constriction. Location and area roughly correspond to the standard phonetic specifications of place and degree of constriction, whereas the range defines the amount of vocal-tract length over which the constriction will influence the tract shape. The third tier allows length modifications for articulatory maneuvers such as lip rounding/spreading and larynx lowering/raising. Finally, the fourth tier provides control of the level of acoustic coupling of the vocal tract to the nasal tract. All parameters can be specified either as static or time varying, which allows for multiple levels of coarticulation or coproduction. © 2005 Acoustical Society of America.
- Story, B. H. (2005). Synergistic modes of vocal tract articulation for American English vowels. Journal of the Acoustical Society of America, 118(6), 3834-3859. PMID: 16419828. Abstract: The purpose of this study was to investigate the spatial similarity of vocal tract shaping patterns across speakers and the similarity of their acoustic effects. Vocal tract area functions for 11 American English vowels were obtained from six speakers, three female and three male, using magnetic resonance imaging (MRI). Each speaker's set of area functions was then decomposed into mean area vectors and representative modes (eigenvectors) using principal components analysis (PCA). Three modes accounted for more than 90% of the variance in the original data sets for each speaker. The general shapes of the first two modes were found to be highly correlated across all six speakers. To demonstrate the acoustic effects of each mode, both in isolation and combined, a mapping between the mode scaling coefficients and [F1, F2] pairs was generated for each speaker. The mappings were unique for all six speakers in terms of the exact shape of the [F1, F2] vowel space, but the general effect of the modes was the same in each case. The results support the idea that the modes provide a common system for perturbing a unique underlying neutral vocal tract shape. © 2005 Acoustical Society of America.
- Bergan, C. C., Titze, I. R., & Story, B. (2004). The perception of two vocal qualities in a synthesized vocal utterance: Ring and pressed voice. Journal of Voice, 18(3), 305-317. PMID: 15331103. Abstract: Two vocal qualities, ring quality and pressed quality, were analyzed perceptually. Listeners were asked to rate (on a scale from 0 to 10) the "amount of ring" in one listening and the "amount of pressedness" in another listening. The stimulus was the synthesized utterance /ya-ya-ya-ya-ya/. In the continuum representation of ring, the skewing quotient and the cross section of the epilaryngeal tube area were systematically varied, independently and by a covariation rule. In the continuum representation of pressed, the flow amplitude and open quotient were similarly varied. Results indicated that the crossover point between ring and no ring occurred with an epilaryngeal area of around 1.0 cm², and the crossover point between pressed and not pressed quality occurred at an open quotient of about 0.4. Fundamental frequency also had an effect on the perceptions, with a higher fundamental frequency receiving higher ratings of ring and pressed for otherwise the same parameters. Listeners demonstrated highly variable perceptions in both continua with poor intersubject, intrasubject, and intergroup reliability.
- Sapir, S., Spielman, J., Ramig, L. O., Hinds, S. L., Countryman, S., Fox, C., & Story, B. (2004). Erratum: Effects of intensive voice treatment (the Lee Silverman Voice Treatment [LSVT]) on ataxic dysarthria: A case study (American Journal of Speech-Language Pathology (November 2003) (398)). American Journal of Speech-Language Pathology, 13(1), 93.
- Story, B. (2004). A formant-to-area conversion technique based on acoustic sensitivity functions. The Journal of the Acoustical Society of America, 116(4), 2631--2631.
- Story, B. H. (2004). On the ability of a physiologically constrained area function model of the vocal tract to produce normal formant patterns under perturbed conditions. Journal of the Acoustical Society of America, 115(4), 1760-1770. PMID: 15101654. Abstract: An area function model of the vocal tract is tested for its ability to produce typical vowel formant frequencies with a perturbation at the lips. The model, which consists of a neutral shape and two weighted orthogonal shaping patterns (modes), has previously been shown to produce a nearly one-to-one mapping between formant frequencies and the weighting coefficients of the modes [Story and Titze, J. Phonetics, 26, 223-260 (1998)]. In this study, a perturbation experiment was simulated by imposing a constant area "lip tube" on the model. The mapping between the mode coefficients and formant frequencies was then recomputed with the lip tube in place and showed that formant frequencies (F1 and F2) representative of the vowels [ʊ, o, u] could no longer be produced with the model. However, when the mode coefficients were allowed to exceed their typical bounding values, the mapping between them and the formant frequencies was expanded such that the vowels [ʊ, o, u] were compensated. The area functions generated by these exaggerated coefficients were shown to be similar to vocal-tract shapes reported for real speakers under similar perturbed conditions [Savariaux, Perrier, and Orliaguet, J. Acoust. Soc. Am., 98, 2428-2442 (1995)]. This suggests that the structure of this particular model captures some of the human ability to configure the vocal-tract shape under both ordinary and extraordinary conditions. © 2004 Acoustical Society of America.
- Story, B. H. (2004). Vowel acoustics for speaking and singing. Acta Acustica united with Acustica, 90(4), 629--640.
- Li, K., & Story, B. (2003). An investigation of perceptual tolerance limits of stop constriction regions along the vocal tract. The Journal of the Acoustical Society of America, 114(4), 2337--2337.
- Sapir, S., Spielman, J., Ramig, L. O., Hinds, S. L., Countryman, S., Fox, C., & Story, B. (2003). Effects of Intensive Voice Treatment (the Lee Silverman Voice Treatment [LSVT]) on Ataxic Dysarthria: A Case Study. American Journal of Speech-Language Pathology, 12(4), 387-399. PMID: 14658991. Abstract: This study examined the effects of intensive voice treatment (the Lee Silverman Voice Treatment [LSVT®]) on ataxic dysarthria in a woman with cerebellar dysfunction secondary to thiamine deficiency. Perceptual and acoustic measures were made on speech samples recorded just before the LSVT program was administered, immediately after it was administered, and at 9 months follow-up. Results indicate short- and long-term improvement in phonatory and articulatory functions, speech intelligibility, and overall communication and job-related activity following LSVT. This study's findings provide initial support for the application of LSVT to the treatment of speech disorders accompanying ataxic dysarthria. Potential neural mechanisms that may underlie the effects of loud phonation and LSVT are addressed.
- Story, B. (2003). Simulation of VCV syllables with a parametric area function model of the vocal tract. Journal of the Acoustical Society of America, 114(4), 2394.
- Story, B. H. (2003). Using imaging and modeling techniques to understand the relation between vocal tract shape to acoustic characteristics. Proc. Stockholm Music Acoustics Conf, 435--438.
- Titze, I. R., Bergan, C. C., Hunter, E. J., & Story, B. (2003). Source and filter adjustments affecting the perception of the vocal qualities twang and yawn. Logopedics Phoniatrics Vocology, 28(4), 147-155. PMID: 14686543. Abstract: Two vocal qualities, twang and yawn, were synthesized and rated perceptually. The stimuli consisted of synthesized vocal productions of a sentence-length utterance 'ya ya ya ya ya,' which had speech-like intonation. In a continuum transformation from normal to twang, the area in the pharynx was gradually decreased, along with vocal tract shortening and a decreased open quotient in the glottal airflow. In a continuum transformation toward yawn, the area in the pharynx was gradually increased, along with vocal tract lengthening and an increased open quotient. The normal (untransformed) vocal tract area was pre-determined by earlier studies involving MRI scans of a human subject's vocal tract. Listeners were asked to rate (on a scale from 1-10) the 'amount of twang' in one listening session and the 'amount of yawn' in another listening session. Overall, the perception of twang increased directly with pharyngeal area narrowing, vocal tract shortening, and decreased open quotient. The perception of yawn increased with pharyngeal area widening, vocal tract lengthening, and increased open quotient. Adjustments of one parameter alone yielded less significant perceptual changes than the above combinations, with open quotient showing the greatest effect in isolation. Listeners demonstrated variable perceptions in both continua with poor inter-subject, intra-subject, and inter-group reliability.
- Story, B. (2002). A parametric area function model of three female vocal tracts based on orthogonal modes. The Journal of the Acoustical Society of America, 112(5), 2418--2418.
- Story, B. H. (2002). An overview of the physiology, physics and modeling of the sound source for vowels. Acoustical Science and Technology, 23(4), 195-206. Abstract: The vibration of the vocal folds produces the primary sound source for vowels. This paper first reviews vocal fold anatomy and the kinematics associated with typical vibratory motion. A brief historical background is then presented on the basic physics of vocal fold vibration and various efforts directed at mathematical modeling of the vocal folds. Finally, a low-dimensional model is used to simulate the vocal fold vibration under various conditions of vocal tract loading. In particular, a "no-tract" case is compared to two cases in which the voice source is coupled to vocal tract area functions representing the vowels /i/ and /a/, respectively.
- Story, B. H., & Titze, I. R. (2002). A preliminary study of voice quality transformation based on modifications to the neutral vocal tract area function. Journal of Phonetics, 30(3), 485-509. Abstract: The idea is pursued that voice quality can be partially represented by the underlying shape of a speaker's neutral vocal tract. Using an area function model, which allows direct access to the neutral tract shape, four separate modifications were made to one male speaker's vocal tract. The modifications involve imposing constrictive or expansive effects on the pharyngeal and oral portions of the neutral area function as well as on lip aperture and the epi-laryngeal tube. A single word utterance was first synthesized by superimposing deformation patterns appropriate for the word onto the original neutral tract shape (area function). Then, four additional samples of the word were synthesized using a different modified neutral area function each time. The modifications were assessed by comparing F1-F2 formant trajectories of the original utterance with those of the modifications. The formant frequencies were observed to shift within the F1-F2 plane in directions predictable from simple tube acoustics. However, the modified voice qualities did not preserve the shape of the original F1-F2 trajectory. In other words, the modifications did not create a simple linear transformation of formant frequencies even though the "articulatory dynamics" (deformation patterns of the area function) were identical in all cases. These somewhat artificial vocal tract modifications were also compared with formant frequencies extracted from recordings of a speaker attempting to produce the same types of modifications. In general, the speaker's formant trajectories showed some similarities to the synthesized versions. However, the speaker also seemed to grade the "level" of the voice quality that was exerted on the utterance depending on whether the demands of the voice quality were in competition with the linguistic demands of a given phonetic segment. Finally, to demonstrate this type of voice quality modification in a broader context, the same procedures were applied to sentence-level speech and results were again shown as F1-F2 formant trajectories. © 2002 Elsevier Science Ltd. All rights reserved.
- Titze, I. R., & Story, B. H. (2002). Rules for controlling low-dimensional vocal fold models with muscle activation. Journal of the Acoustical Society of America, 112(3 I), 1064-1076. PMID: 12243155. Abstract: A low-dimensional, self-oscillation model of the vocal folds is used to capture three primary modes of vibration, a shear mode and two compressional modes. The shear mode is implemented with either two vertical masses or a rotating plate, and the compressional modes are implemented with an additional bar mass between the vertically stacked masses and the lateral boundary. The combination of these elements allows for the anatomically important body-cover differentiation of vocal fold tissues. It also allows for reconciliation of lumped-element mechanics with continuum mechanics, but in this reconciliation the oscillation region is restricted to a nearly rectangular glottis (as in all low-dimensional models) and a small effective thickness of vibration (
- Titze, I. R., Story, B., Smith, M., & Long, R. (2002). A reflex resonance model of vocal vibrato. Journal of the Acoustical Society of America, 111(5 I), 2272-2282. PMID: 12051447. Abstract: A reflex mechanism with a long latency (>40 ms) is implicated as a plausible cause of vocal vibrato. At least one pair of agonist-antagonist muscles that can change vocal-fold length is needed, such as the cricothyroid muscle paired with the thyroarytenoid muscle, or the cricothyroid muscle paired with the lateral cricoarytenoid muscle or a strap muscle. Such an agonist-antagonist muscle pair can produce negative feedback instability in vocal-fold length with this long reflex latency, producing oscillations on the order of 5-7 Hz. It is shown that singers appear to increase the gain in the reflex loop to cultivate the vibrato, which grows out of a spectrum of 0-15-Hz physiologic tremors in raw form. © 2002 Acoustical Society of America.
- Story, B. H. (2001). A distinctive region model based on empirical vocal tract area functions. The Journal of the Acoustical Society of America, 110(5), 2761--2762.
- Story, B. H. (2001). Speech synthesis by mapping articulator movement patterns to a shape-based area function model of the vocal tract. The Journal of the Acoustical Society of America, 109(5), 2444--2445.
- Story, B. H., Titze, I. R., & Hoffman, E. A. (2001). The relationship of vocal tract shape to three voice qualities. Journal of the Acoustical Society of America, 109(4), 1651-1667. PMID: 11325134. Abstract: Three-dimensional vocal tract shapes and consequent area functions representing the vowels [i, æ, a, u] have been obtained from one male and one female speaker using magnetic resonance imaging (MRI). The two speakers were trained vocal performers and both were adept at manipulation of vocal tract shape to alter voice quality. Each vowel was performed three times, each with one of the three voice qualities: normal, yawny, and twangy. The purpose of the study was to determine some ways in which the vocal tract shape can be manipulated to alter voice quality while retaining a desired phonetic quality. To summarize any overall tract shaping tendencies, mean area functions were subsequently computed across the four vowels produced within each specific voice quality. Relative to normal speech, both the vowel area functions and mean area functions showed, in general, that the oral cavity is widened and tract length increased for the yawny productions. The twangy vowels were characterized by shortened tract length, widened lip opening, and a slightly constricted oral cavity. The resulting acoustic characteristics of these articulatory alterations consisted of the first two formants (F1 and F2) being close together for all yawny vowels and far apart for all the twangy vowels. © 2001 Acoustical Society of America.
- Tom, K., Titze, I. R., Hoffman, E. A., & Story, B. H. (2001). Three-dimensional vocal tract imaging and formant structure: Varying vocal register, pitch, and loudness. Journal of the Acoustical Society of America, 109(2), 742-747. PMID: 11248978. Abstract: Although advances in techniques for image acquisition and analysis have facilitated the direct measurement of three-dimensional vocal tract air space shapes associated with specific speech phonemes, little information is available with regard to changes in three-dimensional (3-D) vocal tract shape as a function of vocal register, pitch, and loudness. In this study, 3-D images of the vocal tract during falsetto and chest register phonations at various pitch and loudness conditions were obtained using electron beam computed tomography (EBCT). Detailed measurements and differences in vocal tract configuration and formant characteristics derived from the eight measured vocal tract shapes are reported. © 2001 Acoustical Society of America.
- Story, B. H. (2000). A study of compensation for a labial perturbation of the vowel /u/ using an area function model of the vocal tract. The Journal of the Acoustical Society of America, 108(5), 2509--2509.
- Story, B. H., & Titze, I. R. (2000). An investigation of voice quality based on modifications of the neutral vocal tract shape. Proceedings of 5th Seminar on Speech Production: Models and Data, 349--352.
- Story, B. H., Laukkanen, A., & Titze, I. R. (2000). Acoustic impedance of an artificially lengthened and constricted vocal tract. Journal of Voice, 14(4), 455-469. PMID: 11130104. Abstract: Voice training techniques often make use of exercises involving partial occlusion of the vocal tract, typically at the anterior part of the oral cavity or at the lips. In this study two techniques are investigated: a bilabial fricative and a small diameter hard-walled tube placed between the lips. Because the input acoustic impedance of the vocal tract is known to affect both the shaping of the glottal flow pulse and the vibrational pattern of the vocal folds, a study of the input impedance is an essential step in understanding the benefits of these two techniques. The input acoustic impedance of the vocal tract was investigated theoretically for cases of a vowel, bilabial occlusion (fully closed lips), a bilabial fricative, and artificially lengthening the tract with small diameter tubes. The results indicate that the tubes increase the input impedance in the range of the fundamental frequency of phonation by lowering the first formant frequency to nearly that of the bilabial occlusion (the lower bound on the first formant) while still allowing a continuous airflow. The bilabial fricative also has the effect of lowering the first formant frequency and increasing the low-frequency impedance, but not as effectively as the extension tubes.
- Titze, I. R., Story, B. H., Burnett, G. C., Holzrichter, J. F., Ng, L. C., & Lea, W. A. (2000). Comparison between electroglottography and electromagnetic glottography. Journal of the Acoustical Society of America, 107(1), 581-588. PMID: 10641666. Abstract: Newly developed glottographic sensors, utilizing high-frequency propagating electromagnetic waves, were compared to a well-established electroglottographic device. The comparison was made on four male subjects under different phonation conditions, including three levels of vocal fold adduction (normal, breathy, and pressed), three different registers (falsetto, chest, and fry), and two different pitches. Agreement between the sensors was always found for the glottal closure event, but for the general wave shape the agreement was better for falsetto and breathy voice than for pressed voice and vocal fry. Differences are attributed to the field patterns of the devices. Whereas the electroglottographic device can operate only in a conduction mode, the electromagnetic device can operate in either the forward scattering (diffraction) mode or in the backward scattering (reflection) mode. Results of our tests favor the diffraction mode because a more favorable angle imposed on receiving the scattered (reflected) signal did not improve the signal strength. Several observations are made on the uses of the electromagnetic sensors for operation without skin contact and possibly in an array configuration for improved spatial resolution within the glottis.
- Edgerton, M. E., Bless, D., Thibeault, S., Fagerholm, M., & Story, B. (1999). The acoustic analysis of reinforced harmonics. The Journal of the Acoustical Society of America, 105(2), 1329.
- Story, B., & Titze, I. (1999). A preliminary study of speech transformation using empirically defined articulatory modes. The Journal of the Acoustical Society of America, 105(2), 1092.
- Druker, D. G., Titze, I. R., & Story, B. H. (1998). Glottal source parameter estimation by comparison of measured signals with simulated signals. The Journal of the Acoustical Society of America, 103(5), 2775.
- Long, R., Story, B., & Titze, I. (1998). Vocal tract shape estimation using three noninvasive transducers. Proceedings of the ICA/ASA Joint Meeting, Seattle, WA, 20-26.
- Michaelis, D., Fröhlich, M., Strube, H. W., Kruse, E., Story, B., & Titze, I. R. (1998). Grenzen der Jitter- und Shimmer-Messung pathologischer Stimmen mit dem unüberwachten Waveform-Matching-Verfahren. Fortschritte der Akustik-DAGA, 98-382.
- Story, B. H., & Titze, I. R. (1998). Parameterization of vocal tract area functions by empirical orthogonal modes. Journal of Phonetics, 26(3), 223-260.
- Story, B. H., Titze, I. R., & Hoffman, E. A. (1998). Contributions of vocal tract shape to voice quality. The Journal of the Acoustical Society of America, 104(3), 1805.
- Story, B. H., Titze, I. R., & Hoffman, E. A. (1998). Vocal tract area functions for an adult female speaker based on volumetric imaging. The Journal of the Acoustical Society of America, 104(1), 471-487.
- Story, B. H., Titze, I. R., & Long, R. (1998). Synthesis of sentence-level speech based on measured vocal tract area functions. Proceedings of the ICA/ASA Joint Meeting, 2663, 2664.
- Story, B., Titze, I., & Long, R. (1998). Simulation of sentence-level speech based on measured vocal tract area functions. The Journal of the Acoustical Society of America, 103(5), 3056.
- Patterson, D. K., Pepperberg, I. M., Story, B. H., & Hoffman, E. A. (1997). How parrots talk: insights based on CT scans, image processing, and mathematical models. Proceedings of SPIE - The International Society for Optical Engineering, 3033, 14-24. Abstract: Little is known about mechanisms of speech production in parrots. Recently, however, techniques for correlating vocal tract shape with vowel production in humans have become more sophisticated and we have adapted these techniques for use with parrots. We scanned two grey parrot heads with intact vocal tracts. One specimen, 'Oldbird' was fixed with its beak propped open; the second 'Youngbird' was fixed with its beak closed. Using VIDA software, we (1) established that differences in tongue and larynx positioning resulted from opening or closing the beak; and (2) obtained lengths and area functions for the trachea, glottis, pharynx, mouth, and choana for both specimens and esophageal length and area functions for the first specimen. We entered lengths and area functions into a 1D wave propagation model to determine the natural formant frequencies associated with an open versus closed beak. We also determined how manipulating lengths and area functions could affect formant frequency and relative intensity. Finally, by comparing observed grey parrot vowel formant, we predict how the parrot uses its vocal tract to produce speech.
- Perrier, P., Laboissiere, R., Abry, C., Maeda, S., Deng, L., Ramsay, G., Sun, D., Titze, I., Wong, D., Story, B., et al. (1997). Advance table of contents. Speech Communication, 22(8), 1.
- Story, B. H., Hoffman, E. A., & Titze, I. R. (1997). Volumetric image-based comparison of male and female vocal tract shapes. Proceedings of SPIE - The International Society for Optical Engineering, 3033, 25-37. Abstract: A collection of 3D vocal tract shapes corresponding to vowels and consonants of American English have been acquired for a 27 year old adult female subject using magnetic resonance imaging. Each 3D shape was condensed into a set of cross-sectional areas of oblique sections perpendicular to the centerline of the vocal tract's long axis. Such a collection of areas is typically called an 'area function'. This set of images and subsequent area functions for the female subject complements a previous similar study concerning an adult male subject. It is the purpose of this paper to explore the morphological differences between the male and female subjects for three 'cardinal' vowels. Comparisons have been made of the 3D vocal tract shapes, area functions, and acoustic characteristics of the three vowels. The primary difference between genders is that the female pharynx is approximately 37 percent shorter than the male. Limited acoustic modeling has suggested that this shortened pharynx may play a significant role in defining male versus female voice quality.
- Story, B. H., Titze, I. R., & Wong, D. (1997). A simplified model for the simulation and transformation of speech. Engineering Applications of Artificial Intelligence, 10(6), 593-601. Abstract: This paper explores a model that reduces speech production to the specification of four time-varying parameters; F1 and F2, voice fundamental frequency (F0), and a relative amplitude of the voice. The trajectory of the first two formants, F1 and F2, is treated as a series of coordinate pairs that are mapped from the F1F2 plane into a two-dimensional plane of coefficients. These coefficients are multipliers of two empirically-based orthogonal basis vectors which, when added to a neutral vowel area function, will produce a new area function with the desired locations of F1 and F2. Thus, area functions and voice parameters extracted at appropriate time intervals can be fed into a speech simulation model to recreate the original speech. A transformation of the speech can also be imposed by manipulating the area function and voice characteristics prior to the recreation of speech by simulation. The model has initially been developed for vowel-like speech utterances, but the effect of consonants on the F1F2 trajectory is also briefly addressed. © 1998 Published by Elsevier Science Ltd. All rights reserved.
- Titze, I. R., & Story, B. H. (1997). Acoustic interactions of the voice source with the lower vocal tract. Journal of the Acoustical Society of America, 101(4), 2234-2243. PMID: 9104025. Abstract: The linear source-filter theory of speech production assumes that vocal fold vibration is independent of the vocal tract. The justification is that the glottis often behaves as a high-impedance (constant flow) source. Recent imaging of the vocal tract has demonstrated, however, that the epilarynx tube is quite narrow, making the input impedance to the vocal tract comparable to the glottal impedance. Strong interactions can exist, therefore. In particular, the inertance of the vocal tract facilitates vocal fold vibration by lowering the oscillation threshold pressure. This has a significant impact on singing. Not only does the epilarynx tube produce the desirable singer's formant (vocal ring), but it acts like the mouthpiece of a trumpet to shape the flow and influence the mode of vibration. Effects of the piriform sinuses, pharynx expansion, and nasal coupling are also discussed.
- Titze, I., Wong, D., Story, B., & Long, R. (1997). Considerations in voice transformation with physiologic scaling principles. Speech Communication, 22(2-3), 113-123. Abstract: This study begins to explore the importance of the physiological domain in voice transformation. A general approach is outlined for transforming the voice quality of sentence-level speech while maintaining the same phonetic content. Transformations will eventually include gender, age, voice quality, emotional state, disordered state, dialect or impersonation. In this paper, only a specific voice quality, twang, is described as an example. The basic question is: relative to pure signal processing, can voices be transformed more effectively if biomechanical, acoustic and anatomical scaling principles are applied? At present, two approaches are contrasted, a Linear Predictive Coding approach and a biomechanical simulation approach. © 1997 Elsevier Science B.V.
- Berry, D. A., Herzel, H., Titze, I. R., & Story, B. H. (1996). Bifurcations in excised larynx experiments. Journal of Voice, 10(2), 129-138. PMID: 8734387. Abstract: Bifurcation analysis was applied to vocal fold vibration in excised larynx experiments. Phonation onset and vocal instabilities were studied in a parameter plane spanned by subglottal pressure and asymmetry of either vocal fold adduction or elongation. Various phonatory regimes were observed, including single vocal fold oscillations. Selected spectra demonstrated correspondence between these regimes and vocal registers noted in the literature. To illustrate the regions spanned by the various phonatory regimes, two-dimensional bifurcation diagrams were generated. Many instabilities or bifurcations were noted in the regions of coexistence, i.e., regions in which the phonatory regimes overlap. Bifurcations were illustrated with spectrograms and fundamental frequency contours. Where possible, results from these studies were related to clinical observations.
- Long, R., Lange, R., Wong, D., Story, B., & Titze, I. (1996). Transformation from normal to twang and sob vocal qualities. The Journal of the Acoustical Society of America, 100(4), 2663.
- Story, B. H., Hoffman, E. A., & Titze, I. R. (1996). Vocal tract imaging: a comparison of MRI and EBCT. Proceedings of SPIE - The International Society for Optical Engineering, 2709, 209-222. Abstract: Vocal tract imaging for the vowels /i/ and /a/ using both EBCT and MRI was carried out for one subject (29 yr old male, native of midwestern United States) using an Imatron C-150 electron beam CT scanner and a GE Signa 1.5 Tesla scanner, respectively. Each image set was analyzed using a general display and quantitation package called VIDA™ (Volumetric Image Display and Analysis). The image analysis consisted of segmenting the airspace from the surrounding tissue, obtaining a 3D vocal tract shape via shape based interpolation, and finally using an iterative bisection algorithm to determine the vocal tract area function. The results show that the 3D representations of the vocal tract shapes derived from EBCT show subtle deformations of the airway by articulatory structures and teeth that are not observed in the MRI based representations. Shaded surface renderings of each vocal tract shape and for each imaging technique are shown and the apparent trade-offs between the two imaging methods are discussed.
- Story, B. H., Titze, I. R., & Hoffman, E. A. (1996). Vocal tract area functions from magnetic resonance imaging. Journal of the Acoustical Society of America, 100(1), 537-554. PMID: 8675847. Abstract: There have been considerable research efforts in the area of vocal tract modeling but there is still a small body of information regarding direct 3-D measurements of the vocal tract shape. The purpose of this study was to acquire, using magnetic resonance imaging (MRI), an inventory of speaker-specific, three-dimensional, vocal tract air space shapes that correspond to a particular set of vowels and consonants. A set of 18 shapes was obtained for one male subject who vocalized while being scanned for 12 vowels, 3 nasals, and 3 plosives. The 3-D shapes were analyzed to find the cross-sectional areas evaluated within planes always chosen to be perpendicular to the centerline extending from the glottis to the mouth to produce an 'area function.' This paper provides a speaker-specific catalogue of area functions for 18 vocal tract shapes. Comparisons of formant locations extracted from the natural (recorded) speech of the imaged subject and from simulations using the newly acquired area functions show reasonable similarity but suggest that the imaged vocal tract shapes may be somewhat centralized. Additionally, comparisons of the area functions reported in this study are compared with those from four previous studies and demonstrate general similarities in shape but also obvious differences that can be attributed to differences in imaging techniques, image processing methods, and anatomical differences of the imaged subjects.
- Tom, K., Titze, I. R., Hoffman, E. A., & Story, B. H. (1996). Volumetric EBCT imaging of the vocal tract applied to male falsetto singing. Proceedings of SPIE - The International Society for Optical Engineering, 2709, 132-142. Abstract: As part of an analysis by synthesis approach to studying vocal intensity control in falsetto register, volumetric imaging of the vocal tract (the upper airway from the glottis to the lips) using electron beam computed tomography was performed on a classically trained singer, a countertenor, who uses a falsetto singing technique. Eight pitch and loudness conditions were imaged, a subset of which will be presented here. Each set of scans consisted of contiguous 3 mm axial 'slices' encompassing the arch of the hard palate superiorly and the first tracheal ring inferiorly. Images were analyzed in three stages: image segmentation, 3D airway reconstruction and airway measurement. The vocal tract airway was segmented from surrounding tissue by assigning airway voxels a unique gray scale value. Reconstruction of the vocal tract in three dimensions was accomplished using shape based interpolation on the segmented images. Cross-sectional areas and vocal tract length were acquired from shape based interpolated data. Vocal tract area functions derived from these measurements were used to simulate the subject's phonations, which in turn allowed estimation of glottal and supraglottal contributions to vocal intensity.
- Tom, K., Titze, I., Hoffman, E., & Story, B. (1996). Volumetric EBCT imaging of the vocal tract applied to male falsetto singing [2709-13]. Proceedings of SPIE - The International Society for Optical Engineering, 132-143.
- Wong, D., Lange, R., Long, R., Story, B., & Titze, I. (1996). LPC-based voice transformation from adult to child. The Journal of the Acoustical Society of America, 100(4), 2762.
- Alipour, F., & Story, B. H. (1995). A three-dimensional solution of the wave equation in a model of the vocal tract. The Journal of the Acoustical Society of America, 98(5), 2930.
- Berry, D. A., Titze, I. R., Story, B. H., & Herzel, H. (1995). Bifurcations in excised larynx experiments. The Journal of the Acoustical Society of America, 98(5), 2930.
- Story, B. H. (1995). Physiologically-Based Speech Simulation Using an Enhanced Wave-Reflection Model of the Vocal Tract.
- Story, B. H., & Titze, I. R. (1995). Voice simulation with a body-cover model of the vocal folds. Journal of the Acoustical Society of America, 97(2), 1249-1260. PMID: 7876446. Abstract: A simple, low-dimensional model of the body-cover vocal-fold structure is proposed as a research tool to study both normal and pathological vocal-fold vibration. It maintains the simplicity of a two-mass model but allows for physiologically relevant adjustments and separate vibration of the body and the cover. The classic two-mass model of the vocal folds [K. Ishizaka and J. L. Flanagan, Bell Syst. Tech. J. 51, 1233-1268 (1972)] has been extended to a three-mass model in order to more realistically represent the body-cover vocal-fold structure [M. Hirano, Folia Phoniar. 26, 89-94 (1974)]. The model consists of two 'cover' masses coupled laterally to a 'body' mass by nonlinear springs and viscous damping elements. The body mass, which represents muscle tissue, is further coupled laterally to a rigid wall (assumed to represent the thyroid cartilage) by a nonlinear spring and a damping element. The two cover springs are intended to represent the elastic properties of the epithelium and the lamina propria while the body spring simulates the tension produced by contraction of the thyroarytenoid muscle. Thus contractions of the cricothyroid and thyroarytenoid muscles are incorporated in the values used for the stiffness parameters of the body and cover springs. Additionally, the two cover masses are coupled to each other through a linear spring which can represent vertical mucosal wave propagation. Simulations show reasonable similarity to observed vocal-fold motion, measured vertical phase difference, and mucosal wave velocity, as well as experimentally obtained intraglottal pressure.
- Story, B. H., Titze, I. R., & Hoffman, E. A. (1995). Vocal tract shapes and area functions from magnetic resonance imaging (MRI). The Journal of the Acoustical Society of America, 98(5), 2930.
- Story, B., Hoffman, E., & Titze, I. (1995). Speech simulation based on MR images of the vocal tract [2433-21]. Proceedings of SPIE - The International Society for Optical Engineering, 179.
- Titze, I. R., Mapes, S., & Story, B. (1994). Acoustics of the tenor high voice. Journal of the Acoustical Society of America, 95(2), 1133-1142. PMID: 8132903. Abstract: The spectra of six tenors were analyzed at high pitches, F4 to B4. Because of the wide separation between harmonics, formant frequencies could not be extracted in the traditional way. Rather, an analysis-by-synthesis technique was used to match the spectra of a model to the measured spectra, using parameter optimization. Results suggest that tenors maintain their first formant frequencies well above the fundamental for all vowels except [u]. The purpose of this seems to be to distribute the acoustic energy between harmonics 2, 3, and 4 rather than to boost the fundamental. Tuning the first formant to the fundamental is a technique used effectively by sopranos but seems to be deliberately avoided by tenors in order to preserve a male quality.
- Story, B. H., & Titze, I. R. (1993). Voice simulation with a three-mass model of the vocal folds. The Journal of the Acoustical Society of America, 94(3), 1762.
Proceedings Publications
- Story, B. H., & Bunton, K. E. (2015, July). A spectral filtering method for tracking formants in children’s speech. In Acoustical Society of America, 23.
- Vos, R., Angus, J. A., & Story, B. H. (2014). A New Algorithm for Vocal Tract Shape Extraction from Singer's Waveforms. In Audio Engineering Society Convention 136.
- Airaksinen, M., Story, B. H., & Alku, P. (2013). Quasi closed phase analysis for glottal inverse filtering. In INTERSPEECH, 143-147.
- Alku, P., Pohjalainen, J., Vainio, M., Laukkanen, A., & Story, B. H. (2012). Improved formant frequency estimation from high-pitched vowels by downgrading the contribution of the glottal source with weighted linear prediction. In INTERSPEECH.
- Sapir, S., Spielman, J., Ramig, L., Story, B., & Fox, C. (2006). Impact of intensive vocal loudness treatment (LSVT) on vowel articulation in Parkinsonian speech: Acoustic and perceptual findings. In European Journal of Neurology, 13, 37.
- Alku, P., Airas, M., & Story, B. H. (2004). Evaluation of an inverse filtering technique using physical modeling of voice production. In INTERSPEECH.
- Mathur, S., & Story, B. H. (2003). Vocal tract modeling: implementation of continuous length variations in a half-sample delay Kelly-Lochbaum model. In Signal Processing and Information Technology, 2003. ISSPIT 2003. Proceedings of the 3rd IEEE International Symposium on, 753-756.
- Story, B. H. (2003). Physical modeling of voice and voice quality. In ISCA Tutorial and Research Workshop on Voice Quality: Functions, Analysis and Synthesis.
- Patterson, D. K., Pepperberg, I. M., Story, B. H., & Hoffman, E. A. (1997). How parrots talk: Insights based on CT scans, image processing, and mathematical models. In Medical Imaging 1997, 14-24.
- Story, B. H., Hoffman, E. A., & Titze, I. R. (1997). Volumetric image-based comparison of male and female vocal tract shapes. In Medical Imaging 1997, 25-37.
- Story, B. H., Hoffman, E. A., & Titze, I. R. (1996). Vocal tract imaging: A comparison of MRI and EBCT. In Medical Imaging 1996, 209-222.
- Story, B. H., Titze, I. R., & Wong, D. (1996). A simplified model for simulation and transformation of speech. In Intelligence and Systems, 1996., IEEE International Joint Symposia on, 320-327.
- Titze, I. R., Wong, D., Lange, R., & Story, B. (1996). Comparison of three techniques for voice transformation. In Speech production seminar, 215-220.
- Tom, K., Titze, I. R., Hoffman, E. A., & Story, B. H. (1996). Volumetric EBCT imaging of the vocal tract applied to male falsetto singing. In Medical Imaging 1996, 132-142.
- Story, B. H., Hoffman, E. A., & Titze, I. R. (1995). Speech simulation based on MR images of the vocal tract. In Medical Imaging 1995, 179-190.
Presentations
- Story, B. H. (2019, April). Mechanisms of speech production. 11th Annual Interdisciplinary Integration Symposium. Lincoln, NE: Postural Restoration Institute.
- Dan, K., Mugmon, M. S., & Story, B. H. (2018, March). Soundscape: The UA's remarkable chimes and echoes. Society for Ethnomusicology, Rocky Mountain Scholars Conference. Tucson, AZ: Society for Ethnomusicology.
- Story, B. H. (2018, July). Speech performance density as a measure of long-term speaking characteristics. 11th International Conference on Voice Physiology and Biomechanics. East Lansing, MI: International Conference on Voice Physiology and Biomechanics.
- Story, B. H. (2018, May). Acoustic communication by vocal tract modulation. 175th Meeting of the Acoustical Society of America. Minneapolis, MN: Acoustical Society of America.
- Vorperian, H. E., Bunton, K. E., & Story, B. H. (2018, February). Intelligibility of monosyllabic words produced by an acoustically-driven model of the vocal tract. 19th Biennial Conference on Motor Speech Disorders. Savannah, GA.
- Story, B. H. (2017, April). Resonance: A seminar on music and health. Fred Fox School of Music and the Department of Speech, Language and Hearing Sciences. University of Arizona.
- Story, B. H. (2017, December). Stories of speech science. Acoustical Society of America. New Orleans: Acoustical Society of America.
- Story, B. H. (2017, July). Acoustic communication by airway modulation. 4th International Symposium on Acoustic Communication by Animals. Omaha, NE.
- Story, B. H. (2017, March). Acoustically-guided planning of vocal tract movement for production of connected speech. Linguistics and Communication Science and Disorders Colloquium. University of Alberta.
- Story, B. H., & Bunton, K. E. (2017, December). The relation of auditory perceptual ratings of nasality to nasal port area in connected speech. 174th meeting of the Acoustical Society of America. New Orleans, LA: Acoustical Society of America.
- Taylor, G. L., Bunton, K. E., & Story, B. H. (2017, April). Clear Speech modifications in children aged 6-10 years. Arizona Speech-Language-Hearing Association Convention. Tucson, AZ.
- Story, B. H., & Bunton, K. E. (2016, March). Simulations of child-like speech as test material for speech analysis algorithms. International Conference on Voice Physiology and Biomechanics. Chile.
- Story, B. H. (2015, January). The elusive shape of a child’s vocal tract. 9th Meeting of the Auditory Cognitive Neuroscience Society. Tucson, AZ.
- Story, B. H. (2015, May). Ken Stevens’ influence on the development of paradigms for speech synthesis. Acoustical Society of America. Pittsburgh, PA.
- Monson, B. B., Lotto, A. J., & Story, B. H. (2014, October). Speech spectral intensity discrimination at frequencies above 6 kHz. The Journal of the Acoustical Society of America.
- Neely, K., Bunton, K., & Story, B. H. (2014, April). Variation in formant trajectories as evidence for articulatory gesture overlap. Arizona Speech Language and Hearing Association Annual Convention. Tucson, AZ: Arizona Speech Language and Hearing Association.
- Samlan, R. A., & Story, B. H. (2014, October). Influence of left-right asymmetries on voice quality in paramedian vocal fold paralysis. The Fall Voice Conference. San Antonio, TX.
- Story, B. H. (2014, March). Acoustic sensitivity of the vocal tract as a guide to speech development. International Conference on Motor Speech. Sarasota, FL: Madonna Rehabilitation Hospital.
- Story, B. H. (2014, October). Eerie voices: Odd combinations, extremes, and irregularities. The Journal of the Acoustical Society of America. Indianapolis, IN. This is an abstract (published in the listed journal) of a presentation.
- Story, B. H., & Vorperian, H. (2014, March). Speaker-specific modeling of vocal tract shape and vowel space. International Conference on Motor Speech. Sarasota, FL: Madonna Rehabilitation Hospital.
- Story, B. H., & Bunton, K. (2014, April). A model of children's speech production. International Conference on Vocal Fold Physiology and Biomechanics. Salt Lake City, UT.
- Story, B. H., & Bunton, K. (2014, March). Vocal tract area functions for child talkers. 17th Biennial Conference on Motor Speech Disorders: Motor Speech Disorders and Speech Motor Control. Sarasota, FL.
- Bunton, K. E., Story, B. H., & Titze, I. (2013, June). Estimation of vocal tract area functions in children based on measurement of lip termination area and inverse acoustic mapping. Joint meeting of the Acoustical Society of America, International Congress on Acoustics, and Canadian Acoustics Association. Montreal, Quebec.
- Samlan, R. A., Story, B. H., Bunton, K. E., & Lotto, A. J. (2013, November). The acoustic and perceptual effects of left-right asymmetries based on computational modeling. American Speech Language and Hearing Association Convention. Chicago, IL.
- Samlan, R. A., Story, B. H., Lotto, A. J., & Bunton, K. E. (2013, November). Acoustic and perceptual effects of left-right asymmetries in simulated vocal fold paralysis. American Speech Language and Hearing Association Annual Meeting. Chicago, IL.
- Story, B. H., & Bunton, K. E. (2013, June). Production of child-like vowels with nonlinear interaction of glottal flow and vocal tract resonances. Joint meeting of the Acoustical Society of America, International Congress on Acoustics, and Canadian Acoustics Association. Montreal, Quebec.
Poster Presentations
- Bunton, K. E., & Story, B. H. (2020, December). Articulation and identification of voiced stop consonants produced by acoustically-driven vocal tract modulations. Paper presented at the 179th Acoustical Society Meeting, Acoustics Virtually Everywhere (virtual conference).
- Bunton, K. E., & Story, B. H. (2020, February). The relation of nasal coupling area to the perception of stop versus nasal consonants. 20th Biennial Conference on Motor Speech Disorders: Motor Speech Disorders and Speech Motor Control. Santa Barbara, CA: Madonna Hospitals.
- Chris, B., Chandan, N., Joy, W., Natasha, M., Jennifer, S., & Story, B. H. (2019, February). Overtone Focusing in Tuvan Throat Singing. ARO Midwinter Meeting. Baltimore, MD: Association for Research in Otolaryngology.
- Story, B. H., Bunton, K. E., & Diamond, R. (2018, May). Changes in vowel space characteristics during speech development based on longitudinal measures of formant frequencies. Presented at the 175th Meeting of the Acoustical Society of America. Minneapolis, MN.
- Bunton, K. E., & Story, B. H. (2016, November). Identification of stop consonants produced by an acoustically driven model of child-like vocal tract. 5th Meeting of the Acoustical Society of America and Acoustical Society of Japan. Honolulu, HI: Acoustical Society of America and Acoustical Society of Japan.
- Bunton, K. E., & Story, B. H. (2016, October). Speech performance density as an indicator of clear speech. Fall Voice Conference. Scottsdale, AZ.
- Neely, K., Bunton, K. E., & Story, B. H. (2016, March). Comparison of lip rounding by children and adults. International Conference on Motor Speech. Newport Beach, CA.
- Story, B. H., & Lester, R. (2016, March). Predicting Listener Perception of Simulated Laryngeal Vocal Tremor Using A Novel Measure of Pitch Modulation Strength. International Conference on Motor Speech. Newport Beach, CA.
- Story, B. H., Bunton, K. E., & Vorperian, H. (2016, March). Effects of vocal tract growth on gender and vowel identification based on simulated children’s vowels. International Conference on Motor Speech. Newport Beach, CA.
- Story, B. H., & Bunton, K. E. (2015, May). A spectral filtering method for tracking formants in children's speech. 169th Acoustical Society Meeting, 2pSC25, Journal of the Acoustical Society of America.
- Willi, M. M., & Story, B. H. (2015, January). Place and manner perception of incomplete stops. 9th Meeting of the Auditory Cognitive Neuroscience Society. Tucson, AZ.
- Willi, M. M., & Story, B. H. (2015, May). Acoustic modeling of the perception of place information in incomplete stops. Acoustical Society of America. Pittsburgh, PA.
- Lester, R. A., Story, B. H., & Lotto, A. J. (2014, October). Acoustical bases for the perception of simulated laryngeal vocal tremor. The Journal of the Acoustical Society of America.
- Lester, R., & Story, B. H. (2016, May). Acoustical bases for the perception of vibrato as a model of vocal tremor. 44th Annual Symposium of the Voice Foundation. Philadelphia, PA.
- Lotto, A. J., Lester, R., & Story, B. H. (2014, October). Acoustical bases for the perception of simulated laryngeal vocal tremor. Acoustical Society of America.
- Lotto, A. J., Story, B. H., & Monson, B. B. (2015, October). Speech spectral intensity discrimination at frequencies above 6 kHz. Acoustical Society of America.
- Bunton, K. E., & Story, B. H. (2012, March). Relation of constriction location, formant transitions, and consonant identification based on VCVs simulated with a child-like model of speech production. 16th Biennial Conference on Motor Speech Disorders. Santa Rosa, CA.