Investigating the relationship between age, gender, and vowel categorization within and across languages: A comparison of Cantonese, English, and Japanese
Benjamin Munson1 and Andrew R. Plummer2*
*The order of authors is alphabetical, and reflects equal contributions on both individuals’ parts.
1Department of Speech-Language-Hearing Science, University of Minnesota
2Departments of Linguistics and Computer and Information Science, Ohio State University
Languages with a large number of contrasts among vowels, e.g., Cantonese and American English, with eleven monophthongal vowels each, are typologically rare in relation to languages like Japanese, which has only five. Juxtaposing languages that exhibit such dramatic differences in vowel inventory size has the potential to illuminate aspects of speech perception that typically fall outside canonical consideration. For example, careful cross-language comparison of the relative locations and diffuseness of regions in formant space that correspond to vowel categories may reveal cross-language differences in the attention given to finer-grained phonetic detail encoding other categorical information, e.g., age and gender information. If so, such comparisons may also reveal within-language differences based on small-group or even individual vocal learning experience.
In this talk, we present an array of data derived from Cantonese, American English, and Japanese listeners’ age, gender, and vowel category judgments in response to age-varying sets of synthetic vowels generated from a model of the growing vocal tract. The data were collected as part of the “Learning to Talk” research initiative focused on modeling the development of language-specific speech perception and production (http://learningtotalk.org/?q=node/25). Based on the vowel categories within their native languages, Cantonese-, American English-, and Japanese-speaking listeners categorized seven sets of synthetic vowels generated from models of the vocal tract ranging in age from six months to 21 years, while also providing a numeric “goodness rating” indicating how well each vowel represented the chosen vowel category. Listeners also provided numeric age judgments and gender ratings along a visual analog scale ranging from “definitely male” to “definitely female” (or their Cantonese and Japanese equivalents) for subsets of the stimuli corresponding to the corner vowels /i/, /a/, and /u/.
We limit our discussion to three recent harvests yielded by the rich array of data described above. First, we will describe a statistical modeling methodology which provides the means to parse the formant space for each vocal tract age into listener-specific vowel category regions based on listener-provided category judgments and goodness ratings. This listener-specific modeling not only serves as the basis for investigating broader language-specific vowel categorization patterns, but is also crucial in modeling the likely caregiver-infant dyad-specific aspects of vowel category acquisition. Second, we examine language-specific differences in the perception of these vowels across vocal tract ages. Preliminary analyses reveal a bifurcation in gender rating patterns beginning at age 10, wherein vowel stimuli for this age are rated as either a young male or an older female. Moreover, the gender rating patterns appear to differ across languages, suggesting a learned, language-specific association between vowel categories and social categories. Finally, we examine relationships among the judgments of age and gender and the vowel identifications for the stimulus set where age and gender were the most ambiguous. Here we show that these three variables are more closely related in the Cantonese and English data than in the Japanese data. These results suggest that the relatively large vowel systems of Cantonese and English may require listeners to use different sources of information in disambiguation than does the smaller vowel system of Japanese.
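As a rough illustration of the first point, the idea of parsing formant space into listener-specific category regions can be sketched as follows. This is not the modeling methodology presented in the talk: it is a minimal stand-in that assumes each category is summarized by a goodness-weighted Gaussian over (F1, F2), and the function names, the ridge term, and all stimulus values and ratings are hypothetical.

```python
import numpy as np

def fit_category_regions(formants, labels, goodness, ridge=1.0):
    """Summarize each vowel category chosen by one listener as a
    goodness-weighted mean and covariance in (F1, F2) space.
    The Gaussian summary and the ridge term are illustrative assumptions."""
    formants = np.asarray(formants, dtype=float)
    labels = np.asarray(labels)
    goodness = np.asarray(goodness, dtype=float)
    regions = {}
    for cat in np.unique(labels):
        mask = labels == cat
        x = formants[mask]
        w = goodness[mask] / goodness[mask].sum()  # normalized weights
        mean = w @ x
        centered = x - mean
        cov = (centered * w[:, None]).T @ centered
        cov += ridge * np.eye(x.shape[1])  # keep the covariance invertible
        regions[str(cat)] = (mean, cov)
    return regions

def classify(point, regions):
    """Return the category whose Gaussian gives the highest log density."""
    point = np.asarray(point, dtype=float)
    best, best_score = None, -np.inf
    for cat, (mean, cov) in regions.items():
        diff = point - mean
        _, logdet = np.linalg.slogdet(cov)
        score = -0.5 * (diff @ np.linalg.solve(cov, diff) + logdet)
        if score > best_score:
            best, best_score = cat, score
    return best

# Hypothetical responses from one listener for one vocal tract age:
# (F1, F2) in Hz, the chosen category, and a goodness rating.
formants = [[300, 2300], [320, 2200], [350, 2400],
            [800, 1300], [750, 1200], [820, 1250],
            [350, 900], [330, 850], [370, 880]]
labels = ["i", "i", "i", "a", "a", "a", "u", "u", "u"]
goodness = [5, 4, 3, 5, 3, 4, 4, 5, 3]

regions = fit_category_regions(formants, labels, goodness)
```

Fitting one such set of regions per listener and per vocal tract age would give the listener-specific maps that the language-level comparisons could then build on; `classify([310, 2250], regions)` assigns a new point in formant space to its most likely category under this simplified model.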
Funded by a collaborative grant from the National Science Foundation (BCS-0729306 [Mary E. Beckman & Eric Fosler-Lussier], BCS-0729140 [Jan Edwards], and BCS-0729277 [Benjamin Munson], “Using machine learning to model the interplay of production dynamics and perception dynamics in phonological acquisition”).