SCorpus Project | The Phonology Laboratory

Studies of sound change in progress have identified the trajectories of change by examining particular linguistic variables in apparent time or in real time. Many social factors, such as age, gender, ethnicity, socio-economic class, and language ideology, have been identified as affecting the progression of change in particular speech communities. The actuation of change, however (that is, why a particular change occurs at a particular place and time), remains elusive, presumably because pinpointing the moment of change requires a temporal resolution that most longitudinal studies are too coarse-grained to provide. Studies of phonetic imitation, which are designed to observe phonetic change over a short period in a laboratory setting, generally lack the ecological validity needed to scale their findings up to an account of language change.

This research project aims to advance the study of the actuation of sound change by focusing on the speech patterns of the United States Supreme Court (SCOTUS) Justices and of the lawyers who argue before the Court. SCOTUS oral arguments from 1955 (the year the Court began recording oral arguments) to the present will be automatically aligned at the phone level, using phone sequences predicted from the orthographic transcripts. The resulting speech corpus will serve as the empirical basis for detailed modeling of the trajectories of sound change at both the individual level and the speech-community level. This audio and textual corpus is not only enormous (more than 9,000 hours of audio synchronized to the court transcripts at the sentence level) but also offers excellent temporal resolution at both the micro level (conversational interaction between individuals) and the macro level (speech samples from the same individual over periods of years and, in some cases, decades). A wealth of metadata concerning the Justices, the advocates arguing before the Court, and the cases also provides a rich source of information that may explain the trajectories of language variation among oral argument participants. Because the corpus is linked to real-world outcomes with quantifiable impact, it provides an unparalleled opportunity to study the consequences of language variation and change in the real world.