Automatic measurement of vowel duration via structured prediction

Yossi Adi, Joseph Keshet, Emily Cibelli, Erin Gustafson, Cynthia Clopper, Matthew Goldrick

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

A key barrier to making phonetic studies scalable and replicable is the need to rely on subjective, manual annotation. To help meet this challenge, a machine learning algorithm was developed for automatic measurement of a widely used phonetic measure: vowel duration. Manually annotated data were used to train a model that takes as input an arbitrary-length segment of the acoustic signal containing a single vowel that is preceded and followed by consonants, and outputs the duration of the vowel. The model is based on the structured prediction framework. The input signal and a hypothesized pair of vowel onset and offset times are mapped to an abstract vector space by a set of acoustic feature functions. The learning algorithm is trained in this space to minimize the difference in expectations between predicted and manually measured vowel durations. The trained model can then automatically estimate vowel durations without phonetic or orthographic transcription. Results comparing the model to three sets of manually annotated data suggest it outperformed the current gold standard for duration measurement, a hidden Markov model-based forced aligner (which requires orthographic or phonetic transcription as an input).
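
The inference step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the single per-frame energy input, the three feature functions in `phi`, the hand-set weights, and the 10 ms frame step are all assumptions made for illustration, and the weights here are not learned from annotated data. The sketch only shows the general shape of structured prediction: every candidate (onset, offset) pair is mapped to a feature vector, scored with a weight vector, and the highest-scoring pair gives the duration.

```python
def phi(energy, onset, offset):
    """Hypothetical feature map for one candidate (onset, offset) pair.

    `energy` is a list of per-frame energies; the candidate vowel spans
    frames onset .. offset-1. Callers guarantee 0 < onset < offset < len(energy),
    so there is always at least one consonant frame on each side.
    """
    inside = energy[onset:offset]
    outside = energy[:onset] + energy[offset:]
    mean_in = sum(inside) / len(inside)
    mean_out = sum(outside) / len(outside)
    rise = energy[onset] - energy[onset - 1]      # jump at the hypothesized onset
    fall = energy[offset - 1] - energy[offset]    # drop at the hypothesized offset
    return [mean_in - mean_out, rise, fall]


def score(w, features):
    """Linear score: dot product of the weight vector and the feature vector."""
    return sum(wi * fi for wi, fi in zip(w, features))


def predict(energy, w, min_frames=2):
    """Exhaustively search all (onset, offset) pairs and return the best one."""
    best_pair, best_score = None, float("-inf")
    n = len(energy)
    for onset in range(1, n - min_frames):
        for offset in range(onset + min_frames, n):
            s = score(w, phi(energy, onset, offset))
            if s > best_score:
                best_pair, best_score = (onset, offset), s
    return best_pair


def duration_ms(pair, frame_ms=10.0):
    """Convert a frame-index pair to a duration, assuming a fixed frame step."""
    onset, offset = pair
    return (offset - onset) * frame_ms


# Toy consonant-vowel-consonant segment: low energy, high energy, low energy.
toy_energy = [0.1, 0.1, 0.2, 0.9, 1.0, 0.95, 0.9, 0.2, 0.1, 0.1]
toy_w = [1.0, 1.0, 1.0]  # hand-set for illustration, not trained

pair = predict(toy_energy, toy_w)
print(pair, duration_ms(pair))  # (3, 7) 40.0
```

In a trained system the weight vector would be fit to the manually annotated durations, and the feature functions would be richer acoustic measures than the single energy track assumed here.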

Original language: English
Pages (from-to): 4517-4527
Number of pages: 11
Journal: Journal of the Acoustical Society of America
Volume: 140
Issue number: 6
State: Published - 1 Dec 2016

Bibliographical note

Publisher Copyright:
© 2016 Acoustical Society of America.

Funding

Research supported by NIH Grant No. 1R21HD077140 and NSF Grant No. BCS1056409.

Funder | Funder number
National Science Foundation | BCS1056409
National Institutes of Health | (not listed)
National Institute of Child Health and Human Development | R21HD077140
