Abstract
We study spoken term detection, the task of determining whether and where a given word or phrase appears in a given segment of speech, in the setting of limited training data. This setting is becoming increasingly important as interest grows in porting spoken term detection to multiple low-resource languages and acoustic environments. We propose a discriminative algorithm that aims to maximize the area under the receiver operating characteristic curve (AUC), a measure often used to evaluate the performance of spoken term detection systems. We implement the approach using a set of feature functions based on multilayer perceptron classifiers of phones and articulatory features, and experiment on data drawn from the Switchboard database of conversational telephone speech. Our approach outperforms a baseline HMM-based system by a large margin across a number of training set sizes.
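As a reading aid, the area under the ROC curve (AUC) named in the abstract can be computed as the fraction of (positive, negative) pairs in which the utterance containing the term receives the higher detection score, with ties counted as half. The sketch below illustrates that quantity only; it is not the authors' training algorithm, and the function name `pairwise_auc` and the example scores are hypothetical.

```python
def pairwise_auc(pos_scores, neg_scores):
    """Return the AUC as the fraction of correctly ordered (positive, negative) pairs.

    pos_scores: detector scores for utterances that contain the term.
    neg_scores: detector scores for utterances that do not.
    Ties contribute half a correct pair.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


# Hypothetical detector confidences, for illustration only.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
auc = pairwise_auc(pos, neg)
```

Viewed this way, raising the AUC means ordering more positive utterances above negative ones, which is why the measure lends itself to a ranking-style discriminative objective.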
| Original language | English |
|---|---|
| Pages | 22-25 |
| Number of pages | 4 |
| State | Published - 2012 |
| Externally published | Yes |
| Event | 2012 Symposium on Machine Learning in Speech and Language Processing, MLSLP 2012 - Portland, United States. Duration: 14 Sep 2012 → … |
Conference
| Conference | 2012 Symposium on Machine Learning in Speech and Language Processing, MLSLP 2012 |
|---|---|
| Country/Territory | United States |
| City | Portland |
| Period | 14/09/12 → … |
Bibliographical note
Publisher Copyright: © 2012 Machine Learning in Speech and Language Processing, MLSLP 2012. All rights reserved.
Keywords
- AUC
- Spoken term detection
- discriminative training
- structural SVM