TY - JOUR
T1 - Phoneme alignment based on discriminative learning
AU - Keshet, Joseph
AU - Shalev-Shwartz, Shai
AU - Singer, Yoram
AU - Chazan, Dan
PY - 2005/12/1
Y1 - 2005/12/1
N2 - We propose a new paradigm for aligning a phoneme sequence of a speech utterance with its acoustical signal counterpart. In contrast to common HMM-based approaches, our method employs a discriminative learning procedure in which the learning phase is tightly coupled with the alignment task at hand. The alignment function we devise is based on mapping the input acoustic-symbolic representations of the speech utterance along with the target alignment into an abstract vector space. We suggest a specific mapping into the abstract vector-space which utilizes standard speech features (e.g. spectral distances) as well as confidence outputs of a framewise phoneme classifier. Building on techniques used for large margin methods for predicting whole sequences, our alignment function distills to a classifier in the abstract vector-space which separates correct alignments from incorrect ones. We describe a simple iterative algorithm for learning the alignment function and discuss its formal properties. Experiments with the TIMIT corpus show that our method outperforms the current state-of-the-art approaches.
AB - We propose a new paradigm for aligning a phoneme sequence of a speech utterance with its acoustical signal counterpart. In contrast to common HMM-based approaches, our method employs a discriminative learning procedure in which the learning phase is tightly coupled with the alignment task at hand. The alignment function we devise is based on mapping the input acoustic-symbolic representations of the speech utterance along with the target alignment into an abstract vector space. We suggest a specific mapping into the abstract vector-space which utilizes standard speech features (e.g. spectral distances) as well as confidence outputs of a framewise phoneme classifier. Building on techniques used for large margin methods for predicting whole sequences, our alignment function distills to a classifier in the abstract vector-space which separates correct alignments from incorrect ones. We describe a simple iterative algorithm for learning the alignment function and discuss its formal properties. Experiments with the TIMIT corpus show that our method outperforms the current state-of-the-art approaches.
UR - http://www.scopus.com/inward/record.url?scp=33745211420&partnerID=8YFLogxK
M3 - ???researchoutput.researchoutputtypes.contributiontojournal.article???
JO - 9th European Conference on Speech Communication and Technology
JF - 9th European Conference on Speech Communication and Technology
ER -