Abstract
Many language identification (LID) systems are based on language models that use machine learning (ML) techniques, such as Hidden Markov Models (HMMs), which take into account the fluctuation of speech over time. Because they model this fluctuation, such LID systems need relatively long recording intervals to obtain reasonable accuracy. This research tries to extract enough features from short recording intervals to enable successful classification of the tested spoken languages. The classification process is based on frames of 20 milliseconds (ms), whereas most previous LID systems were based on much longer time frames (from 3 seconds to 2 minutes). We defined and implemented 173 low-level features divided into three feature sets: cepstrum, relative spectral (RASTA), and spectrum. The examined corpus, containing speech files in seven languages, is a subset of the Oregon Graduate Institute (OGI) telephone speech corpus. Six ML methods were applied and compared, and the best optimized results were achieved by Random Forest (RF): 89%, 82%, and 80% for 2, 5, and 7 languages, respectively.
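
To make the frame-based setup concrete, the following is a minimal sketch of cutting a signal into 20 ms frames and computing a few real-cepstrum coefficients per frame. It is not the authors' implementation and does not reproduce their 173 features. The 8 kHz sampling rate, the choice of 13 coefficients, the synthetic test tone, and all function names are assumptions made for illustration only.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000  # assumption: a common telephone-speech sampling rate
FRAME_MS = 20          # frame length stated in the abstract


def split_into_frames(samples, sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Cut a signal into non-overlapping frames; a trailing partial frame is dropped."""
    frame_len = sample_rate_hz * frame_ms // 1000
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def dft(values):
    """Naive discrete Fourier transform (quadratic time, fine for short frames)."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(frame, n_coeffs=13):
    """Return the first n_coeffs real-cepstrum coefficients of one frame.

    The real cepstrum is the inverse DFT of the log magnitude spectrum.
    n_coeffs=13 is a hypothetical choice, not a value taken from the paper.
    """
    n = len(frame)
    # A small constant avoids log(0) for empty spectral bins.
    log_mag = [math.log(abs(c) + 1e-10) for c in dft(frame)]
    return [
        sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n) for k in range(n)).real / n
        for q in range(n_coeffs)
    ]


# Synthetic 0.2 s, 440 Hz tone standing in for real speech samples.
signal = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE_HZ) for t in range(1600)]
frames = split_into_frames(signal)
features = [real_cepstrum(frame) for frame in frames]
```

Each entry of `features` is one per-frame feature vector. In a setup like the one described above, such vectors, extended with RASTA and spectrum features, would be the inputs to a classifier such as Random Forest.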
Original language | English |
---|---|
Pages | 275-285 |
Number of pages | 11 |
State | Published - 2015 |
Externally published | Yes |
Event | 29th Pacific Asia Conference on Language, Information and Computation, PACLIC 2015 - Shanghai, China; Duration: 30 Oct 2015 → 1 Nov 2015 |
Conference
Conference | 29th Pacific Asia Conference on Language, Information and Computation, PACLIC 2015 |
---|---|
Country/Territory | China |
City | Shanghai |
Period | 30/10/15 → 1/11/15 |