Automatic classification of spoken languages using diverse acoustic features

Yaakov HaCohen-Kerner, Ruben Hagege

Research output: Contribution to conference › Paper › peer-review

3 Scopus citations

Abstract

Many language identification (LID) systems are based on language models built with machine learning (ML) techniques that take into account the fluctuation of speech over time, such as Hidden Markov Models (HMM). Because they model this fluctuation, such LID systems need relatively long recording intervals to obtain reasonable accuracy. This research tries to extract enough features from short recording intervals to enable successful classification of the tested spoken languages. The classification process is based on frames of 20 milliseconds (ms), whereas most previous LID systems were based on much longer time frames (from 3 seconds to 2 minutes). We defined and implemented 173 low-level features divided into three feature sets: cepstrum, relative spectral (RASTA), and spectrum. The examined corpus, containing speech files in seven languages, is a subset of the Oregon Graduate Institute (OGI) telephone speech corpus. Six ML methods were applied and compared, and the best optimized results were achieved by Random Forest (RF): 89%, 82%, and 80% for 2, 5, and 7 languages, respectively.
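To make the 20 ms framing concrete, the following is a minimal sketch, not the authors' implementation. It assumes an 8 kHz sample rate and non-overlapping frames, neither of which is stated in the abstract. The helper names `split_into_frames` and `magnitude_spectrum` are hypothetical. The naive DFT only illustrates the kind of per-frame spectral value a "spectrum" feature set could start from, and it does not reproduce any of the paper's 173 features.

```python
import cmath
import math


def split_into_frames(samples, sample_rate_hz, frame_ms=20):
    """Split a signal into non-overlapping frames of frame_ms milliseconds.

    Non-overlapping framing is an assumption; the abstract does not say
    whether frames overlap.
    """
    frame_len = sample_rate_hz * frame_ms // 1000
    n_frames = len(samples) // frame_len  # the incomplete tail is dropped
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def magnitude_spectrum(frame):
    """Return DFT magnitudes for bins 0..N/2 using a naive O(N^2) transform."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


# Hypothetical input: one second of a 400 Hz tone.
SAMPLE_RATE_HZ = 8000  # assumed sample rate, not taken from the abstract
signal = [math.sin(2 * math.pi * 400 * t / SAMPLE_RATE_HZ)
          for t in range(SAMPLE_RATE_HZ)]

frames = split_into_frames(signal, SAMPLE_RATE_HZ)
spectrum = magnitude_spectrum(frames[0])

# Locate the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE_HZ / len(frames[0])
```

Under these assumptions each frame holds 160 samples, so one second of audio yields 50 frames, each of which would become one feature vector for a classifier such as Random Forest.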

Original language: English
Pages: 275-285
Number of pages: 11
State: Published - 2015
Externally published: Yes
Event: 29th Pacific Asia Conference on Language, Information and Computation, PACLIC 2015 - Shanghai, China
Duration: 30 Oct 2015 to 1 Nov 2015

Conference

Conference: 29th Pacific Asia Conference on Language, Information and Computation, PACLIC 2015
Country/Territory: China
City: Shanghai
Period: 30/10/15 to 1/11/15
