Abstract
We present a system that enables speaker separation and identification, designed to operate without requiring any effort from the end-user. In the system, single-channel conversations are transformed into i-vectors, clustered into speakers, and matched against a database of known speakers. Enrollment is automatic: a voice print is constructed for the recording user by taking advantage of the metadata identifying that user's conversations. When available, further information from other sources, such as video and ASR-transcribed content, is used to identify speakers. We describe the system architecture and the novel unsupervised enrollment algorithm, and discuss the difficulties encountered in solving this problem.
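The abstract outlines a pipeline of embedding extraction, clustering, and database matching. The paper does not specify the clustering or scoring method, so the following is only a minimal sketch of that pipeline under assumed simplifications: segment i-vectors are given as plain numeric vectors, clustering is a greedy single-pass cosine-similarity grouping, and enrolled speakers are matched by cosine similarity against cluster centroids. All function names and thresholds here are illustrative, not from the paper.

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def mean(vectors):
    # Element-wise mean of a list of equal-length vectors.
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def cluster_segments(embeddings, threshold=0.8):
    # Greedy single-pass clustering: assign each segment embedding to the
    # most similar existing cluster centroid, or start a new cluster.
    # (A stand-in for the paper's unspecified clustering step.)
    clusters, centroids = [], []
    for i, e in enumerate(embeddings):
        best, best_sim = None, threshold
        for c, cen in enumerate(centroids):
            s = cosine(e, cen)
            if s > best_sim:
                best, best_sim = c, s
        if best is None:
            clusters.append([i])
            centroids.append(list(e))
        else:
            clusters[best].append(i)
            centroids[best] = mean([embeddings[j] for j in clusters[best]])
    return clusters, centroids

def match_speakers(centroids, enrolled, threshold=0.8):
    # Match each cluster centroid against a database of enrolled voice
    # prints; clusters below the threshold stay "unknown".
    labels = []
    for cen in centroids:
        best_name, best_sim = "unknown", threshold
        for name, print_vec in enrolled.items():
            s = cosine(cen, print_vec)
            if s > best_sim:
                best_name, best_sim = name, s
        labels.append(best_name)
    return labels
```

For example, with four two-dimensional toy embeddings forming two groups, `cluster_segments([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]])` yields two clusters; matching against an enrolled database `{"alice": [1.0, 0.0]}` labels the first cluster `"alice"` and leaves the second `"unknown"`. A real system would operate on high-dimensional i-vectors and likely use PLDA or a similar scoring model instead of raw cosine similarity.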
| Original language | English |
|---|---|
| Pages (from-to) | 1964-1965 |
| Number of pages | 2 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| Volume | 2018-September |
| DOIs | |
| State | Published - 2018 |
| Externally published | Yes |
| Event | 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 - Hyderabad, India Duration: 2 Sep 2018 → 6 Sep 2018 |
Bibliographical note
Publisher Copyright: © 2018 International Speech Communication Association. All rights reserved.
Keywords
- Diarization
- Speaker separation
- Speech recognition