Adding new classes without access to the original training data with applications to language identification

Hagai Taitelbaum, Ehud Ben-Reuven, Jacob Goldberger

Research output: Contribution to journal › Conference article › peer-review

3 Scopus citations

Abstract

In this study we address the problem of adding new classes to an existing neural network classifier. We assume that training data for the new classes is available. In many applications, the datasets used to train machine learning algorithms contain confidential information that cannot be accessed while the class set is being extended. We propose a method for training an extended class-set classifier using only examples labeled with the new classes, while avoiding forgetting the original classes. This incremental training method is applied to the problem of language identification. We report results on the 50-language NIST 2015 dataset, where we were able to classify all the languages even though only a subset of the languages was available during the first training phase and the remaining languages became available only during the second phase.

Original language: English
Pages (from-to): 1808-1812
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2018-September
State: Published - 2018
Event: 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 - Hyderabad, India
Duration: 2 Sep 2018 to 6 Sep 2018

Bibliographical note

Publisher Copyright:
© 2018 International Speech Communication Association. All rights reserved.

Funding

This research is partially supported by the BIU Center for Research in Applied Cryptography and Cyber Security in conjunction with the Israel National Cyber Directorate in the Prime Minister's Office.

Funders: Canadian Centre for Applied Research in Cancer Control

    Keywords

    • Adding new classes
    • Catastrophic forgetting
    • Language identification
    • Learning privacy
