Aligning vector-spaces with noisy supervised lexicons

Noa Yehezkel Lubin, Jacob Goldberger, Yoav Goldberg

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

13 Scopus citations

Abstract

The problem of learning to translate between two vector spaces given a set of aligned points arises in several application areas of NLP. Current solutions assume that the lexicon which defines the alignment pairs is noise-free. We consider the case where the set of aligned points is allowed to contain some noise in the form of incorrect lexicon pairs, and we show that this arises in practice by analyzing the edited dictionaries after the cleaning process. We demonstrate that such noise substantially degrades the accuracy of the learned translation when using current methods. We propose a model that accounts for noisy pairs. This is achieved by introducing a generative model with a compatible iterative EM algorithm. The algorithm jointly learns the noise level in the lexicon, finds the set of noisy pairs, and learns the mapping between the spaces. We demonstrate the effectiveness of our proposed algorithm on two alignment problems: bilingual word embedding translation, and mapping between diachronic embedding spaces for recovering the semantic shifts of words across time periods.
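
The joint estimation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every modelling detail is an assumption made for illustration: the mapping is taken to be an orthogonal matrix Q, clean pairs are modelled as y ~ N(Qx, var_clean * I), noisy pairs as draws from a fixed isotropic Gaussian fitted to all target vectors, and alpha is the fraction of clean pairs. The names em_align and weighted_procrustes are hypothetical.

```python
import numpy as np


def weighted_procrustes(X, Y, w):
    """Orthogonal Q minimising sum_i w_i * ||Q x_i - y_i||^2 (rows are points)."""
    M = (Y * w[:, None]).T @ X          # sum_i w_i * y_i x_i^T, shape (d, d)
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt


def em_align(X, Y, n_iter=50, alpha=0.5):
    """EM for a two-component mixture over lexicon pairs (illustrative only).

    Returns the mapping Q, the estimated clean fraction alpha, and the
    per-pair posterior probability w of being a clean pair.
    """
    n, d = X.shape

    # Assumed noise component: one isotropic Gaussian over Y, kept fixed.
    mu = Y.mean(axis=0)
    noise_sq = ((Y - mu) ** 2).sum(axis=1)
    var_noise = max(noise_sq.mean() / d, 1e-12)

    # Initialise with the unweighted (noise-unaware) solution.
    Q = weighted_procrustes(X, Y, np.ones(n))
    res_sq = ((X @ Q.T - Y) ** 2).sum(axis=1)
    var_clean = max(res_sq.mean() / d, 1e-12)
    w = np.full(n, alpha)

    for _ in range(n_iter):
        # E-step: posterior that each pair is clean (constants cancel).
        log_clean = (np.log(alpha) - 0.5 * d * np.log(var_clean)
                     - 0.5 * res_sq / var_clean)
        log_noise = (np.log(1.0 - alpha) - 0.5 * d * np.log(var_noise)
                     - 0.5 * noise_sq / var_noise)
        w = 1.0 / (1.0 + np.exp(np.clip(log_noise - log_clean, -500, 500)))

        # M-step: noise level, mapping, and clean-pair variance.
        alpha = float(np.clip(w.mean(), 1e-6, 1.0 - 1e-6))
        Q = weighted_procrustes(X, Y, w)
        res_sq = ((X @ Q.T - Y) ** 2).sum(axis=1)
        var_clean = max((w * res_sq).sum() / (d * max(w.sum(), 1e-12)), 1e-12)

    return Q, alpha, w
```

Under these assumptions, pairs with a low posterior w would be the candidate noisy lexicon entries, and 1 - alpha plays the role of the estimated noise level. The weighted Procrustes step is a standard closed-form choice for an orthogonal mapping; a different mapping family would need a different M-step.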

Original language: English
Title of host publication: Long and Short Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 460-465
Number of pages: 6
ISBN (Electronic): 9781950737130
State: Published - 2019
Event: 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2019 - Minneapolis, United States
Duration: 2 Jun 2019 to 7 Jun 2019

Publication series

Name: NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference
Volume: 1

Conference

Conference: 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2019
Country/Territory: United States
City: Minneapolis
Period: 2/06/19 to 7/06/19

Bibliographical note

Publisher Copyright:
© 2019 Association for Computational Linguistics

Funding

The work was supported by the Israeli Science Foundation (grant number 1555/15) and by the Israeli Ministry of Science, Technology and Space through the Israeli-French Maimonide Cooperation program. We also thank Roee Aharoni for helpful discussions and suggestions.

Funders (funder number)
Israeli Science Foundation (1555/15)
Israeli-French Maimonide Cooperation program
Ministry of Science, Technology and Space
Israel Science Foundation
