Denoising Word Embeddings by Averaging in a Shared Space

Avi Caciularu, Ido Dagan, Jacob Goldberger

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

We introduce a new approach for smoothing and improving the quality of word embeddings. We consider a method of fusing word embeddings that were trained on the same corpus but with different initializations. We project all the models to a shared vector space using an efficient implementation of the Generalized Procrustes Analysis (GPA) procedure, previously used in multilingual word translation. Our word representation demonstrates consistent improvements over the raw models as well as their simplistic average, on a range of tasks. As the new representations are more stable and reliable, there is a noticeable improvement in rare word evaluations.
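The abstract names the procedure but does not spell it out. As a rough illustration only, the sketch below assumes the textbook iterative form of GPA: each model is an `n x d` matrix whose rows follow the same vocabulary order, each model is rotated onto the current consensus by an orthogonal Procrustes step, and the consensus is then re-estimated as the mean of the rotated models. The function names `orthogonal_procrustes` and `gpa_average`, the iteration cap, and the stopping tolerance are hypothetical choices for this example; this is not the authors' efficient implementation.

```python
import numpy as np


def orthogonal_procrustes(X, Y):
    """Return the orthogonal matrix Q that minimizes ||X @ Q - Y||_F."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def gpa_average(models, n_iter=20, tol=1e-8):
    """Align row-matched embedding matrices in a shared space and average them.

    models: list of (n_words, dim) arrays with identical row order (assumed).
    Returns the averaged embedding matrix and the per-model rotations.
    """
    consensus = models[0].copy()  # assumption: start from the first model
    prev_loss = np.inf
    for _ in range(n_iter):
        # Rotate every model onto the current consensus.
        rotations = [orthogonal_procrustes(X, consensus) for X in models]
        aligned = [X @ Q for X, Q in zip(models, rotations)]
        # Re-estimate the consensus as the mean of the aligned models.
        consensus = np.mean(aligned, axis=0)
        loss = sum(np.linalg.norm(A - consensus) ** 2 for A in aligned)
        if prev_loss - loss < tol:
            break
        prev_loss = loss
    return consensus, rotations


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=(100, 8))  # toy "true" embeddings
    runs = []
    for _ in range(3):
        R, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random rotation per run
        runs.append(base @ R + 0.05 * rng.normal(size=base.shape))
    fused, _ = gpa_average(runs)
    print(fused.shape)
```

In this toy setup the plain element-wise mean of `runs` would mix vectors expressed in different rotated coordinate systems, which is why the alignment step comes before the averaging.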

Original language: English
Title of host publication: *SEM 2021 - 10th Conference on Lexical and Computational Semantics, Proceedings of the Conference
Editors: Lun-Wei Ku, Vivi Nastase, Ivan Vulic
Publisher: Association for Computational Linguistics (ACL)
Pages: 294-301
Number of pages: 8
ISBN (Electronic): 9781954085770
State: Published - 2021
Event: 10th Conference on Lexical and Computational Semantics, *SEM 2021 - Virtual, Bangkok, Thailand
Duration: 5 Aug 2021 → 6 Aug 2021

Publication series

Name: *SEM 2021 - 10th Conference on Lexical and Computational Semantics, Proceedings of the Conference

Conference

Conference: 10th Conference on Lexical and Computational Semantics, *SEM 2021
Country/Territory: Thailand
City: Virtual, Bangkok
Period: 5/08/21 → 6/08/21

Bibliographical note

Publisher Copyright:
© 2021 Lexical and Computational Semantics

Funding

The authors would like to thank the anonymous reviewers for their comments and suggestions. The work described herein was supported in part by grants from Intel Labs, Facebook, the Israel Science Foundation grant 1951/17 and the German Research Foundation through the German-Israeli Project Cooperation (DIP, grant DA 1600/1-1).

Funders                               Funder number
DIP                                   DA 1600/1-1
German-Israeli Project Cooperation
Intel Labs
Deutsche Forschungsgemeinschaft
Israel Science Foundation             1951/17
