Linguistic regularities in sparse and explicit word representations

Omer Levy, Yoav Goldberg

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

385 Scopus citations

Abstract

Recent work has shown that neural-embedded word representations capture many relational similarities, which can be recovered by means of vector arithmetic in the embedded space. We show that Mikolov et al.’s method of first adding and subtracting word vectors, and then searching for a word similar to the result, is equivalent to searching for a word that maximizes a linear combination of three pairwise word similarities. Based on this observation, we suggest an improved method of recovering relational similarities, improving the state-of-the-art results on two recent word-analogy datasets. Moreover, we demonstrate that analogy recovery is not restricted to neural word embeddings, and that a similar amount of relational similarities can be recovered from traditional distributional word representations.
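The equivalence stated in the abstract can be illustrated with a minimal sketch. It assumes unit-normalized word vectors and cosine similarity, and it excludes the three question words from the candidates, which is a common convention adopted here as an assumption. The function names and the random toy vocabulary are hypothetical and are not taken from the paper.

```python
import math
import random


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    """Dot product; equals cosine similarity for unit vectors."""
    return sum(x * y for x, y in zip(u, v))


def argmax_excluding(scores, excluded):
    """Index of the highest score, skipping the excluded indices."""
    candidates = [i for i in range(len(scores)) if i not in excluded]
    return max(candidates, key=lambda i: scores[i])


def analogy_by_arithmetic(vocab, a, a_star, b):
    """Add and subtract vectors first, then find the most similar word."""
    target = unit([vb - va + vs
                   for va, vs, vb in zip(vocab[a], vocab[a_star], vocab[b])])
    scores = [dot(x, target) for x in vocab]
    return argmax_excluding(scores, {a, a_star, b})


def analogy_by_similarities(vocab, a, a_star, b):
    """Maximize a linear combination of three pairwise similarities."""
    scores = [dot(x, vocab[b]) - dot(x, vocab[a]) + dot(x, vocab[a_star])
              for x in vocab]
    return argmax_excluding(scores, {a, a_star, b})


# Toy vocabulary: 50 random unit vectors of dimension 8 (illustrative only).
random.seed(0)
vocab = [unit([random.gauss(0, 1) for _ in range(8)]) for _ in range(50)]
```

Both functions should pick the same word for any question, because normalizing the combined vector only divides every candidate's score by the same positive constant, so the ranking of candidates is unchanged.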

Original language: English
Title of host publication: CoNLL 2014 - 18th Conference on Computational Natural Language Learning, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 171-180
Number of pages: 10
ISBN (Electronic): 9781941643020
State: Published - 2014
Event: 18th Conference on Computational Natural Language Learning, CoNLL 2014 - Baltimore, United States
Duration: 26 Jun 2014 to 27 Jun 2014

Publication series

Name: CoNLL 2014 - 18th Conference on Computational Natural Language Learning, Proceedings

Conference

Conference: 18th Conference on Computational Natural Language Learning, CoNLL 2014
Country/Territory: United States
City: Baltimore
Period: 26/06/14 to 27/06/14

Bibliographical note

Funding Information:
This paper is supported by the project of Natural Science Foundation of China (Grant Nos. 61272384, 61402134, and 61370170).

Publisher Copyright:
© 2014 Association for Computational Linguistics.
