Scaling web-based acquisition of entailment relations

Idan Szpektor, Hristo Tanev, Ido Dagan, Bonaventura Coppola

Research output: Contribution to conference, Paper, peer-review

152 Scopus citations

Abstract

Paraphrase recognition is a critical step for natural language interpretation. Accordingly, many NLP applications would benefit from high coverage knowledge bases of paraphrases. However, the scalability of state-of-the-art paraphrase acquisition approaches is still limited. We present a fully unsupervised learning algorithm for Web-based extraction of entailment relations, an extended model of paraphrases. We focus on increased scalability and generality with respect to prior work, eventually aiming at a full scale knowledge base. Our current implementation of the algorithm takes as its input a verb lexicon and for each verb searches the Web for related syntactic entailment templates. Experiments show promising results with respect to the ultimate goal, achieving much better scalability than prior Web-based methods.

Original language: English
Pages: 41-48
Number of pages: 8
State: Published - 2004
Event: 2004 Conference on Empirical Methods in Natural Language Processing, EMNLP 2004 - Barcelona, Spain
Duration: 25 Jul 2004 - 26 Jul 2004

Conference

Conference: 2004 Conference on Empirical Methods in Natural Language Processing, EMNLP 2004
Country/Territory: Spain
City: Barcelona
Period: 25/07/04 - 26/07/04

Bibliographical note

Publisher Copyright:
© 2005 Association for Computational Linguistics

Funding

The authors would like to thank Oren Glickman (Bar Ilan University) for helpful discussions and assistance in the evaluation, Bernardo Magnini for his scientific supervision at ITC-irst, Alessandro Vallin and Danilo Giampiccolo (ITC-irst) for their help in developing the human based evaluation, and Prof. Yossi Matias (Tel-Aviv University) for supervising the first author. This work was partially supported by the MOREWEB project, financed by Provincia Autonoma di Trento. It was also partly carried out within the framework of the ITC-IRST (TRENTO, ITALY) – UNIVERSITY OF HAIFA (ISRAEL) collaboration project. For data visualization and analysis the authors intensively used the CLARK system (www.bultreebank.org) developed at the Bulgarian Academy of Sciences.

Funders
Tel Aviv University
University of Haifa
Bulgarian Academy of Sciences
Provincia Autonoma di Trento
