Abstract
Paraphrase recognition is a critical step for natural language interpretation. Accordingly, many NLP applications would benefit from high-coverage knowledge bases of paraphrases. However, the scalability of state-of-the-art paraphrase acquisition approaches is still limited. We present a fully unsupervised learning algorithm for Web-based extraction of entailment relations, an extended model of paraphrases. We focus on increased scalability and generality with respect to prior work, eventually aiming at a full-scale knowledge base. Our current implementation of the algorithm takes a verb lexicon as input and, for each verb, searches the Web for related syntactic entailment templates. Experiments show promising results with respect to this ultimate goal, achieving much better scalability than prior Web-based methods.
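The abstract does not spell out how related templates are found for each verb. As a rough illustration only, the sketch below assumes an anchor-based heuristic: collect the (X, Y) argument pairs seen with the input verb, then rank other "X verb Y" templates by how many of those pairs they share. A small hard-coded list of triples stands in for Web search and parsing. `CORPUS`, `anchor_sets`, `candidate_templates`, `acquire`, and the `min_shared` threshold are hypothetical names chosen for this example, not the authors' implementation.

```python
from collections import Counter

# Hypothetical stand-in for Web search results: (X, verb phrase, Y) triples
# that a parser might extract from retrieved sentences.
CORPUS = [
    ("aspirin", "prevent", "heart attack"),
    ("vaccine", "prevent", "measles"),
    ("exercise", "prevent", "obesity"),
    ("aspirin", "reduce the risk of", "heart attack"),
    ("vaccine", "reduce the risk of", "measles"),
    ("exercise", "reduce the risk of", "obesity"),
    ("vaccine", "protect against", "measles"),
    ("aspirin", "cause", "bleeding"),
]


def anchor_sets(verb, corpus):
    """Return the (X, Y) argument pairs observed with `verb`."""
    return {(x, y) for x, v, y in corpus if v == verb}


def candidate_templates(verb, corpus, min_shared=2):
    """Rank other 'X v Y' templates by distinct anchor pairs shared with `verb`."""
    anchors = anchor_sets(verb, corpus)
    # Deduplicate triples so each anchor pair counts at most once per template.
    shared = Counter(v for x, v, y in set(corpus) if v != verb and (x, y) in anchors)
    # Sort by descending count, then alphabetically, for a deterministic order.
    ranked = sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(f"X {v} Y", n) for v, n in ranked if n >= min_shared]


def acquire(lexicon, corpus, min_shared=2):
    """Map each verb in the lexicon to its candidate entailment templates."""
    return {verb: candidate_templates(verb, corpus, min_shared) for verb in lexicon}


if __name__ == "__main__":
    print(acquire(["prevent", "cause"], CORPUS))
```

On this toy data, "prevent" yields the single candidate `("X reduce the risk of Y", 3)`, while "protect against" is filtered out by the threshold and "cause" yields nothing. The sketch only proposes related templates; it does not decide the direction of entailment, which a real system would still have to determine or validate.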
Original language | English |
---|---|
Pages | 41-48 |
Number of pages | 8 |
State | Published - 2004 |
Event | 2004 Conference on Empirical Methods in Natural Language Processing, EMNLP 2004 - Barcelona, Spain Duration: 25 Jul 2004 → 26 Jul 2004 |
Conference
Conference | 2004 Conference on Empirical Methods in Natural Language Processing, EMNLP 2004 |
---|---|
Country/Territory | Spain |
City | Barcelona |
Period | 25/07/04 → 26/07/04 |
Bibliographical note
Publisher Copyright: © 2005 Association for Computational Linguistics
Funding
The authors would like to thank Oren Glickman (Bar Ilan University) for helpful discussions and assistance in the evaluation, Bernardo Magnini for his scientific supervision at ITC-irst, Alessandro Vallin and Danilo Giampiccolo (ITC-irst) for their help in developing the human based evaluation, and Prof. Yossi Matias (Tel-Aviv University) for supervising the first author. This work was partially supported by the MOREWEB project, financed by Provincia Autonoma di Trento. It was also partly carried out within the framework of the ITC-IRST (TRENTO, ITALY) – UNIVERSITY OF HAIFA (ISRAEL) collaboration project. For data visualization and analysis the authors intensively used the CLARK system (www.bultreebank.org) developed at the Bulgarian Academy of Sciences.
Funders | Funder number |
---|---|
Tel Aviv University | |
University of Haifa | |
Bulgarian Academy of Sciences | |
Provincia Autonoma di Trento | |