Enhancing unlexicalized parsing performance using a wide coverage lexicon, fuzzy tag-set mapping, and EM-HMM-based lexical probabilities

Yoav Goldberg, Reut Tsarfaty, Meni Adler, Michael Elhadad

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

20 Scopus citations

Abstract

We present a framework for interfacing a PCFG parser with lexical information from an external resource whose tagging scheme differs from that of the treebank. This is achieved by defining a stochastic mapping layer between the two resources. Lexical probabilities for rare events are estimated in a semi-supervised manner from a lexicon and large unannotated corpora. We show that this solution greatly enhances the performance of an unlexicalized Hebrew PCFG parser. The result is state-of-the-art Hebrew parsing both when a segmentation oracle is assumed and in a real-world scenario of parsing unsegmented tokens.
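To illustrate what a stochastic mapping layer between two tag sets can look like, here is a minimal sketch. It is not the authors' implementation, and every tag name and probability in it is a hypothetical toy value. It assumes the external lexicon side supplies a distribution P(lexicon_tag | word), such as one estimated by an EM-trained HMM over unannotated text. It also assumes a "fuzzy" mapping P(treebank_tag | lexicon_tag). Marginalizing over lexicon tags gives P(treebank_tag | word), and Bayes' rule turns that into the P(word | treebank_tag) form a generative PCFG uses.

```python
from collections import defaultdict

# Hypothetical P(lexicon_tag | word) for one word; a toy value, not data from the paper.
LEX_TAG_GIVEN_WORD = {"w1": {"NOUN-LEX": 0.7, "VERB-LEX": 0.3}}

# Hypothetical fuzzy mapping P(treebank_tag | lexicon_tag): one lexicon tag
# may spread its probability mass over several treebank tags.
TB_GIVEN_LEX = {
    "NOUN-LEX": {"NN": 0.9, "NNT": 0.1},
    "VERB-LEX": {"VB": 1.0},
}


def map_tag_distribution(lex_dist, mapping):
    """P(tb_tag | word) = sum over lex_tag of P(tb_tag | lex_tag) * P(lex_tag | word)."""
    tb_dist = defaultdict(float)
    for lex_tag, p_lex in lex_dist.items():
        # Lexicon tags missing from the mapping contribute nothing.
        for tb_tag, p_map in mapping.get(lex_tag, {}).items():
            tb_dist[tb_tag] += p_map * p_lex
    return dict(tb_dist)


def to_emission(p_tag_given_word, p_word, p_tag):
    """Bayes' rule: P(word | tag) = P(tag | word) * P(word) / P(tag)."""
    return p_tag_given_word * p_word / p_tag


if __name__ == "__main__":
    dist = map_tag_distribution(LEX_TAG_GIVEN_WORD["w1"], TB_GIVEN_LEX)
    print(dist)  # roughly NN 0.63, NNT 0.07, VB 0.3
```

If each row of the mapping sums to one, the mapped distribution also sums to one. A real system would additionally need estimates of P(word) and P(tag), which this sketch takes as given inputs.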

Original language: English
Title of host publication: EACL 2009 - 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 327-335
Number of pages: 9
ISBN (Print): 9781932432169
State: Published - 2009
Externally published: Yes
Event: 12th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2009 - Athens, Greece
Duration: 30 Mar 2009 to 3 Apr 2009

Publication series

Name: EACL 2009 - 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings

Conference

Conference: 12th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2009
Country/Territory: Greece
City: Athens
Period: 30/03/09 to 3/04/09

