A memory-based approach to learning shallow natural language patterns

Shlomo Argamon, Ido Dagan, Yuval Krymolowski

Research output: Contribution to journal › Conference article › peer-review

53 Scopus citations

Abstract

Recognizing shallow linguistic patterns, such as basic syntactic relationships between words, is a common task in applied natural language and text processing. The common practice for approaching this task is by tedious manual definition of possible pattern structures, often in the form of regular expressions or finite automata. This paper presents a novel memory-based learning method that recognizes shallow patterns in new text based on a bracketed training corpus. The training data are stored as-is, in efficient suffix-tree data structures. Generalization is performed on-line at recognition time by comparing subsequences of the new text to positive and negative evidence in the corpus. This way, no information in the training is lost, as can happen in other learning systems that construct a single generalized model at the time of training. The paper presents experimental results for recognizing noun phrase, subject-verb and verb-object patterns in English. Since the learning approach enables easy porting to new domains, we plan to apply it to syntactic patterns in other languages and to sub-language patterns for information extraction.
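The recognition step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes sentences are represented as part-of-speech tag sequences, with `[` and `]` marking a pattern such as a noun phrase, and it stores subsequence counts in plain `Counter` objects rather than the suffix trees the paper uses. The class `PatternMemory`, its `max_len` parameter, and the way evidence is counted are hypothetical and chosen only for illustration.

```python
from collections import Counter

BRACKETS = ("[", "]")


def subsequences(seq, max_len):
    """Yield every contiguous subsequence of seq with at most max_len items."""
    for start in range(len(seq)):
        for end in range(start + 1, min(len(seq), start + max_len) + 1):
            yield tuple(seq[start:end])


class PatternMemory:
    """Stores a bracketed training corpus as-is (hypothetical helper class).

    Assumption: brackets are not nested and never empty.
    """

    def __init__(self, max_len=6):
        self.max_len = max_len       # longest raw tag subsequence kept
        self.bracketed = Counter()   # subsequences that include bracket tokens
        self.raw = Counter()         # the same data with brackets stripped

    def store(self, bracketed_tags):
        """Add one bracketed training sentence to the memory."""
        padded = ["<s>", *bracketed_tags, "</s>"]
        raw = [t for t in padded if t not in BRACKETS]
        # Allow two extra positions for the bracket pair itself.
        self.bracketed.update(subsequences(padded, self.max_len + 2))
        self.raw.update(subsequences(raw, self.max_len))

    def evidence(self, tags, start, end):
        """Return (positive, negative) counts for the candidate tags[start:end].

        Positive: the span, with one tag of context on each side, occurs in
        training bracketed exactly this way. Negative: the same tags occur
        in training without that bracketing.
        """
        padded = ["<s>", *tags, "</s>"]
        s, e = start + 1, end + 1    # shift indices past the "<s>" padding
        raw = tuple(padded[s - 1:e + 1])
        pattern = (padded[s - 1], "[", *padded[s:e], "]", padded[e])
        positive = self.bracketed[pattern]
        negative = self.raw[raw] - positive
        return positive, negative


# Toy training corpus (invented for illustration, not data from the paper).
memory = PatternMemory()
memory.store(["[", "DT", "NN", "]", "VBD", "[", "DT", "JJ", "NN", "]"])
memory.store(["[", "PRP", "]", "VBD", "[", "DT", "NN", "]", "IN", "[", "NN", "]"])
memory.store(["[", "DT", "NN", "NN", "]", "VBD"])

tags = ["DT", "NN", "VBD", "PRP"]
print(memory.evidence(tags, 0, 2))  # candidate "DT NN" -> (1, 0)
print(memory.evidence(tags, 0, 1))  # candidate "DT" alone -> (0, 2)
```

Because the corpus is kept as counts of its own subsequences rather than compiled into one generalized model, the decision for each candidate is made only at recognition time, which is the property the abstract emphasizes. A real system would need a way to combine evidence from overlapping subsequences of different lengths, which this sketch leaves out.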

Original language: English
Pages (from-to): 67-73
Number of pages: 7
Journal: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume: 1
State: Published - 1998
Event: 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, COLING-ACL 1998 - Montreal, Canada
Duration: 10 Aug 1998 → 14 Aug 1998

Bibliographical note

Publisher Copyright:
© COLING-ACL 1998. All rights reserved.
