Mistake-driven learning with thesaurus for text categorization

Takefumi Yamazaki, I. Dagan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper extends the mistake-driven learner WINNOW to better utilize thesauri for text categorization. In our method not only words but also semantic categories given by the thesaurus are used as features in a classifier. New filtering and disambiguation methods are used as pre-processing to solve the problems caused by the use of the thesaurus. In order to verify our methods, we test a large body of tagged Japanese newspaper articles created by RWCP. Experimental results show that WINNOW with thesauri attains high accuracy and that the proposed filtering and disambiguation methods also contribute to the improved accuracy.
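
As a rough illustration of the idea in the abstract, the sketch below trains a classic Winnow learner (multiplicative promotion and demotion, applied only on mistakes) over features made of both words and thesaurus categories. It is not the extended learner from the paper, and it omits the filtering and disambiguation steps. The toy thesaurus, the `w:`/`c:` feature prefixes, the threshold, the factor `alpha`, and all function and class names are assumptions made for this example.

```python
# Minimal sketch of classic Winnow, NOT the paper's extended learner.
# The thesaurus is a toy, hypothetical word -> semantic-category mapping.
TOY_THESAURUS = {
    "yen": ["MONEY"], "dollar": ["MONEY"],
    "franc": ["MONEY"], "euro": ["MONEY"],
    "goal": ["SPORT"], "match": ["SPORT"],
}


def extract_features(words, thesaurus):
    """Return the active feature set: each word plus its thesaurus categories."""
    features = set()
    for word in words:
        features.add("w:" + word)
        for category in thesaurus.get(word, []):
            features.add("c:" + category)
    return features


class Winnow:
    def __init__(self, threshold=4.0, alpha=2.0):
        self.threshold = threshold  # decision threshold (assumed value)
        self.alpha = alpha          # promotion/demotion factor (assumed value)
        self.weights = {}           # feature -> weight; unseen features weigh 1.0

    def predict(self, features):
        score = sum(self.weights.get(f, 1.0) for f in features)
        return score >= self.threshold

    def update(self, features, label):
        """Mistake-driven step: weights change only when the prediction is wrong."""
        if self.predict(features) == label:
            return False
        # Promote active features on a missed positive, demote on a false positive.
        factor = self.alpha if label else 1.0 / self.alpha
        for f in features:
            self.weights[f] = self.weights.get(f, 1.0) * factor
        return True


# Toy training data: True means the document belongs to an "economy" category.
training_data = [
    (["yen", "dollar"], True),
    (["goal", "match"], False),
]

model = Winnow()
for _ in range(3):  # a few passes over the data
    for words, label in training_data:
        model.update(extract_features(words, TOY_THESAURUS), label)

# Neither "franc" nor "euro" appeared in training; only the shared
# category feature "c:MONEY" carries learned weight for this document.
unseen = extract_features(["franc", "euro"], TOY_THESAURUS)
is_economy = model.predict(unseen)
```

In this toy setup the category feature lets the learner generalize to words it never saw in training, which is the kind of benefit the abstract attributes to using the thesaurus.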
Original language: American English
Title of host publication: NLPRS-97, the Natural Language Processing Pacific Rim Symposium
State: Published - 1997

Bibliographical note

Place of conference: Phuket, Thailand
