Abstract
This paper extends the mistake-driven
learner WINNOW to better utilize thesauri
for text categorization.
In our method, not only words but also the semantic
categories given by the thesaurus are used as features
in the classifier. New filtering and disambiguation
methods are applied as pre-processing to solve the
problems caused by the use of the thesaurus. To
evaluate our methods, we test them on a large corpus of
tagged Japanese newspaper articles created by RWCP.
Experimental results show that
WINNOW with thesauri attains high accuracy
and that the proposed filtering
and disambiguation methods also contribute
to the improved accuracy.
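The core idea of using thesaurus categories as extra features alongside words can be illustrated with a minimal sketch. It assumes the basic positive Winnow update (multiplicative promotion and demotion, applied only on mistakes) for a single binary category. The abstract does not give the paper's exact Winnow variant, its parameters, or its filtering and disambiguation steps, so `alpha`, `theta`, the `THESAURUS` mapping, the `CAT:` category codes, and the toy English tokens (standing in for Japanese words) are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical toy thesaurus: word -> semantic category codes.
# (No filtering or disambiguation is applied here; each word maps
# to every category listed for it.)
THESAURUS: Dict[str, List[str]] = {
    "yen": ["CAT:currency"],
    "dollar": ["CAT:currency"],
    "euro": ["CAT:currency"],
    "pitcher": ["CAT:sport_person"],
    "homerun": ["CAT:sport_event"],
}

# Hypothetical toy training data: (tokens, is_economy_article).
TOY_DATA: List[Tuple[List[str], bool]] = [
    (["yen", "rose"], True),
    (["pitcher", "homerun"], False),
    (["dollar", "fell"], True),
    (["pitcher", "won"], False),
]


def features(words: Iterable[str]) -> Set[str]:
    """Active features: the words themselves plus their thesaurus categories."""
    feats: Set[str] = set()
    for w in words:
        feats.add("W:" + w)
        feats.update(THESAURUS.get(w, []))
    return feats


class Winnow:
    """Basic positive Winnow for one binary category.

    Every weight starts at 1.0 and changes multiplicatively, only on mistakes.
    """

    def __init__(self, alpha: float = 2.0, theta: float = 4.0) -> None:
        self.alpha = alpha  # promotion factor; demotion divides by alpha
        self.theta = theta  # decision threshold
        self.w: Dict[str, float] = {}

    def score(self, feats: Set[str]) -> float:
        # Unseen features contribute their initial weight of 1.0.
        return sum(self.w.get(f, 1.0) for f in feats)

    def predict(self, feats: Set[str]) -> bool:
        return self.score(feats) >= self.theta

    def update(self, feats: Set[str], label: bool) -> None:
        if self.predict(feats) == label:
            return  # mistake-driven: correct predictions leave weights unchanged
        factor = self.alpha if label else 1.0 / self.alpha
        for f in feats:
            self.w[f] = self.w.get(f, 1.0) * factor


def train(data: List[Tuple[List[str], bool]], epochs: int = 5) -> Winnow:
    """Run several passes of online Winnow updates over the data."""
    model = Winnow()
    for _ in range(epochs):
        for words, label in data:
            model.update(features(words), label)
    return model


if __name__ == "__main__":
    model = train(TOY_DATA)
    # "euro" never occurs in training, but its category CAT:currency does.
    print(model.predict(features(["euro", "slid"])))  # with thesaurus features
    print(model.predict({"W:euro", "W:slid"}))        # words only
```

On this toy data, the promoted weight of `CAT:currency` lets the model label an article about the unseen word "euro" as positive, while the words-only feature set falls below the threshold. This mirrors, on a very small scale, why category features can help when training vocabulary is sparse; it is not a reproduction of the paper's experiments.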
Field | Value |
---|---|
Original language | American English |
Title of host publication | NLPRS-97, the Natural Language Processing Pacific Rim Symposium |
Status | Published, 1997 |