Abstract
We suggest a corpus-independent feature set
appropriate for style-based text categorization
problems. To achieve this, we introduce a new
measure on linguistic features, called stability,
which captures the extent to which a language
element, such as a word or syntactic construct, is
replaceable by semantically equivalent elements.
This measure may be perceived as quantifying
the degree of available “synonymy” for a
language item. We show that frequent but
unstable features are especially useful for style-based
text categorization.
| Original language | American English |
|---|---|
| Title of host publication | IJCAI |
| State | Published - 2003 |