TY - GEN
T1 - Routing documents according to style
AU - Argamon, Shlomo
AU - Koppel, M.
AU - Avneri, Galit
N1 - Place of conference: Italy
PY - 1998
Y1 - 1998
N2 - Most research on automated text categorization has focused on determining
the topic of a given text. While topic is generally the main
characteristic of an information need, there are other characteristics
that are useful for information retrieval. In this paper we consider
the problem of text categorization according to style. For example,
in searching the web, we may wish to automatically determine if a
given page is promotional or informative, was written by a native English
speaker or not, and so on. Learning to determine the style of
a document is a dual to that of determining its topic, in that those
document features which capture the style of a document are precisely
those which are independent of its topic. We here define the features
of a document to be the frequencies of each of a set of function words
and parts-of-speech triples. We then use machine learning techniques
to classify documents. We test our methods on four collections of
downloaded newspaper and magazine articles.
AB - Most research on automated text categorization has focused on determining
the topic of a given text. While topic is generally the main
characteristic of an information need, there are other characteristics
that are useful for information retrieval. In this paper we consider
the problem of text categorization according to style. For example,
in searching the web, we may wish to automatically determine if a
given page is promotional or informative, was written by a native English
speaker or not, and so on. Learning to determine the style of
a document is a dual to that of determining its topic, in that those
document features which capture the style of a document are precisely
those which are independent of its topic. We here define the features
of a document to be the frequencies of each of a set of function words
and parts-of-speech triples. We then use machine learning techniques
to classify documents. We test our methods on four collections of
downloaded newspaper and magazine articles.
UR - https://scholar.google.co.il/scholar?q=Routing+Documents+According+to+Style+&btnG=&hl=en&as_sdt=0%2C5
M3 - Conference contribution
BT - First International Workshop on Innovative Information Systems
ER -