Detecting content-bearing words by serial clustering - extended abstract

A. Bookstein, S. T. Klein, T. Raita

Research output: Contribution to journal › Conference article › peer-review

14 Scopus citations

Abstract

Information retrieval systems typically distinguish between content-bearing words and terms on a stop list. But 'content-bearing' is relative to a collection. For optimal retrieval efficiency, it is desirable to have automated methods for custom-building a stop list. This paper defines the notion of serial clustering of words in text, and explores the value of such clustering as an indicator of a word bearing content. The numerical measures we propose may also be of value in assigning weights to terms in requests. Experimental support is obtained from natural text databases in three different languages.

Original language: English
Pages (from-to): 319-327
Number of pages: 9
Journal: SIGIR Forum (ACM Special Interest Group on Information Retrieval)
State: Published - 1995
Externally published: Yes
Event: Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - Seattle, WA, USA
Duration: 9 Jul 1995 - 13 Jul 1995
