Clumping properties of content-bearing words

A. Bookstein, S. T. Klein, T. Raita

Research output: Contribution to journal > Article > peer-review

29 Scopus citations

Abstract

Information retrieval systems identify content-bearing words, and possibly also assign weights, as part of the process of formulating requests. For optimal retrieval efficiency, it is desirable that this be done automatically. This article defines the notion of serial clustering of words in text, and explores the value of such clustering as an indicator of a word's bearing content. This approach is flexible in the sense that it is sensitive to context: a term may be assessed as content-bearing within one collection, but not another. Our approach, being numerical, may also be of value in assigning weights to terms in requests. Experimental support is obtained from natural text databases in three different languages.

Original language: English
Pages (from-to): 102-114
Number of pages: 13
Journal: Journal of the American Society for Information Science
Volume: 49
Issue number: 2
State: Published - 1998
