TY - JOUR
T1 - Clumping properties of content-bearing words
AU - Bookstein, A.
AU - Klein, S. T.
AU - Raita, T.
PY - 1998
Y1 - 1998
N2 - Information retrieval systems identify content-bearing words, and possibly also assign weights, as part of the process of formulating requests. For optimal retrieval efficiency, it is desirable that this be done automatically. This article defines the notion of serial clustering of words in text, and explores the value of such clustering as an indicator of whether a word bears content. This approach is flexible in the sense that it is sensitive to context: a term may be assessed as content-bearing within one collection, but not another. Our approach, being numerical, may also be of value in assigning weights to terms in requests. Experimental support is obtained from natural text databases in three different languages.
AB - Information retrieval systems identify content-bearing words, and possibly also assign weights, as part of the process of formulating requests. For optimal retrieval efficiency, it is desirable that this be done automatically. This article defines the notion of serial clustering of words in text, and explores the value of such clustering as an indicator of whether a word bears content. This approach is flexible in the sense that it is sensitive to context: a term may be assessed as content-bearing within one collection, but not another. Our approach, being numerical, may also be of value in assigning weights to terms in requests. Experimental support is obtained from natural text databases in three different languages.
UR - http://www.scopus.com/inward/record.url?scp=0031999348&partnerID=8YFLogxK
U2 - 10.1002/(SICI)1097-4571(1998)49:2<102::AID-ASI2>3.0.CO;2-2
DO - 10.1002/(SICI)1097-4571(1998)49:2<102::AID-ASI2>3.0.CO;2-2
M3 - Article
AN - SCOPUS:0031999348
SN - 0002-8231
VL - 49
SP - 102
EP - 114
JO - Journal of the American Society for Information Science
JF - Journal of the American Society for Information Science
IS - 2
ER -