TY - GEN
T1 - Clustering for unsupervised relation identification
AU - Rosenfeld, Benjamin
AU - Feldman, Ronen
PY - 2007
Y1 - 2007
N2 - Unsupervised Relation Identification is the task of automatically discovering interesting relations between entities in a large text corpus. Relations are identified by clustering the frequently co-occurring pairs of entities in such a way that pairs occurring in similar contexts end up belonging to the same clusters. In this paper we compare several clustering setups, some of them novel and others already tried. The setups include feature extraction and selection methods and clustering algorithms. In order to do the comparison, we develop a clustering evaluation metric, specifically adapted for the relation identification task. Our experiments demonstrate significant superiority of the single-linkage hierarchical clustering with the novel threshold selection technique over the other tested clustering algorithms. Also, the experiments indicate that for successful relation identification it is important to use rich complex features of two kinds: features that test both relation slots together ("relation features"), and features that test only one slot each ("entity features"). We have found that using both kinds of features with the best of the algorithms produces very high-precision results, significantly improving over the previous work.
AB - Unsupervised Relation Identification is the task of automatically discovering interesting relations between entities in a large text corpus. Relations are identified by clustering the frequently co-occurring pairs of entities in such a way that pairs occurring in similar contexts end up belonging to the same clusters. In this paper we compare several clustering setups, some of them novel and others already tried. The setups include feature extraction and selection methods and clustering algorithms. In order to do the comparison, we develop a clustering evaluation metric, specifically adapted for the relation identification task. Our experiments demonstrate significant superiority of the single-linkage hierarchical clustering with the novel threshold selection technique over the other tested clustering algorithms. Also, the experiments indicate that for successful relation identification it is important to use rich complex features of two kinds: features that test both relation slots together ("relation features"), and features that test only one slot each ("entity features"). We have found that using both kinds of features with the best of the algorithms produces very high-precision results, significantly improving over the previous work.
KW - Clustering
KW - Information extraction
KW - Relation learning
KW - Unsupervised relation identification
UR - http://www.scopus.com/inward/record.url?scp=63449099228&partnerID=8YFLogxK
U2 - 10.1145/1321440.1321499
DO - 10.1145/1321440.1321499
M3 - Conference contribution
AN - SCOPUS:63449099228
SN - 9781595938039
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 411
EP - 418
BT - CIKM 2007 - Proceedings of the 16th ACM Conference on Information and Knowledge Management
T2 - 16th ACM Conference on Information and Knowledge Management, CIKM 2007
Y2 - 6 November 2007 through 9 November 2007
ER -