Abstract
We propose a generalized bootstrapping algorithm in which categories are described by relevant seed features. Our method introduces two unsupervised steps that improve the initial categorization step of the bootstrapping scheme: (i) using a Latent Semantic space to obtain a generalized similarity measure between instances and features, and (ii) using the Gaussian Mixture algorithm to obtain uniform classification probabilities for unlabeled examples. The algorithm was evaluated on two Text Categorization tasks and obtained state-of-the-art performance using only the category names as initial seeds.
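
As an illustration of step (ii) only, the sketch below fits a two-component, one-dimensional Gaussian mixture to similarity scores with EM and reads off the posterior of the higher-mean component as a classification probability. This is one plausible reading of the abstract, not the authors' implementation: the function names, the initialization at the minimum and maximum score, the variance floor, and the example scores are all assumptions made for illustration, and the similarity scores are assumed to have been computed already (for example, in a latent semantic space as in step (i)).

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(scores, iterations=100, var_floor=1e-6):
    """Fit a mixture of two 1-D Gaussians to `scores` with EM.

    Component 0 is initialized at the lowest score and component 1 at the
    highest, so component 1 is meant to model "belongs to the category".
    Returns (weights, means, variances). Hypothetical helper, not from the paper.
    """
    lo, hi = min(scores), max(scores)
    means = [lo, hi]
    spread = max((hi - lo) ** 2 / 4.0, var_floor)
    variances = [spread, spread]
    weights = [0.5, 0.5]

    for _ in range(iterations):
        # E-step: responsibility of each component for each score.
        resp = []
        for x in scores:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against division by zero
            resp.append([pk / total for pk in p])

        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = nk / len(scores)
            means[k] = sum(r[k] * x for r, x in zip(resp, scores)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, scores)) / nk
            variances[k] = max(var, var_floor)

    return weights, means, variances


def category_probability(score, params):
    """Posterior probability that `score` came from the higher-mean component."""
    weights, means, variances = params
    p = [weights[k] * gaussian_pdf(score, means[k], variances[k]) for k in range(2)]
    total = sum(p)
    return p[1] / total if total > 0 else 0.5


# Made-up similarity scores between unlabeled documents and one category's seeds.
example_scores = [0.05, 0.08, 0.10, 0.12, 0.07, 0.71, 0.78, 0.82, 0.75]
example_params = fit_two_component_gmm(example_scores)
print(category_probability(0.80, example_params))  # close to 1 for a high score
print(category_probability(0.06, example_params))  # close to 0 for a low score
```

Because each category's scores are mapped through its own fitted mixture, the resulting values are probabilities on a common scale, which is the sense in which such a step could make scores comparable across categories.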
Original language | English |
---|---|
Pages | 129-136 |
Number of pages | 8 |
State | Published - 2005 |
Event | Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, HLT/EMNLP 2005, Co-located with the 2005 Document Understanding Conference, DUC and the 9th International Workshop on Parsing Technologies, IWPT - Vancouver, BC, Canada; Duration: 6 Oct 2005 → 8 Oct 2005 |
Conference
Conference | Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, HLT/EMNLP 2005, Co-located with the 2005 Document Understanding Conference, DUC and the 9th International Workshop on Parsing Technologies, IWPT |
---|---|
Country/Territory | Canada |
City | Vancouver, BC |
Period | 6 Oct 2005 → 8 Oct 2005 |