Abstract
In many real-world learning tasks, it is expensive to acquire a sufficient number of labeled examples for training. This paper proposes a general method for efficiently training probabilistic classifiers, by selecting for training only the more informative examples in a stream of unlabeled examples. The method, committee-based sampling, evaluates the informativeness of an example by measuring the degree of disagreement between several model variants. These variants (the committee) are drawn randomly from a probability distribution conditioned by the training set selected so far (Monte-Carlo sampling). The method is particularly attractive because it evaluates the expected information gain from a training example implicitly, making the model both easy to implement and generally applicable. We further show how to apply committee-based sampling for training Hidden Markov Model classifiers, which are commonly used for complex classification tasks. The method was implemented and tested for the task of tagging words in natural language sentences with parts-of-speech. Experimental evaluation of committee-based sampling versus standard sequential training showed a substantial improvement in training efficiency.
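The selection loop described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the paper applies the method to Hidden Markov Models, whereas this toy uses a hypothetical `BetaBernoulliModel` with one Beta posterior per discrete feature value. The vote-entropy disagreement measure, the committee size `k`, the `threshold`, and all function names are assumptions chosen for illustration; the abstract only states that disagreement between randomly drawn model variants decides whether an example is selected.

```python
import math
import random


def vote_entropy(votes):
    """Entropy (in bits) of the committee's label votes; 0 means full agreement."""
    n = len(votes)
    entropy = 0.0
    for label in set(votes):
        p = votes.count(label) / n
        entropy -= p * math.log2(p)
    return entropy


class BetaBernoulliModel:
    """Toy stand-in for a probabilistic classifier (assumption, not the paper's HMM).

    Keeps a Beta posterior over P(label = 1 | feature) for each feature value.
    """

    def __init__(self, n_features, prior=1.0):
        self.pos = [prior] * n_features
        self.neg = [prior] * n_features

    def update(self, x, y):
        """Add one labeled example (feature index x, binary label y)."""
        if y == 1:
            self.pos[x] += 1
        else:
            self.neg[x] += 1

    def sample_member(self, rng):
        """Draw one model variant from the posterior given the labels selected so far."""
        return [rng.betavariate(a, b) for a, b in zip(self.pos, self.neg)]


def committee_based_sampling(stream, oracle, model, k=5, threshold=0.5, seed=0):
    """Scan an unlabeled stream; request a label only when the committee disagrees."""
    rng = random.Random(seed)
    n_queries = 0
    for x in stream:
        # Monte-Carlo step: draw k model variants from the current posterior.
        committee = [model.sample_member(rng) for _ in range(k)]
        # Each committee member classifies the example.
        votes = [1 if member[x] >= 0.5 else 0 for member in committee]
        # High disagreement is treated as a proxy for informativeness.
        if vote_entropy(votes) > threshold:
            model.update(x, oracle(x))
            n_queries += 1
    return n_queries


if __name__ == "__main__":
    data_rng = random.Random(1)
    stream = [data_rng.randrange(4) for _ in range(200)]
    model = BetaBernoulliModel(n_features=4)
    # Hypothetical labeling oracle: even feature indices belong to class 1.
    queried = committee_based_sampling(stream, lambda x: int(x % 2 == 0), model)
    print(f"labels requested: {queried} of {len(stream)}")
```

In this sketch the posterior sharpens as labels accumulate, so the sampled variants agree more often and fewer labels are requested later in the stream. That is the intuition behind the training-efficiency gain the abstract reports, although this toy makes no claim about the size of that gain.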
Original language | English |
---|---|
Title of host publication | Proceedings of the 12th International Conference on Machine Learning, ICML 1995 |
Editors | Armand Prieditis, Stuart Russell |
Publisher | Morgan Kaufmann Publishers, Inc. |
Pages | 150-157 |
Number of pages | 8 |
ISBN (Electronic) | 1558603778, 9781558603776 |
State | Published - 1995 |
Event | 12th International Conference on Machine Learning, ICML 1995 - Tahoe City, United States; Duration: 9 Jul 1995 → 12 Jul 1995 |
Publication series
Name | Proceedings of the 12th International Conference on Machine Learning, ICML 1995 |
---|
Conference
Conference | 12th International Conference on Machine Learning, ICML 1995 |
---|---|
Country/Territory | United States |
City | Tahoe City |
Period | 9 Jul 1995 → 12 Jul 1995 |
Bibliographical note
Publisher Copyright: © ICML 1995. All rights reserved.
Funding
We thank Yoav Freund and Yishay Mansour for helpful discussions. The second author gratefully acknowledges the support of the Fulbright Foundation. We also thank the anonymous reviewers for their helpful comments.
Funders | Funder number |
---|---|
Fulbright Foundation |