Committee-Based Sampling For Training Probabilistic Classifiers

Ido Dagan, Sean P. Engelson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

396 Scopus citations

Abstract

In many real-world learning tasks, it is expensive to acquire a sufficient number of labeled examples for training. This paper proposes a general method for efficiently training probabilistic classifiers by selecting for training only the more informative examples in a stream of unlabeled examples. The method, committee-based sampling, evaluates the informativeness of an example by measuring the degree of disagreement between several model variants. These variants (the committee) are drawn randomly from a probability distribution conditioned on the training set selected so far (Monte-Carlo sampling). The method is particularly attractive because it evaluates the expected information gain from a training example implicitly, making it both easy to implement and generally applicable. We further show how to apply committee-based sampling to train Hidden Markov Model classifiers, which are commonly used for complex classification tasks. The method was implemented and tested on the task of tagging words in natural language sentences with parts of speech. Experimental evaluation of committee-based sampling versus standard sequential training showed a substantial improvement in training efficiency.
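
To make the selection loop concrete for readers of this abstract, the sketch below (Python with NumPy) shows a generic committee-based selection step: each committee member votes on an incoming unlabeled example, disagreement is quantified by the vote entropy, and the example is sent for labeling with a probability that increases with that disagreement. The names vote_entropy and select_for_labeling, the scikit-learn-style predict interface, and the coin-flip selection rule are illustrative assumptions, not details from the paper, which draws committee members by Monte-Carlo sampling from a distribution over model parameters conditioned on the labeled data collected so far and applies the method to Hidden Markov Model part-of-speech taggers.

import numpy as np

def vote_entropy(votes, num_classes):
    """Disagreement of the committee on one example: the entropy of the
    distribution of the members' hard votes over the class labels."""
    counts = np.bincount(votes, minlength=num_classes)
    probs = counts / counts.sum()
    probs = probs[probs > 0]
    return float(-np.sum(probs * np.log(probs)))

def select_for_labeling(committee, unlabeled_stream, num_classes, rng=None):
    """Scan a stream of unlabeled examples and keep those on which the randomly
    drawn model variants (the committee) disagree. Selection is stochastic:
    the higher the normalized vote entropy, the more likely the example is
    selected for manual labeling. In the paper's setting the committee would be
    re-drawn from a distribution over model parameters as the training set grows."""
    rng = rng if rng is not None else np.random.default_rng(0)
    max_entropy = np.log(num_classes)
    selected = []
    for x in unlabeled_stream:
        votes = np.array([int(member.predict([x])[0]) for member in committee])
        disagreement = vote_entropy(votes, num_classes) / max_entropy  # normalized to [0, 1]
        if rng.random() < disagreement:  # biased coin; a fixed threshold also works
            selected.append(x)
    return selected

As a rough usage note, the committee could be approximated by a handful of classifiers trained on resamples or perturbations of the labeled data seen so far, which stands in for drawing model variants from a distribution conditioned on that data.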

Original language: English
Title of host publication: Proceedings of the 12th International Conference on Machine Learning, ICML 1995
Editors: Armand Prieditis, Stuart Russell
Publisher: Morgan Kaufmann Publishers, Inc.
Pages: 150-157
Number of pages: 8
ISBN (Electronic): 1558603778, 9781558603776
State: Published - 1995
Event: 12th International Conference on Machine Learning, ICML 1995 - Tahoe City, United States
Duration: 9 Jul 1995 – 12 Jul 1995

Publication series

Name: Proceedings of the 12th International Conference on Machine Learning, ICML 1995

Conference

Conference: 12th International Conference on Machine Learning, ICML 1995
Country/Territory: United States
City: Tahoe City
Period: 9/07/95 – 12/07/95

Bibliographical note

Publisher Copyright:
© ICML 1995. All rights reserved.

Funding

We thank Yoav Freund and Yishay Mansour for helpful discussions. The second author gratefully acknowledges the support of the Fulbright Foundation. We also thank the anonymous reviewers for their helpful comments.

Funders: Fulbright Foundation
