Committee-Based Sampling For Training Probabilistic Classifiers

Ido Dagan, Sean P. Engelson

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

374 Scopus citations

Abstract

In many real-world learning tasks, it is expensive to acquire a sufficient number of labeled examples for training. This paper proposes a general method for efficiently training probabilistic classifiers, by selecting for training only the more informative examples in a stream of unlabeled examples. The method, committee-based sampling, evaluates the informativeness of an example by measuring the degree of disagreement between several model variants. These variants (the committee) are drawn randomly from a probability distribution conditioned by the training set selected so far (Monte-Carlo sampling). The method is particularly attractive because it evaluates the expected information gain from a training example implicitly, making the model both easy to implement and generally applicable. We further show how to apply committee-based sampling for training Hidden Markov Model classifiers, which are commonly used for complex classification tasks. The method was implemented and tested for the task of tagging words in natural language sentences with parts-of-speech. Experimental evaluation of committee-based sampling versus standard sequential training showed a substantial improvement in training efficiency.
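
As a rough illustration of the selection loop the abstract describes, the following Python sketch applies the idea to a toy per-word tag model rather than to the Hidden Markov Model classifier used in the paper. Several choices here are assumptions made for illustration and are not taken from the abstract: the Dirichlet posterior over per-word tag probabilities, the use of normalized vote entropy as the disagreement measure, the fixed threshold, and all function and parameter names (`committee_based_sampling`, `vote_entropy`, `oracle`, `prior`).

```python
import math
import random
from collections import Counter, defaultdict


def sample_dirichlet(rng, alphas):
    """Draw one probability vector from Dirichlet(alphas) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def vote_entropy(votes, n_classes):
    """Entropy of the committee's votes, normalized to the range [0, 1]."""
    k = len(votes)
    max_h = math.log(min(k, n_classes))
    if max_h == 0:  # a single member or a single class cannot disagree
        return 0.0
    h = -sum((c / k) * math.log(c / k) for c in Counter(votes).values())
    return h / max_h


def committee_based_sampling(stream, oracle, classes, k=5,
                             threshold=0.0, prior=1.0, seed=0):
    """Scan an unlabeled stream and request labels only where the committee disagrees.

    Assumed toy model: each distinct example x has its own class distribution
    with a symmetric Dirichlet prior, updated from the labels selected so far.
    """
    rng = random.Random(seed)
    counts = defaultdict(lambda: [0] * len(classes))
    labeled = []
    for x in stream:
        # Posterior parameters given the training set selected so far.
        alphas = [c + prior for c in counts[x]]
        votes = []
        for _ in range(k):
            # Each committee member is one random draw of model parameters.
            theta = sample_dirichlet(rng, alphas)
            votes.append(max(range(len(classes)), key=theta.__getitem__))
        # Disagreement stands in for informativeness; no explicit
        # information-gain computation is performed.
        if vote_entropy(votes, len(classes)) > threshold:
            y = oracle(x)  # request the true label
            counts[x][classes.index(y)] += 1
            labeled.append((x, y))
    return labeled, dict(counts)
```

In this sketch, an example seen many times with a consistent label yields a concentrated posterior, so the sampled committee members tend to agree and further copies are skipped; rare or ambiguous examples keep triggering label requests. Calling `committee_based_sampling(words, oracle, ["NOUN", "VERB"])` with a hypothetical `oracle` function that returns the true tag would return the selected training pairs and the resulting counts.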

Original language: English
Title of host publication: Proceedings of the 12th International Conference on Machine Learning, ICML 1995
Editors: Armand Prieditis, Stuart Russell
Publisher: Morgan Kaufmann Publishers, Inc.
Pages: 150-157
Number of pages: 8
ISBN (Electronic): 1558603778, 9781558603776
State: Published - 1995
Event: 12th International Conference on Machine Learning, ICML 1995 - Tahoe City, United States
Duration: 9 Jul 1995 to 12 Jul 1995

Publication series

Name: Proceedings of the 12th International Conference on Machine Learning, ICML 1995

Conference

Conference: 12th International Conference on Machine Learning, ICML 1995
Country/Territory: United States
City: Tahoe City
Period: 9/07/95 to 12/07/95

Bibliographical note

Publisher Copyright:
© ICML 1995. All rights reserved.
