TY - JOUR

T1 - Active learning using smooth relative regret approximations with applications

AU - Ailon, Nir

AU - Begleiter, Ron

AU - Ezra, Esther

PY - 2014/3

Y1 - 2014/3

N2 - The disagreement coefficient of Hanneke has become a central data-independent invariant in proving active learning rates. It has been shown in various ways that a concept class with low complexity, together with a bound on the disagreement coefficient at an optimal solution, allows active learning rates that are superior to passive learning ones. We present a different tool for pool-based active learning which follows from the existence of a certain uniform version of a low disagreement coefficient, but is not equivalent to it. In fact, we present two fundamental active learning problems of significant interest for which our approach allows nontrivial active learning bounds. However, any general-purpose method relying solely on disagreement coefficient bounds fails to guarantee any useful bounds for these problems. The applications of interest are: learning to rank from pairwise preferences, and clustering with side information (a.k.a. semi-supervised clustering). The tool we use is based on the learner's ability to compute an estimator of the difference between the loss of any hypothesis and some fixed "pivotal" hypothesis to within an absolute error of at most ε times the disagreement measure (ℓ1 distance) between the two hypotheses. We prove that such an estimator implies the existence of a learning algorithm which, at each iteration, reduces its in-class excess risk to within a constant factor. Each iteration replaces the current pivotal hypothesis with the minimizer of the estimated loss difference function with respect to the previous pivotal hypothesis. The label complexity essentially becomes that of computing this estimator.

AB - The disagreement coefficient of Hanneke has become a central data-independent invariant in proving active learning rates. It has been shown in various ways that a concept class with low complexity, together with a bound on the disagreement coefficient at an optimal solution, allows active learning rates that are superior to passive learning ones. We present a different tool for pool-based active learning which follows from the existence of a certain uniform version of a low disagreement coefficient, but is not equivalent to it. In fact, we present two fundamental active learning problems of significant interest for which our approach allows nontrivial active learning bounds. However, any general-purpose method relying solely on disagreement coefficient bounds fails to guarantee any useful bounds for these problems. The applications of interest are: learning to rank from pairwise preferences, and clustering with side information (a.k.a. semi-supervised clustering). The tool we use is based on the learner's ability to compute an estimator of the difference between the loss of any hypothesis and some fixed "pivotal" hypothesis to within an absolute error of at most ε times the disagreement measure (ℓ1 distance) between the two hypotheses. We prove that such an estimator implies the existence of a learning algorithm which, at each iteration, reduces its in-class excess risk to within a constant factor. Each iteration replaces the current pivotal hypothesis with the minimizer of the estimated loss difference function with respect to the previous pivotal hypothesis. The label complexity essentially becomes that of computing this estimator.

KW - Active learning

KW - Clustering with side information

KW - Disagreement coefficient

KW - Learning to rank from pairwise preferences

KW - Semi-supervised clustering

KW - Smooth relative regret approximation

UR - http://www.scopus.com/inward/record.url?scp=84899819568&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84899819568

SN - 1532-4435

VL - 15

SP - 885

EP - 920

JO - Journal of Machine Learning Research

JF - Journal of Machine Learning Research

ER -