Adaptive regularization for weight matrices

Koby Crammer, Gal Chechik

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

9 Scopus citations

Abstract

Algorithms for learning distributions over weight-vectors, such as AROW (Crammer et al., 2009) were recently shown empirically to achieve state-of-the-art performance at various problems, with strong theoretical guaranties. Extending these algorithms to matrix models pose challenges since the number of free parameters in the covariance of the distribution scales as n 4 with the dimension n of the matrix, and n tends to be large in real applications. We describe, analyze and experiment with two new algorithms for learning distribution of matrix models. Our first algorithm maintains a diagonal covariance over the parameters and can handle large covariance matrices. The second algorithm factors the covariance to capture inter-features correlation while keeping the number of parameters linear in the size of the original matrix. We analyze both algorithms in the mistake bound model and show a superior precision performance of our approach over other algorithms in two tasks: retrieving similar images, and ranking similar documents. The factored algorithm is shown to attain faster convergence rate.

Original languageEnglish
Title of host publicationProceedings of the 29th International Conference on Machine Learning, ICML 2012
Pages425-432
Number of pages8
StatePublished - 2012
Event29th International Conference on Machine Learning, ICML 2012 - Edinburgh, United Kingdom
Duration: 26 Jun 20121 Jul 2012

Publication series

NameProceedings of the 29th International Conference on Machine Learning, ICML 2012
Volume1

Conference

Conference29th International Conference on Machine Learning, ICML 2012
Country/TerritoryUnited Kingdom
CityEdinburgh
Period26/06/121/07/12

Bibliographical note

Funding Information:
This work was partially supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number D11PC20068. Deyu Meng was partially supported by the China NSFC project under contract 61373114. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC, or the U.S. Government. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by the National Science Foundation grant number OCI-1053575. It used the Blacklight system at the Pittsburgh Supercomputing Center (PSC).

Funding

This work was partially supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number D11PC20068. Deyu Meng was partially supported by the China NSFC project under contract 61373114. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC, or the U.S. Government. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by the National Science Foundation grant number OCI-1053575. It used the Blacklight system at the Pittsburgh Supercomputing Center (PSC).

FundersFunder number
Department of Interior National Business CenterD11PC20068
Pittsburgh Supercomputing Center
National Science FoundationOCI-1053575
Intelligence Advanced Research Projects Activity
National Natural Science Foundation of China61373114

    Fingerprint

    Dive into the research topics of 'Adaptive regularization for weight matrices'. Together they form a unique fingerprint.

    Cite this