Abstract
Algorithms for learning distributions over weight vectors, such as AROW (Crammer et al., 2009), were recently shown empirically to achieve state-of-the-art performance on various problems, with strong theoretical guarantees. Extending these algorithms to matrix models poses challenges, since the number of free parameters in the covariance of the distribution scales as n^4 with the dimension n of the matrix, and n tends to be large in real applications. We describe, analyze, and experiment with two new algorithms for learning distributions over matrix models. Our first algorithm maintains a diagonal covariance over the parameters and can handle large covariance matrices. The second algorithm factors the covariance to capture inter-feature correlations while keeping the number of parameters linear in the size of the original matrix. We analyze both algorithms in the mistake-bound model and show that our approach achieves higher precision than other algorithms on two tasks: retrieving similar images and ranking similar documents. The factored algorithm is shown to attain a faster convergence rate.
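The scaling argument in the abstract can be made concrete with a small parameter count. The sketch below is illustrative only: the function name is hypothetical, and the factored case assumes two n x n factors (a Kronecker-style factorization), which the abstract does not specify. It only states that the parameter count stays linear in the size of the weight matrix.

```python
def covariance_parameter_counts(n):
    """Count stored covariance entries for an n x n weight matrix.

    Illustrative sketch; the 'factored' count is an ASSUMPTION
    (two n x n factors), not the paper's stated construction.
    """
    weights = n * n            # entries in the weight matrix itself
    full = weights * weights   # dense covariance over all n^2 weights: n^4 entries
    diagonal = weights         # one variance per weight
    factored = 2 * weights     # ASSUMPTION: two n x n factors
    return {
        "weights": weights,
        "full": full,
        "diagonal": diagonal,
        "factored": factored,
    }


if __name__ == "__main__":
    for name, value in covariance_parameter_counts(1000).items():
        print(f"{name:>9}: {value:,}")
```

For n = 1000, the dense covariance would need 10^12 entries, while the diagonal and (assumed) factored forms need on the order of 10^6, which is why the full covariance is impractical at realistic dimensions.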
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 29th International Conference on Machine Learning, ICML 2012 |
| Pages | 425-432 |
| Number of pages | 8 |
| State | Published - 2012 |
| Event | 29th International Conference on Machine Learning, ICML 2012 - Edinburgh, United Kingdom. Duration: 26 Jun 2012 → 1 Jul 2012 |
Publication series
| Name | Proceedings of the 29th International Conference on Machine Learning, ICML 2012 |
|---|---|
| Volume | 1 |
Conference
| Conference | 29th International Conference on Machine Learning, ICML 2012 |
|---|---|
| Country/Territory | United Kingdom |
| City | Edinburgh |
| Period | 26/06/12 → 1/07/12 |
Bibliographical note
Funding Information: This work was partially supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number D11PC20068. Deyu Meng was partially supported by the China NSFC project under contract 61373114. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC, or the U.S. Government. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by the National Science Foundation grant number OCI-1053575. It used the Blacklight system at the Pittsburgh Supercomputing Center (PSC).
Funding
| Funders | Funder number |
|---|---|
| Department of Interior National Business Center | D11PC20068 |
| Pittsburgh Supercomputing Center | |
| National Science Foundation | OCI-1053575 |
| Intelligence Advanced Research Projects Activity | |
| National Natural Science Foundation of China | 61373114 |