TY - JOUR
T1 - k-Mixup Regularization for Deep Learning via Optimal Transport
AU - Greenewald, Kristjan
AU - Gu, Anming
AU - Yurochkin, Mikhail
AU - Solomon, Justin
AU - Chien, Edward
N1 - Publisher Copyright:
© 2023, Transactions on Machine Learning Research. All rights reserved.
PY - 2023/9/1
Y1 - 2023/9/1
N2 - Mixup is a popular regularization technique for training deep neural networks that improves generalization and increases robustness to certain distribution shifts. It perturbs input training data in the direction of other randomly chosen instances in the training set. To better leverage the structure of the data, we extend mixup in a simple, broadly applicable way to k-mixup, which perturbs k-batches of training points in the direction of other k-batches. The perturbation is done with displacement interpolation, i.e., interpolation under the Wasserstein metric. We demonstrate theoretically and in simulations that k-mixup preserves cluster and manifold structures, and we extend theory studying the efficacy of standard mixup to the k-mixup case. Our empirical results show that training with k-mixup further improves generalization and robustness across several network architectures and benchmark datasets of differing modalities. For the wide variety of real datasets considered, the performance gains of k-mixup over standard mixup are similar to or larger than the gains of mixup itself over standard ERM after hyperparameter optimization. In several instances, in fact, k-mixup achieves gains in settings where standard mixup has negligible to zero improvement over ERM.
AB - Mixup is a popular regularization technique for training deep neural networks that improves generalization and increases robustness to certain distribution shifts. It perturbs input training data in the direction of other randomly chosen instances in the training set. To better leverage the structure of the data, we extend mixup in a simple, broadly applicable way to k-mixup, which perturbs k-batches of training points in the direction of other k-batches. The perturbation is done with displacement interpolation, i.e., interpolation under the Wasserstein metric. We demonstrate theoretically and in simulations that k-mixup preserves cluster and manifold structures, and we extend theory studying the efficacy of standard mixup to the k-mixup case. Our empirical results show that training with k-mixup further improves generalization and robustness across several network architectures and benchmark datasets of differing modalities. For the wide variety of real datasets considered, the performance gains of k-mixup over standard mixup are similar to or larger than the gains of mixup itself over standard ERM after hyperparameter optimization. In several instances, in fact, k-mixup achieves gains in settings where standard mixup has negligible to zero improvement over ERM.
UR - http://www.scopus.com/inward/record.url?scp=86000066092&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:86000066092
SN - 2835-8856
VL - 2023
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -
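
Note: the abstract above describes the k-mixup procedure (displacement interpolation between k-batches under the Wasserstein metric). Below is a minimal NumPy/SciPy sketch of that idea as read from the abstract, not the authors' implementation; the function name k_mixup_batch, the per-batch Beta(alpha, alpha) mixing coefficient, and the assumption of one-hot labels are illustrative choices. For two uniform k-point clouds, the W2-optimal coupling reduces to an optimal assignment, which is what the Hungarian solver computes here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def k_mixup_batch(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Displacement-interpolate one k-batch (x1, y1) toward another (x2, y2).

    x1, x2: arrays of shape (k, d); y1, y2: one-hot labels of shape (k, c).
    Points are matched by an optimal assignment under squared-Euclidean cost
    (the optimal transport plan for uniform k-point clouds), then each matched
    pair is mixed with a single lambda ~ Beta(alpha, alpha), mixup-style.
    """
    rng = rng or np.random.default_rng()
    k = x1.shape[0]

    # Pairwise squared-Euclidean cost between the two k-batches.
    diff = x1.reshape(k, 1, -1) - x2.reshape(1, k, -1)
    cost = (diff ** 2).sum(axis=-1)

    # Optimal assignment = W2-optimal coupling for uniform k-point clouds.
    rows, cols = linear_sum_assignment(cost)

    # One mixing coefficient per k-batch pair.
    lam = rng.beta(alpha, alpha)

    x_mix = lam * x1[rows] + (1.0 - lam) * x2[cols]
    y_mix = lam * y1[rows] + (1.0 - lam) * y2[cols]
    return x_mix, y_mix
```

With k = 1 this reduces to standard mixup (a single random pairing); larger k lets the matching respect cluster and manifold structure, which is the effect the abstract highlights.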