Canonical Correlation Analysis (CCA) models are powerful for studying the associations between two sets of variables. The canonically correlated representations, termed canonical variates, are widely used in unsupervised learning to analyze unlabeled, multi-modal, registered datasets. Despite their success, CCA models may break (or overfit) if the number of variables in either of the modalities exceeds the number of samples. Moreover, a significant fraction of the variables often measures modality-specific information, so removing them helps identify the canonically correlated variates. Here, we propose ℓ0-CCA, a method for learning correlated representations based on sparse subsets of variables from two observed modalities. Sparsity is obtained by multiplying the input variables by stochastic gates, whose parameters are learned together with the CCA weights via an ℓ0-regularized correlation loss. We further propose ℓ0-Deep CCA, which solves the non-linear sparse CCA problem by modeling the correlated representations with deep nets. We demonstrate the efficacy of the method on several synthetic and real examples. Most notably, by gating nuisance input variables, our approach improves the extracted representations compared to other linear, non-linear, and sparse CCA-based models.
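To make the gating idea concrete, here is a minimal NumPy sketch of an ℓ0-regularized, gated linear CCA objective. It assumes a Gaussian-based, hard-clipped relaxation of Bernoulli gates, z_d = clip(μ_d + ε_d, 0, 1) with ε_d ~ N(0, σ²), whose expected ℓ0 norm is Σ_d Φ(μ_d / σ); the abstract does not specify the gate distribution, so treat this as an assumption. All function names (`stochastic_gates`, `expected_l0`, `gated_cca_loss`) are hypothetical. The sketch only evaluates the objective: in the actual method the gate parameters μ and the CCA weights are learned jointly by gradient descent, which is omitted here.

```python
import math
import numpy as np

def gaussian_cdf(x):
    """Standard normal CDF Phi, applied elementwise via math.erf."""
    flat = [0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in np.ravel(x)]
    return np.array(flat).reshape(np.shape(x))

def stochastic_gates(mu, sigma, rng, train=True):
    """Assumed gate form: z = clip(mu + eps, 0, 1), eps ~ N(0, sigma^2).
    At evaluation time the noise is dropped (eps = 0)."""
    eps = rng.normal(0.0, sigma, size=mu.shape) if train else np.zeros_like(mu)
    return np.clip(mu + eps, 0.0, 1.0)

def expected_l0(mu, sigma):
    """Expected number of open gates: sum_d P(mu_d + eps_d > 0) = sum_d Phi(mu_d / sigma)."""
    return float(np.sum(gaussian_cdf(mu / sigma)))

def gated_cca_loss(X, Y, a, b, mu_x, mu_y, sigma, lam_x, lam_y, rng, train=True):
    """Negative Pearson correlation of gated linear projections plus expected-l0 penalties.
    X: (n, dx), Y: (n, dy); a: (dx,), b: (dy,) are the (hypothetical) CCA weights."""
    zx = stochastic_gates(mu_x, sigma, rng, train)
    zy = stochastic_gates(mu_y, sigma, rng, train)
    u = (X * zx) @ a  # gated canonical variate for modality X
    v = (Y * zy) @ b  # gated canonical variate for modality Y
    u = u - u.mean()
    v = v - v.mean()
    corr = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))
    penalty = lam_x * expected_l0(mu_x, sigma) + lam_y * expected_l0(mu_y, sigma)
    return -corr + penalty, corr

# Toy usage: only column 0 of each modality carries the shared latent signal.
rng = np.random.default_rng(0)
n, d = 500, 5
s = rng.normal(size=n)
X = rng.normal(size=(n, d))
X[:, 0] = s + 0.1 * rng.normal(size=n)
Y = rng.normal(size=(n, d))
Y[:, 0] = s + 0.1 * rng.normal(size=n)
w = np.ones(d)
closed = np.array([1.0, -1.0, -1.0, -1.0, -1.0])  # open gate 0, close the nuisance gates
opened = np.ones(d)                               # every gate open
loss_gated, corr_gated = gated_cca_loss(X, Y, w, w, closed, closed, 0.5, 0.01, 0.01, rng, train=False)
loss_open, corr_open = gated_cca_loss(X, Y, w, w, opened, opened, 0.5, 0.01, 0.01, rng, train=False)
print(f"gated corr={corr_gated:.3f}, all-open corr={corr_open:.3f}")
```

In this toy setup, closing the gates on the nuisance columns yields a much higher correlation and a lower loss than leaving every gate open, which is the effect the ℓ0 penalty is meant to encourage during training.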
State: Published - 2022
Event: 10th International Conference on Learning Representations, ICLR 2022 - Virtual, Online
Conference: 10th International Conference on Learning Representations, ICLR 2022
Period: 25 Apr 2022 → 29 Apr 2022
Bibliographical note
Funding information: The work of YK was supported by National Institutes of Health grants R01GM131642, UM1PA05141, U54AG076043, P50CA121974, and U01DA053628.
© 2022 ICLR 2022 - 10th International Conference on Learning Representations. All rights reserved.