Deep unsupervised feature selection by discarding nuisance and correlated features

Uri Shaham, Ofir Lindenbaum, Jonathan Svirsky, Yuval Kluger

Research output: Contribution to journal › Article › peer-review

15 Scopus citations

Abstract

Modern datasets often contain large subsets of correlated features and nuisance features, which are unrelated or only loosely related to the main underlying structures of the data. Nuisance features can be identified using the Laplacian score criterion, which evaluates the importance of a given feature via its consistency with the graph Laplacian's leading eigenvectors. We demonstrate that in the presence of large numbers of nuisance features, the Laplacian must be computed on the subset of selected features rather than on the complete feature set. To do this, we propose a fully differentiable approach for unsupervised feature selection, utilizing the Laplacian score criterion to avoid the selection of nuisance features. We employ an autoencoder architecture to cope with correlated features, trained to reconstruct the data from the subset of selected features. We build on the recently proposed concrete layer, which allows the number of selected features to be controlled via architectural design, simplifying the optimization process. Experimenting on several real-world datasets, we demonstrate that our proposed approach outperforms similar approaches designed to avoid only correlated or nuisance features, but not both. Several state-of-the-art clustering results are reported. Our code is publicly available at https://github.com/jsvir/lscae.
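
As a rough illustration of the Laplacian score idea mentioned in the abstract, the following Python sketch computes the classic Laplacian score of each feature on a Gaussian-kernel graph. It is not the authors' implementation (the linked repository holds that). The function name `laplacian_score`, the `graph_features` parameter, the Gaussian kernel, the median bandwidth heuristic, and the synthetic data are all assumptions made for this example. The `graph_features` argument lets the graph be built either from all features or from a chosen subset, which is the distinction the abstract emphasizes.

```python
import numpy as np


def laplacian_score(X, graph_features=None, sigma=None):
    """Classic Laplacian score per feature (lower = smoother on the graph).

    Hypothetical helper for illustration only.
    X              : (n_samples, n_features) data matrix.
    graph_features : optional list of column indices used to build the graph;
                     by default all columns are used.
    sigma          : Gaussian kernel bandwidth; defaults to a median heuristic.
    """
    X = np.asarray(X, dtype=float)
    G = X if graph_features is None else X[:, graph_features]

    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(G**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * G @ G.T, 0.0)

    if sigma is None:
        sigma = np.sqrt(np.median(d2[d2 > 0]))  # assumed bandwidth heuristic

    # Gaussian affinity matrix, degrees, and unnormalized graph Laplacian.
    W = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)
    L = np.diag(deg) - W

    # Remove the degree-weighted mean of every feature.
    F = X - (deg @ X) / deg.sum()

    num = np.einsum("ij,ij->j", F, L @ F)             # f^T L f per feature
    den = np.einsum("ij,ij->j", F, deg[:, None] * F)  # f^T D f per feature
    return num / np.maximum(den, 1e-12)


# Synthetic data: 2 features that carry a two-cluster structure, 20 nuisance features.
rng = np.random.default_rng(0)
n = 200
labels = np.repeat([0, 1], n // 2)
informative = 4.0 * labels[:, None] + 0.5 * rng.normal(size=(n, 2))
nuisance = rng.normal(size=(n, 20))
X = np.hstack([informative, nuisance])

scores_full = laplacian_score(X)                           # graph from all features
scores_subset = laplacian_score(X, graph_features=[0, 1])  # graph from the informative pair

print("graph on all features:   ", np.round(scores_full[:4], 3))
print("graph on selected subset:", np.round(scores_subset[:4], 3))
```

In this toy setting, comparing the two printed rows gives a feel for how much the nuisance columns blur the graph when they are included in its construction. A differentiable selection layer, as described in the abstract, would replace the fixed `graph_features` list with learned feature gates.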

Original language: English
Pages (from-to): 34-43
Number of pages: 10
Journal: Neural Networks
Volume: 152
State: Published - Aug 2022

Bibliographical note

Publisher Copyright:
© 2022 Elsevier Ltd

Funding

YK's work was supported by NIH grants R01RGM131642, UM1DA051410, U54AG076043, U01DA053628, R01GM135928, and P50CA121974.

Funders | Funder number
National Institutes of Health | UM1DA051410, R01RGM131642, P50CA121974, R01GM135928, U01DA053628
National Institute on Aging | U54AG076043

    Keywords

    • Concrete layer
    • Laplacian score
    • Unsupervised feature selection
