GB-AFS: graph-based automatic feature selection for multi-class classification via Mean Simplified Silhouette

David Levin, Gonen Singer

Research output: Contribution to journal › Article › Peer-review

Abstract

This paper introduces a novel graph-based filter method for automatic feature selection (GB-AFS) in multi-class classification tasks. The method determines the minimum combination of features required to sustain prediction performance while maintaining complementary discriminating abilities between different classes. It does not require any user-defined parameters, such as the number of features to select. The minimum number of features is selected using our newly developed Mean Simplified Silhouette (MSS) index, which is designed to evaluate clustering results for the feature selection task. To illustrate the effectiveness and generality of the method, we applied GB-AFS with various combinations of statistical measures and dimensionality reduction techniques. The experimental results demonstrate the superior performance of GB-AFS over other filter-based techniques and automatic feature selection approaches, and show that the method is independent of the statistical measure or dimensionality reduction technique chosen by the user. Moreover, the proposed method maintained the accuracy achieved with all features while using only 7–30% of the original features. This yielded average time savings ranging from 15% for the smallest dataset to 70% for the largest. Our code is available at https://github.com/davidlevinwork/gbfs/.
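The abstract names the Mean Simplified Silhouette (MSS) index without defining it. As background only, the classic simplified silhouette replaces a point's average distances with distances to cluster centers: a(i) is the distance from point i to its own cluster's center, b(i) is the distance to the nearest other center, and s(i) = (b(i) - a(i)) / max(a(i), b(i)). The Python sketch below computes that standard quantity and its mean with NumPy, assuming each row of `X` is one feature's low-dimensional embedding and that cluster labels and centers come from some clustering step. It is not the authors' MSS, whose exact formulation is given in the article and the linked repository; the function names, including the representative-picking helper that keeps one feature per cluster, are hypothetical.

```python
import numpy as np


def simplified_silhouette(X, labels, centers):
    """Per-point simplified silhouette (standard definition, not the paper's MSS).

    Uses distances to cluster centers instead of to all other points,
    so the cost is O(n * k) rather than O(n^2).
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    labels = np.asarray(labels, dtype=int)
    if centers.shape[0] < 2:
        raise ValueError("simplified silhouette needs at least two clusters")
    # Euclidean distance from every point to every center, shape (n, k).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    rows = np.arange(X.shape[0])
    a = d[rows, labels]                  # distance to own cluster center
    d_other = d.copy()
    d_other[rows, labels] = np.inf       # mask the own center
    b = d_other.min(axis=1)              # distance to nearest other center
    denom = np.maximum(a, b)
    safe = np.where(denom > 0, denom, 1.0)
    # Convention: score 0 when a == b == 0 (degenerate, coincident centers).
    return np.where(denom > 0, (b - a) / safe, 0.0)


def mean_simplified_silhouette(X, labels, centers):
    """Average simplified silhouette over all points (higher is better)."""
    return float(simplified_silhouette(X, labels, centers).mean())


def pick_representatives(X, labels, centers):
    """Hypothetical helper: index of the member closest to each center."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels, dtype=int)
    reps = []
    for j, c in enumerate(np.asarray(centers, dtype=float)):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue                     # skip empty clusters
        dist = np.linalg.norm(X[members] - c, axis=1)
        reps.append(int(members[np.argmin(dist)]))
    return reps


if __name__ == "__main__":
    # Four "features" embedded in 2-D, forming two well-separated clusters.
    X = [[0, 0], [0, 1], [10, 0], [10, 1]]
    labels = [0, 0, 1, 1]
    centers = [[0, 0.5], [10, 0.5]]
    print(mean_simplified_silhouette(X, labels, centers))  # about 0.95
    print(pick_representatives(X, labels, centers))        # [0, 2]
```

In a parameter-free setting, one would run a clustering for several candidate numbers of clusters and score each result with an index of this kind; how GB-AFS turns its MSS scores into a final number of features is described in the article rather than assumed here.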

Original language: English
Article number: 79
Journal: Journal of Big Data
Volume: 11
Issue number: 1
State: Published - Dec 2024

Bibliographical note

Publisher Copyright:
© The Author(s) 2024.

Keywords

  • Graph-based feature selection
  • Multi-class feature selection
  • Nonlinear dimensionality reduction
  • Silhouette
