A Multi-Objective Genetic Algorithm for Outlier Removal

Oren E. Nahum, Abraham Yosipof, Hanoch Senderowitz

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

Quantitative structure-activity relationship (QSAR) or quantitative structure-property relationship (QSPR) models are developed to correlate activities for sets of compounds with their structure-derived descriptors by means of mathematical models. The presence of outliers, namely, compounds that differ in some respect from the rest of the data set, compromises the ability of statistical methods to derive QSAR models with good prediction statistics. Hence, outliers should be removed from data sets prior to model derivation. Here we present a new multi-objective genetic algorithm for the identification and removal of outliers based on the k-nearest neighbors (kNN) method. The algorithm was used to remove outliers from three different data sets of pharmaceutical interest (logBBB, factor 7 inhibitors, and dihydrofolate reductase inhibitors), and its performance was compared with that of five other methods for outlier removal. The results suggest that the new algorithm provides filtered data sets that (1) better maintain the internal diversity of the parent data sets and (2) give rise to QSAR models with much better prediction statistics. Equally good filtered data sets in terms of these metrics were obtained when another objective function (termed "preservation") was added to the algorithm, so that designated compounds are removed only with low probability. This option is highly useful when specific compounds should preferably be kept in the final data set, either because they have favorable activities or because they represent interesting molecular scaffolds. We expect this new algorithm to be useful in future QSAR applications.
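
The abstract does not spell out the objective functions, so the following is only an illustrative sketch of how one candidate filtered data set might be scored in a multi-objective setting. It assumes three objectives to be minimized: the number of removed compounds, a leave-one-out kNN prediction error on the kept compounds, and the number of "preserved" compounds that were removed. The function names, the choice of error measure, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import math

def knn_loo_error(X, y, keep, k=2):
    """Mean absolute leave-one-out kNN error over the kept compounds (assumed metric)."""
    idx = [i for i, flag in enumerate(keep) if flag]
    if len(idx) <= k:
        return math.inf  # too few compounds left to form k neighbors
    total = 0.0
    for i in idx:
        # k nearest kept neighbors of compound i in descriptor space
        nearest = sorted((math.dist(X[i], X[j]), j) for j in idx if j != i)[:k]
        pred = sum(y[j] for _, j in nearest) / k
        total += abs(y[i] - pred)
    return total / len(idx)

def objectives(X, y, keep, protected=(), k=2):
    """Hypothetical objective vector; every component is minimized."""
    removed = keep.count(False)
    error = knn_loo_error(X, y, keep, k)
    lost = sum(1 for i in protected if not keep[i])  # "preservation" penalty
    return (removed, error, lost)

def dominates(a, b):
    """Pareto dominance: a is no worse everywhere and strictly better somewhere."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))

# Toy data: one descriptor per compound; the last activity is an obvious outlier.
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [0.0, 1.0, 2.0, 3.0, 50.0]

keep_all = [True, True, True, True, True]
drop_last = [True, True, True, True, False]

score_all = objectives(X, y, keep_all)
score_drop = objectives(X, y, drop_last)
score_drop_protected = objectives(X, y, drop_last, protected=(4,))
```

On this toy data, keeping every compound removes nothing but gives a large kNN error, while dropping the last compound lowers the error at the cost of one removal, so neither candidate dominates the other. A genetic algorithm would evolve a population of such `keep` masks and retain the non-dominated ones; marking compound 4 as protected adds a third cost to removing it.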

Original language: English
Pages (from-to): 2507-2518
Number of pages: 12
Journal: Journal of Chemical Information and Modeling
Volume: 55
Issue number: 12
State: Published - 28 Dec 2015

Bibliographical note

Publisher Copyright:
© 2015 American Chemical Society.
