
A cluster robustness score for identifying cell subpopulations in single cell gene expression datasets from heterogeneous tissues and tumors

  • Columbia University

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

Motivation: A major aim of single cell biology is to identify important cell types, such as stem cells, in heterogeneous tissues and tumors. This is typically done by isolating hundreds of individual cells and measuring the expression levels of multiple genes simultaneously in each cell. Clustering algorithms are then used to group similar single-cell expression profiles into clusters, each representing a distinct cell type. However, many of these clusters result from overfitting: rather than representing biologically meaningful cell types, they describe the intrinsic 'noise' in gene expression levels that arises from limited experimental precision or from the intrinsic randomness of biochemical cellular processes. Consequently, these non-meaningful clusters are the most sensitive to noise: a slight shift in gene expression levels due to a repeated measurement will rearrange the grouping of data points such that these clusters break up.

Results: To identify the biologically meaningful clusters we propose a 'cluster robustness score': we add increasing amounts of noise (zero mean and increasing variance) and check which clusters are most robust, in the sense that they do not mix with their neighbors up to high levels of noise. We show that biologically meaningful cell clusters that were manually identified in previously published single cell expression datasets have high robustness scores. These scores are higher than would be expected in corresponding randomized homogeneous datasets having the same expression level statistics. We believe that this scoring system provides a more automated way to identify cell types in heterogeneous tissues and tumors.

Supplementary information: Supplementary data are available at Bioinformatics online.
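The noise-perturbation idea described in the Results can be illustrated with a minimal Python sketch. This is not the authors' implementation: the clustering method (plain k-means with farthest-point initialisation), the Gaussian noise model, the Jaccard overlap measure, the 0.8 survival threshold, and the function names `kmeans`, `jaccard` and `robustness_scores` are all assumptions made for illustration. The comparison against randomized homogeneous datasets mentioned in the abstract is not included.

```python
import random


def kmeans(points, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialisation (illustrative choice)."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(d2(p, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: d2(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(x) / len(members) for x in zip(*members)]
    return labels


def jaccard(a, b):
    """Overlap between two non-empty sets of cell indices (1.0 means identical)."""
    return len(a & b) / len(a | b)


def robustness_scores(points, k, noise_levels, n_trials=20, threshold=0.8, seed=0):
    """For each baseline cluster, return the highest noise level it survives.

    A cluster 'survives' a noise level if, averaged over n_trials noisy copies
    of the data, its best Jaccard overlap with a re-computed cluster stays at or
    above `threshold` (an assumed criterion). Clusters that fail at the lowest
    level score 0.0.
    """
    rng = random.Random(seed)
    base = kmeans(points, k)
    base_sets = {c: {i for i, lab in enumerate(base) if lab == c} for c in set(base)}
    scores = {c: 0.0 for c in base_sets}
    alive = set(base_sets)
    for sigma in sorted(noise_levels):
        total = {c: 0.0 for c in alive}
        for _ in range(n_trials):
            # Zero-mean Gaussian noise with standard deviation sigma on every value.
            noisy = [[x + rng.gauss(0.0, sigma) for x in p] for p in points]
            labels = kmeans(noisy, k)
            noisy_sets = [{i for i, lab in enumerate(labels) if lab == j}
                          for j in set(labels)]
            for c in alive:
                total[c] += max(jaccard(base_sets[c], s) for s in noisy_sets)
        alive = {c for c in alive if total[c] / n_trials >= threshold}
        for c in alive:
            scores[c] = sigma
        if not alive:
            break
    return scores


if __name__ == "__main__":
    # Toy data: a tight, distant group plus a homogeneous blob that k=3 must split.
    tight = [[10.0 + 0.1 * i, 10.0] for i in range(5)]
    blob = [[0.2 * i, 0.2 * j] for i in range(5) for j in range(4)]
    print(robustness_scores(tight + blob, k=3, noise_levels=[0.1, 0.3, 1.0]))
```

In the toy data, the two halves of the homogeneous blob play the role of overfitted clusters, so one would expect them to break up at lower noise levels than the distant group, although the exact scores depend on the seed, the threshold and the clustering method chosen.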

Original language: English
Pages (from-to): 962-971
Number of pages: 10
Journal: Bioinformatics
Volume: 35
Issue number: 6
State: Published - 15 Mar 2019

Bibliographical note

Publisher Copyright:
© 2018 The Author(s). Published by Oxford University Press. All rights reserved.

Funding

T.K. and I.K. are supported by the Israel Science Foundation ([ICORE number 1902/12] and [Grants numbers 1634/13 and 2017/13]), the Israel Cancer Association [Grant no. 20150911], the Israel Ministry of Health [Grant number 3-10146] and the EU-FP7 [Marie Curie International Reintegration Grant no. 618592]. P.D. is supported by a Runyon-Rachleff Innovator Award [Grant number DRR-44-16; Island Outreach Foundation] from the Damon Runyon Cancer Research Foundation, by the Schaefer Research Scholars Program (2017) of Columbia University’s College of Physicians and Surgeons (Dr. Ludwig Schaefer Fund), and by a research grant (CU16-1797) from the Adenoid Cystic Carcinoma Research Foundation (ACCRF). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Funders                                                   Funder number
Columbia University’s College of Physicians and Surgeons  CU16-1797
EU-FP7                                                    618592, DRR-44-16
Island Outreach Foundation
Israel Ministry of Health                                 3-10146
Schaefer Research Scholars Program
Damon Runyon Cancer Research Foundation
Adenoid Cystic Carcinoma Research Foundation
Israel Cancer Association                                 20150911
Israel Science Foundation                                 2017/13, 1634/13, 1902/12
