
Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd

  • Zichen Wang
  • Caroline D. Monteiro
  • Kathleen M. Jagodnik
  • Nicolas F. Fernandez
  • Gregory W. Gundersen
  • Andrew D. Rouillard
  • Sherry L. Jenkins
  • Axel S. Feldmann
  • Kevin S. Hu
  • Michael G. McDermott
  • Qiaonan Duan
  • Neil R. Clark
  • Matthew R. Jones
  • Yan Kou
  • Troy Goff
  • Holly Woodland
  • Fabio M.R. Amaral
  • Gregory L. Szeto
  • Oliver Fuchs
  • Sophia M. Schüssler-Fiorenza Rose
  • Shvetank Sharma
  • Uwe Schwartz
  • Xabier Bengoetxea Bausela
  • Maciej Szymkiewicz
  • Vasileios Maroulis
  • Anton Salykin
  • Carolina M. Barra
  • Candice D. Kruth
  • Nicholas J. Bongio
  • Vaibhav Mathur
  • Radmila D. Todoric
  • Udi E. Rubin
  • Apostolos Malatras
  • Carl T. Fulp
  • John A. Galindo
  • Ruta Motiejunaite
  • Christoph Jüschke
  • Philip C. Dishuck
  • Katharina Lahl
  • Mohieddin Jafari
  • Sara Aibar
  • Apostolos Zaravinos
  • Linda H. Steenhuizen
  • Lindsey R. Allison
  • Pablo Gamallo
  • Fernando De Andres Segura
  • Tyler Dae Devlin
  • Vicente Pérez-García
  • Avi Ma'ayan
  • Icahn School of Medicine at Mount Sinai
  • NASA Glenn Research Center
  • Baylor College of Medicine
  • the Fairway
  • University of Nottingham
  • Massachusetts Institute of Technology
  • Ludwig Maximilian University of Munich
  • Department of Veterans Affairs
  • Stanford University
  • Institute of Liver and Biliary Sciences
  • University of Regensburg
  • University of Navarra
  • Polish Academy of Sciences
  • Masaryk University
  • Hospital del Mar
  • Shenandoah University
  • IBM
  • Columbia University
  • Centre de Recherche en Myologie
  • Universidad Nacional de Colombia
  • Brigham and Women’s Hospital
  • University of Oldenburg
  • Technical University of Denmark
  • Pasteur Institute of Iran
  • Institute for Researches in Fundamental Sciences
  • Universidad de Salamanca
  • Karolinska Institutet
  • European University Cyprus
  • University of Extremadura
  • CSIC - National Center for Biotechnology

Research output: Contribution to journal › Article › peer-review

211 Scopus citations

Abstract

Gene expression data are accumulating exponentially in public repositories. Reanalysis and integration of themed collections from these studies may provide new insights, but requires further human curation. Here we report a crowdsourcing project to annotate and reanalyse a large number of gene expression profiles from Gene Expression Omnibus (GEO). Through a massive open online course on Coursera, over 70 participants from over 25 countries identify and annotate 2,460 single-gene perturbation signatures, 839 disease versus normal signatures, and 906 drug perturbation signatures. All these signatures are unique and are manually validated for quality. Global analysis of these signatures confirms known associations and identifies novel associations between genes, diseases and drugs. The manually curated signatures are used as a training set to develop classifiers for extracting similar signatures from the entire GEO repository. We develop a web portal to serve these signatures for query, download and visualization.

Original language: English
Article number: 12846
Journal: Nature Communications
Volume: 7
DOIs
State: Published - 26 Sep 2016
Externally published: Yes

Bibliographical note

Publisher Copyright:
© The Author(s) 2016.

Funding

This work is supported by NIH grants R01GM098316, U54HL127624 and U54CA189201 to A.M.

Funder: National Institutes of Health
Funder numbers: U54CA189201, U54HL127624, R01GM098316

    UN SDGs

    This output contributes to the following UN Sustainable Development Goals (SDGs)

    1. SDG 3 - Good Health and Well-being
