Comprehensive analysis of RNA-sequencing to find the source of 1 trillion reads across diverse adult human tissues

  • E. Levanon
  • S. Mangul
  • H. T. Yang
  • N. Strauli
  • F. Gruhl
  • H. Porath
  • K. Hsieh
  • L. Chen
  • Timothy Daley
  • Stephanie Christenson
  • Agata Wesolowska-Andersen
  • Roberto Spreafico
  • Cydney Rios
  • Celeste Eng
  • Andrew D. Smith
  • Ryan D. Hernandez
  • Roel A. Ophoff
  • Jose Rodriguez Santana
  • Prescott G. Woodruff
  • Esteban Burchard
  • Max A. Seibold
  • Sagiv Shifman
  • Eleazar Eskin
  • Noah Zaitlen
  • University of Southern California
  • University of California at San Francisco
  • University of Colorado Denver
  • University of California at Los Angeles
  • Utrecht University
  • Centro de Neumología Pediátrica
  • University of Colorado Anschutz Medical Campus
  • Hebrew University of Jerusalem
  • University of California at San Diego

Research output: Working paper / Preprint

Abstract

High-throughput RNA sequencing technologies have provided invaluable research opportunities across distinct scientific domains by producing quantitative readouts of the transcriptional activity of both entire cellular populations and single cells. The majority of RNA-Seq analyses begin by mapping each experimentally produced sequence (i.e., read) to a set of annotated reference sequences for the organism of interest. For both biological and technical reasons, a significant fraction of reads remains unmapped. In this work, we develop Read Origin Protocol (ROP) to discover the source of all reads originating from complex RNA molecules, recombinant T and B cell receptors, and microbial communities. We applied ROP to 8,641 samples across 630 individuals from 54 tissues. A fraction of RNA-Seq data (n=86) was obtained in-house; the remaining data was obtained from the Genotype-Tissue Expression (GTEx v6) project. To generalize the reported number of accounted reads, we also performed ROP analysis on thousands of different, randomly selected, and publicly available RNA-Seq samples in the Sequence Read Archive (SRA). Our approach can account for 99.9% of 1 trillion reads of various read lengths across the merged dataset (n=10,641). Using in-house RNA-Seq data, we show that immune profiles of asthmatic individuals are significantly different from the profiles of control individuals, with decreased average per-sample T and B cell receptor diversity. We also show that immune diversity is inversely correlated with microbial load. Our results demonstrate the potential of ROP to exploit unmapped reads in order to better understand the functional mechanisms underlying connections between the immune system, microbiome, human gene expression, and disease etiology. ROP is freely available at https://github.com/smangul1/rop and currently supports human and mouse RNA-Seq reads.
Original language: English
Number of pages: 41
Volume: 53041
State: Published - 12 Jun 2017

Publication series

Name: bioRxiv

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being
