Integration of multiple epigenomic marks improves prediction of variant impact in saturation mutagenesis reporter assay

Dustin Shigaki, Orit Adato, Aashish N. Adhikari, Shengcheng Dong, Alex Hawkins-Hooker, Fumitaka Inoue, Tamar Juven-Gershon, Henry Kenlay, Beth Martin, Ayoti Patra, Dmitry D. Penzar, Max Schubach, Chenling Xiong, Zhongxia Yan, Alan P. Boyle, Anat Kreimer, Ivan V. Kulakovskiy, John Reid, Ron Unger, Nir Yosef, Jay Shendure, Nadav Ahituv, Martin Kircher, Michael A. Beer

Research output: Contribution to journal › Article › peer-review

34 Scopus citations

Abstract

The integrative analysis of high-throughput reporter assays, machine learning, and profiles of epigenomic chromatin state in a broad array of cells and tissues has the potential to significantly improve our understanding of noncoding regulatory element function and its contribution to human disease. Here, we report results from the CAGI 5 regulation saturation challenge where participants were asked to predict the impact of nucleotide substitution at every base pair within five disease-associated human enhancers and nine disease-associated promoters. A library of mutations covering all bases was generated by saturation mutagenesis and altered activity was assessed in a massively parallel reporter assay (MPRA) in relevant cell lines. Reporter expression was measured relative to plasmid DNA to determine the impact of variants. The challenge was to predict the functional effects of variants on reporter expression. Comparative analysis of the full range of submitted prediction results identifies the most successful models of transcription factor binding sites, machine learning algorithms, and ways to choose among or incorporate diverse datatypes and cell-types for training computational models. These results have the potential to improve the design of future studies on more diverse sets of regulatory elements and aid the interpretation of disease-associated genetic variation.

Original language: English
Pages (from-to): 1280-1291
Number of pages: 12
Journal: Human Mutation
Volume: 40
Issue number: 9
Early online date: 20 May 2019
State: Published - 1 Sep 2019

Bibliographical note

Publisher Copyright:
© 2019 Wiley Periodicals, Inc.

Funding

M.B., D.S., and A.P. are supported by NIH R01 HG007348 and NIH U01 HG009380. I.V.K. is supported by RFBR 18-34-20024. The CAGI experiment coordination is supported by NIH U41 HG007346 and the CAGI conference by NIH R13 HG006650.

Funders                                       Funder number
CAGI
NIH R13 HG006650                              R13 HG006650
NIH U01 HG009380                              U01 HG009380
NIH U41 HG007346                              U41 HG007346
RFBR 18-34-20024
National Institutes of Health
National Human Genome Research Institute      R01HG007348
Russian Foundation for Basic Research         18-34-20024

    Keywords

    • MPRA
    • enhancers
    • gene regulation
    • machine learning
    • promoters
    • regulatory variation
