TY - JOUR
T1 - Identifying Regulatory Elements via Deep Learning
AU - Barshai, Mira
AU - Tripto, Eitamar
AU - Orenstein, Yaron
N1 - Publisher Copyright: © Annual Review of Biomedical Data Science.
PY - 2020/7/20
Y1 - 2020/7/20
AB - Deep neural networks have been revolutionizing the field of machine learning for the past several years. They have been applied with great success in many domains of the biomedical data sciences and are outperforming extant methods by a large margin. The ability of deep neural networks to pick up local image features and model the interactions between them makes them highly applicable to regulatory genomics. Instead of an image, the networks analyze DNA and RNA sequences and additional epigenomic data. In this review, we survey the successes of deep learning in the field of regulatory genomics. We first describe the fundamental building blocks of deep neural networks, popular architectures used in regulatory genomics, and their training process on molecular sequence data. We then review several key methods in different gene regulation domains. We start with the pioneering method DeepBind and its successors, which were developed to predict protein-DNA binding. We then review methods developed to predict and model epigenetic information, such as histone marks and nucleosome occupancy. Following epigenomics, we review methods to predict protein-RNA binding with its unique challenge of incorporating RNA structure information. Finally, we provide our overall view of the strengths and weaknesses of deep neural networks and prospects for future developments.
KW - deep learning
KW - gene regulation
KW - motif finding
KW - regulatory genomics
UR - http://www.scopus.com/inward/record.url?scp=85109025579&partnerID=8YFLogxK
U2 - 10.1146/annurev-biodatasci-022020-021940
DO - 10.1146/annurev-biodatasci-022020-021940
M3 - Review article
AN - SCOPUS:85109025579
SN - 2574-3414
VL - 3
SP - 315
EP - 338
JO - Annual Review of Biomedical Data Science
JF - Annual Review of Biomedical Data Science
ER -