TY - JOUR
T1 - ChiTaH
T2 - A fast and accurate tool for identifying known human chimeric sequences from high-throughput sequencing data
AU - Detroja, Rajesh
AU - Gorohovski, Alessandro
AU - Giwa, Olawumi
AU - Baum, Gideon
AU - Frenkel-Morgenstern, Milana
N1 - Publisher Copyright:
© 2021 The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
PY - 2021/12/1
Y1 - 2021/12/1
N2 - Fusion genes or chimeras typically comprise sequences from two different genes. The chimeric RNAs of such joined sequences often serve as cancer drivers. Identifying such driver fusions in a given cancer or complex disease is important for diagnosis and treatment. The advent of next-generation sequencing technologies, such as DNA-Seq or RNA-Seq, together with the development of suitable computational tools, has made the global identification of chimeras in tumors possible. However, the testing of over 20 computational methods showed these to be limited in terms of chimera prediction sensitivity, specificity, and accurate quantification of junction reads. These shortcomings motivated us to develop the first 'reference-based' approach termed ChiTaH (Chimeric Transcripts from High-throughput sequencing data). ChiTaH uses 43,466 non-redundant known human chimeras as a reference database to map sequencing reads and to accurately identify chimeric reads. We benchmarked ChiTaH and four other methods to identify human chimeras, leveraging both simulated and real sequencing datasets. ChiTaH was found to be the most accurate and fastest method for identifying known human chimeras from simulated and sequencing datasets. Moreover, especially ChiTaH uncovered heterogeneity of the BCR-ABL1 chimera in both bulk and single-cells of the K-562 cell line, which was confirmed experimentally.
AB - Fusion genes or chimeras typically comprise sequences from two different genes. The chimeric RNAs of such joined sequences often serve as cancer drivers. Identifying such driver fusions in a given cancer or complex disease is important for diagnosis and treatment. The advent of next-generation sequencing technologies, such as DNA-Seq or RNA-Seq, together with the development of suitable computational tools, has made the global identification of chimeras in tumors possible. However, the testing of over 20 computational methods showed these to be limited in terms of chimera prediction sensitivity, specificity, and accurate quantification of junction reads. These shortcomings motivated us to develop the first 'reference-based' approach termed ChiTaH (Chimeric Transcripts from High-throughput sequencing data). ChiTaH uses 43,466 non-redundant known human chimeras as a reference database to map sequencing reads and to accurately identify chimeric reads. We benchmarked ChiTaH and four other methods to identify human chimeras, leveraging both simulated and real sequencing datasets. ChiTaH was found to be the most accurate and fastest method for identifying known human chimeras from simulated and sequencing datasets. Moreover, especially ChiTaH uncovered heterogeneity of the BCR-ABL1 chimera in both bulk and single-cells of the K-562 cell line, which was confirmed experimentally.
UR - http://www.scopus.com/inward/record.url?scp=85123224769&partnerID=8YFLogxK
U2 - 10.1093/nargab/lqab112
DO - 10.1093/nargab/lqab112
M3 - ???researchoutput.researchoutputtypes.contributiontojournal.article???
C2 - 34859212
AN - SCOPUS:85123224769
SN - 2631-9268
VL - 3
JO - NAR Genomics and Bioinformatics
JF - NAR Genomics and Bioinformatics
IS - 4
M1 - lqab112
ER -