ChiTaH: A fast and accurate tool for identifying known human chimeric sequences from high-throughput sequencing data

Rajesh Detroja, Alessandro Gorohovski, Olawumi Giwa, Gideon Baum, Milana Frenkel-Morgenstern

Research output: Contribution to journal, Article, peer-review

3 Scopus citations

Abstract

Fusion genes or chimeras typically comprise sequences from two different genes. The chimeric RNAs of such joined sequences often serve as cancer drivers. Identifying such driver fusions in a given cancer or complex disease is important for diagnosis and treatment. The advent of next-generation sequencing technologies, such as DNA-Seq or RNA-Seq, together with the development of suitable computational tools, has made the global identification of chimeras in tumors possible. However, testing of over 20 computational methods showed them to be limited in terms of chimera prediction sensitivity, specificity, and accurate quantification of junction reads. These shortcomings motivated us to develop the first 'reference-based' approach, termed ChiTaH (Chimeric Transcripts from High-throughput sequencing data). ChiTaH uses 43,466 non-redundant known human chimeras as a reference database to map sequencing reads and to accurately identify chimeric reads. We benchmarked ChiTaH and four other methods for identifying human chimeras, leveraging both simulated and real sequencing datasets. ChiTaH was found to be the most accurate and fastest method for identifying known human chimeras from simulated and real sequencing datasets. Moreover, ChiTaH uncovered heterogeneity of the BCR-ABL1 chimera in both bulk and single cells of the K-562 cell line, which was confirmed experimentally.
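The reference-based idea in the abstract (map reads to known chimera sequences, then count the reads that cross each junction) can be illustrated with a minimal sketch. This is not ChiTaH's implementation: the `Alignment` record, the `junctions` mapping, the `min_anchor` threshold, and both function names are hypothetical, and the sketch assumes reads have already been aligned to the chimera references by some external mapper.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Alignment(NamedTuple):
    """One read aligned to one chimera reference (hypothetical record).

    Coordinates are 0-based and end-exclusive on the chimera sequence.
    """
    read_id: str
    chimera_id: str
    start: int
    end: int


def spans_junction(aln: Alignment, junction: int, min_anchor: int = 10) -> bool:
    """Return True if the read covers the junction with enough bases on each side.

    `junction` is the index of the first base contributed by the second gene.
    `min_anchor` is an assumed threshold, not a value taken from the paper.
    """
    left_bases = junction - aln.start   # aligned bases from the first gene
    right_bases = aln.end - junction    # aligned bases from the second gene
    return left_bases >= min_anchor and right_bases >= min_anchor


def count_junction_reads(
    alignments: Iterable[Alignment],
    junctions: Dict[str, int],
    min_anchor: int = 10,
) -> Counter:
    """Count junction-spanning reads per known chimera."""
    counts: Counter = Counter()
    for aln in alignments:
        junction = junctions.get(aln.chimera_id)
        if junction is None:
            continue  # read mapped to a sequence absent from the reference set
        if spans_junction(aln, junction, min_anchor):
            counts[aln.chimera_id] += 1
    return counts
```

Requiring a minimum anchor on both sides of the junction is a common way to avoid counting reads that merely touch the breakpoint; the threshold of 10 bases here is only a placeholder.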

Original language: English
Article number: lqab112
Journal: NAR Genomics and Bioinformatics
Volume: 3
Issue number: 4
State: Published - 1 Dec 2021

Bibliographical note

Publisher Copyright:
© The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
