Abstract
Real-time genome detection, classification and lineage assignment are critical for efficient tracking of emerging mutations and variants during viral pandemics such as COVID-19. For genomic surveillance to work effectively, each new viral genome sequence must be quickly and accurately associated with an existing viral family (lineage). ViRAL is a hardware-accelerated platform for real-time viral genome lineage assignment based on minhashing and Vision Transformer. Minhashing is a technique based on locality-sensitive hashing for finding regions of similarity within sequenced genomes. Vision Transformer is a model for image classification that employs a Transformer-like architecture over patches of images. In ViRAL, such image patches are genome fragments extracted from the regions of high similarity. ViRAL is especially efficient in lineage assignment of extremely low-quality (or highly ambiguous) genomic data, i.e. when a large fraction of DNA bases are missing in an assembled genome. We implement ViRAL on CPU, GPU and a custom-designed hardware accelerator denoted ACMI. ViRAL assigns newly sequenced SARS-CoV-2 genomes to existing lineages with a top-1 accuracy of 94.2%. The probability that the correct assignment is found among the five most likely placements generated by ViRAL (top-5 accuracy) is 99.8%. Accelerated ViRAL outperforms the fastest state-of-the-art assignment tools by 69.4×. It also outperforms the ViRAL GPU implementation by 19.5×. ViRAL strongly outperforms the state-of-the-art solutions in assigning highly ambiguous genomes: while state-of-the-art tools fail to assign lineage to genomes with 50% ambiguity, ViRAL achieves 77.6% assignment accuracy. We make ViRAL available to the research community through GitHub.
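As a rough illustration of the minhashing idea mentioned in the abstract, the following Python sketch estimates the similarity of two sequences from compact k-mer signatures. It is not ViRAL's implementation: the function names, the k-mer length `k`, the number of hash functions `num_hashes`, the seeded BLAKE2b hash, and the choice to skip k-mers containing ambiguous bases are all assumptions made for this example.

```python
import hashlib

def kmers(seq, k):
    """Return the set of k-mers in seq, skipping any that contain a non-ACGT base."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)
            if set(seq[i:i + k]) <= set("ACGT")}

def _hash(kmer, seed):
    """Seeded 64-bit hash of a k-mer (hypothetical choice: BLAKE2b)."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash_signature(seq, k=8, num_hashes=64):
    """Signature = minimum hash value over all k-mers, for each of num_hashes seeds."""
    ks = kmers(seq, k)
    if not ks:
        raise ValueError("sequence has no unambiguous k-mers")
    return [min(_hash(km, seed) for km in ks) for seed in range(num_hashes)]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots approximates k-mer Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

Two signatures built with the same `k` and `num_hashes` can be compared with `estimate_jaccard`; values near 1 suggest regions of high similarity, while values near 0 suggest little shared k-mer content.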
Original language | English |
---|---|
Pages (from-to) | 28353-28368 |
Number of pages | 16 |
Journal | IEEE Access |
Volume | 12 |
State | Published - 2024 |
Bibliographical note
Publisher Copyright: © 2013 IEEE.
Funding
This work was supported in part by the European Union's Horizon Europe Programme for Research and Innovation under Grant 101047160, in part by the Israeli Ministry of Science and Technology under Lise Meitner Grant for Israeli-Swedish Research Collaboration under Grant 1001569396, and in part by the Israeli Ministry of Science and Technology Grant for Groundbreaking Research under Grant 1001702600. The work of Esteban Garzón was supported by the Italian Ministry for Universities and Research (MUR) under the Call "Horizon Europe (2021-2027) Programme" under Grant H25F21001420001.
Funders | Funder number |
---|---|
European Union's Horizon Europe Programme for Research and Innovation | 101047160 |
HORIZON EUROPE Framework Programme | H25F21001420001, 2021-2027 |
Ministero dell’Istruzione, dell’Università e della Ricerca | |
Ministry of science and technology, Israel | 1001569396, 1001702600 |
Keywords
- SARS-CoV-2
- Vision transformer
- accelerator
- genome
- transformers
- viral pathogens