Abstract
Motivation: The rapid spread of viral diseases such as coronavirus disease 2019 (COVID-19) highlights an urgent need for efficient surveillance of virus mutation and transmission dynamics, which requires fast, inexpensive, and accurate viral lineage assignment. The first two goals might be achieved through low-coverage whole-genome sequencing (LC-WGS), which enables rapid genome sequencing at scale and at reduced cost. Unfortunately, LC-WGS significantly reduces genomic detail, making accurate lineage assignment very challenging.

Results: We present ViTAL, a novel deep learning algorithm specifically designed to perform lineage assignment of genomes sequenced at low coverage. ViTAL combines MinHash for genomic feature extraction with a Vision Transformer for fine-grained genome classification and lineage assignment. We show that ViTAL outperforms state-of-the-art tools across diverse coverage levels, reaching up to 87.7% lineage assignment accuracy at 1x coverage, where state-of-the-art tools such as UShER and Kraken2 achieve accuracies of 5.4% and 27.4%, respectively. ViTAL achieves comparable accuracy at up to 8x lower coverage than state-of-the-art tools. We also explore ViTAL’s ability to identify the lineages of novel genomes, i.e. genomes the Vision Transformer was not trained on, and show how ViTAL can be applied to preliminary phylogenetic placement of novel variants.
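As an illustration of the MinHash step named above, the following is a minimal sketch of a generic bottom-s MinHash over k-mers. It is not ViTAL's implementation: the abstract does not specify the k-mer length, sketch size, hash function, or how the sketches are passed to the Vision Transformer, so `k`, `size`, the use of BLAKE2b, and all function names here are assumptions chosen for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_hash(kmer):
    """Stable 64-bit hash of a k-mer (BLAKE2b is an assumption; any good hash works)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(seq, k=21, size=64):
    """Bottom-`size` MinHash sketch: the `size` smallest distinct k-mer hashes, sorted."""
    return sorted({kmer_hash(x) for x in kmers(seq, k)})[:size]


def jaccard_estimate(sketch_a, sketch_b, size=64):
    """Estimate the Jaccard similarity of two k-mer sets from their bottom-s sketches."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)
```

A sketch keeps a fixed-size summary of a genome's k-mer content, so two genomes can be compared without aligning reads. That property is what makes MinHash a plausible feature extractor when coverage is too low for reliable variant calling.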
| Original language | English |
|---|---|
| Article number | btae093 |
| Journal | Bioinformatics |
| Volume | 40 |
| Issue number | 3 |
| DOIs | |
| State | Published - 4 Mar 2024 |
Bibliographical note
Publisher Copyright: © The Author(s) 2024. Published by Oxford University Press.
Funding
This work was partially supported by the European Union’s Horizon programme for research and innovation [101047160—BioPIM].
| Funders | Funder number |
|---|---|
| European Union’s Horizon programme for research and innovation | 101047160 |