ViTAL: Vision TrAnsformer based Low coverage SARS-CoV-2 lineage assignment

Zuher Jahshan, Leonid Yavits

Research output: Contribution to journalArticlepeer-review

Abstract

Motivation: Rapid spread of viral diseases such as Coronavirus disease 2019 (COVID-19) highlights an urgent need for efficient surveillance of virus mutation and transmission dynamics, which requires fast, inexpensive and accurate viral lineage assignment. The first two goals might be achieved through low-coverage whole-genome sequencing (LC-WGS) which enables rapid genome sequencing at scale and at reduced costs. Unfortunately, LC-WGS significantly diminishes the genomic details, rendering accurate lineage assignment very challenging. Results: We present ViTAL, a novel deep learning algorithm specifically designed to perform lineage assignment of low coverage-sequenced genomes. ViTAL utilizes a combination of MinHash for genomic feature extraction and Vision Transformer for fine-grain genome classification and lineage assignment. We show that ViTAL outperforms state-of-the-art tools across diverse coverage levels, reaching up to 87.7% lineage assignment accuracy at 1x coverage where state-of-the-art tools such as UShER and Kraken2 achieve the accuracy of 5.4% and 27.4% respectively. ViTAL achieves comparable accuracy results with up to 8x lower coverage than state-of-the-art tools. We explore ViTAL’s ability to identify the lineages of novel genomes, i.e. genomes the Vision Transformer was not trained on. We show how ViTAL can be applied to preliminary phylogenetic placement of novel variants.

Original languageEnglish
Article numberbtae093
JournalBioinformatics
Volume40
Issue number3
DOIs
StatePublished - 4 Mar 2024

Bibliographical note

Publisher Copyright:
# The Author(s) 2024. Published by Oxford University Press.

Fingerprint

Dive into the research topics of 'ViTAL: Vision TrAnsformer based Low coverage SARS-CoV-2 lineage assignment'. Together they form a unique fingerprint.

Cite this