Abstract
High-throughput sequencing (HTS) is the most established technique to measure transcript abundance. HTS reads often contain uncertain or low-quality base calls that introduce ambiguity in determining the underlying sequence. In many applications, these unresolved nucleotides are handled by looking at the consensus sequence of all HTS reads. However, this approach is not applicable where sequence heterogeneity is of biological relevance. To gauge the biological complexity of a set of HTS reads in the face of unresolved base calls, one may apply the parsimony principle, i.e., find a smallest set of sequences that cover all ambiguous reads. Yet no method to date solves this problem optimally. Here, we present FiSSC, a new method to find a smallest sequence cover of a set of ambiguous reads. We first prove that the problem is NP-hard. We then present filtering steps that preserve the optimal solution size, and an integer-linear-programming formulation, which together form FiSSC. We tested FiSSC on A-to-I RNA editing datasets with binary ambiguities. FiSSC outperformed all baseline methods and achieved optimal results on all but one dataset. We expect FiSSC to advance the study of sequence variation and biological complexity of ambiguous reads in various biological domains.
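To make the parsimony objective concrete, the following is a minimal brute-force sketch of one plausible reading of the smallest-sequence-cover problem. It is not FiSSC and does not reproduce the paper's filtering steps or ILP formulation. The assumptions here are illustrative: reads have equal length, `N` marks an unresolved base, a fully resolved sequence covers a read if it matches every resolved position of that read, and the helper names `compatible` and `smallest_cover_size` are hypothetical. Under these assumptions, a group of reads can share one covering sequence exactly when the reads are pairwise compatible, so the task reduces to splitting the reads into as few mutually compatible groups as possible.

```python
def compatible(a: str, b: str, wildcard: str = "N") -> bool:
    """Two reads are compatible if they agree wherever both are resolved."""
    return len(a) == len(b) and all(
        x == y or wildcard in (x, y) for x, y in zip(a, b)
    )


def smallest_cover_size(reads: list[str]) -> int:
    """Exhaustively find the minimum number of covering sequences.

    Exponential time: intended only for tiny illustrative inputs.
    """
    reads = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    n = len(reads)
    if n == 0:
        return 0

    # conflict[i][j] is True when reads i and j cannot share a sequence.
    conflict = [
        [not compatible(reads[i], reads[j]) for j in range(n)]
        for i in range(n)
    ]

    def can_assign(i: int, groups: list[list[int]], k: int) -> bool:
        """Try to place reads i..n-1 into at most k conflict-free groups."""
        if i == n:
            return True
        for group in groups:
            if all(not conflict[i][j] for j in group):
                group.append(i)
                if can_assign(i + 1, groups, k):
                    return True
                group.pop()
        if len(groups) < k:
            groups.append([i])
            if can_assign(i + 1, groups, k):
                return True
            groups.pop()
        return False

    # Try increasing cover sizes; n groups (one per read) always works.
    for k in range(1, n + 1):
        if can_assign(0, [], k):
            return k
    return n


if __name__ == "__main__":
    print(smallest_cover_size(["AGN", "ANA", "GGA", "NGG"]))
```

In this toy input, `GGA` conflicts with every other read and `ANA` conflicts with `NGG`, so no two of those three can share a sequence. The exhaustive search is only practical for a handful of reads, which is consistent with the abstract's point that the problem is NP-hard and needs dedicated filtering and an ILP solver at realistic scale.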
| Original language | English |
|---|---|
| Title of host publication | ACM-BCB 2024 - 15th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics |
| Publisher | Association for Computing Machinery, Inc |
| Number of pages | 10 |
| ISBN (Electronic) | 9798400713026 |
| State | Published - 16 Dec 2024 |
| Event | 15th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2024 - Shenzhen, China. Duration: 22 Nov 2024 → 25 Nov 2024 |
Publication series
| Name | ACM-BCB 2024 - 15th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics |
|---|---|
Conference
| Conference | 15th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2024 |
|---|---|
| Country/Territory | China |
| City | Shenzhen |
| Period | 22/11/24 → 25/11/24 |
Bibliographical note
Publisher Copyright: © 2024 Copyright held by the owner/author(s).
Keywords
- ILP
- NP-hard
- independent set
- sequence cover