Discrete profile alignment via constrained information bottleneck

Sean O'Rourke, Gal Chechik, Robin Friedman, Eleazar Eskin

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

1 Scopus citations

Abstract

Amino acid profiles, which capture position-specific mutation probabilities, are a richer encoding of biological sequences than the individual sequences themselves. However, profile comparisons are much more computationally expensive than discrete symbol comparisons, making profiles impractical for many large datasets. Furthermore, because they are such a rich representation, profiles can be difficult to visualize. To overcome these problems, we propose a discretization for profiles using an expanded alphabet representing not just individual amino acids, but common profiles. By using an extension of information bottleneck (IB) incorporating constraints and priors on the class distributions, we find an informationally optimal alphabet. This discretization yields a concise, informative textual representation for profile sequences. Also alignments between these sequences, while nearly as accurate as the full profileprofile alignments, can be computed almost as quickly as those between individual or consensus sequences. A full pairwise alignment of SwissProt would take years using profiles, but less than 3 days using a discrete IB encoding, illustrating how discrete encoding can expand the range of sequence problems to which profile information can be applied.

Original languageEnglish
Title of host publicationAdvances in Neural Information Processing Systems 17 - Proceedings of the 2004 Conference, NIPS 2004
PublisherNeural information processing systems foundation
ISBN (Print)0262195348, 9780262195348
StatePublished - 2005
Externally publishedYes
Event18th Annual Conference on Neural Information Processing Systems, NIPS 2004 - Vancouver, BC, Canada
Duration: 13 Dec 200416 Dec 2004

Publication series

NameAdvances in Neural Information Processing Systems
ISSN (Print)1049-5258

Conference

Conference18th Annual Conference on Neural Information Processing Systems, NIPS 2004
Country/TerritoryCanada
CityVancouver, BC
Period13/12/0416/12/04

Fingerprint

Dive into the research topics of 'Discrete profile alignment via constrained information bottleneck'. Together they form a unique fingerprint.

Cite this