Abstract
Proteins are essential components of living systems, capable of performing a huge variety of tasks at the molecular level, such as recognition, signalling, copying and transport. The protein sequences realizing a given function may vary widely across organisms, giving rise to a protein family. Here, we estimate the entropy of those families using different approaches, including the Hidden Markov Models used for protein databases and inferred statistical models that reproduce the low-order (1- and 2-point) statistics of multiple-sequence alignments. We also compute the entropic cost, that is, the loss in entropy resulting from a constraint acting on the protein, such as the mutation of one particular amino acid on a specific site, and relate this notion to the escape probability of HIV. The case of lattice proteins, for which the entropy can be computed exactly, provides another illustration of the concept of cost, here arising from the competition between different folds. Finally, we stress the relevance of entropy to directed-evolution experiments.
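As an illustration of the simplest case mentioned above, the sketch below computes the entropy of the maximum-entropy model that reproduces only the 1-point (single-site) frequencies of an alignment. That model treats sites as independent, so its entropy is the sum of the per-column Shannon entropies. This is a minimal sketch under stated assumptions: the toy alignment and the function names are hypothetical, and sequence reweighting, pseudocounts and the 2-point (pairwise) models discussed in the paper are deliberately omitted. The authors' actual estimators are not reproduced here. Under the same independent-site assumption, constraining one site to a single residue removes exactly that column's entropy, which gives a toy version of the entropic cost. With pairwise couplings, the cost would in general depend on which residue is fixed.

```python
import math
from collections import Counter


def column_entropies(alignment):
    """Shannon entropy (in nats) of each column of a list of aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    n = len(alignment)
    entropies = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        # Only observed residues appear in counts, so p > 0 and log(1/p) is defined.
        entropies.append(sum((c / n) * math.log(n / c) for c in counts.values()))
    return entropies


def independent_site_entropy(alignment):
    """Entropy of the max-entropy model fitted to 1-point frequencies only."""
    return sum(column_entropies(alignment))


def entropic_cost_fixed_site(alignment, i):
    """Entropy lost by fixing site i to one residue, in the independent-site model only."""
    return column_entropies(alignment)[i]


if __name__ == "__main__":
    toy = ["ACDE", "ACDF", "GCDE", "GCEF"]  # hypothetical 4-site alignment
    print(round(independent_site_entropy(toy), 3))  # prints 1.949
    print(round(entropic_cost_fixed_site(toy, 0), 3))  # prints 0.693, i.e. ln 2
```

A fully conserved column (site 1 in the toy example) contributes zero entropy, so fixing it costs nothing. A column split evenly between two residues contributes ln 2.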
Original language | English |
---|---|
Pages (from-to) | 1267-1293 |
Number of pages | 27 |
Journal | Journal of Statistical Physics |
Volume | 162 |
Issue number | 5 |
DOIs | |
State | Published - 1 Mar 2016 |
Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2016, Springer Science+Business Media New York.
Funding
S.C., H.J. and R.M. were partly funded by the Agence Nationale de la Recherche Coevstat project (ANR-13-BS04-0012-01).
Funders | Funder number |
---|---|
Agence Nationale de la Recherche | ANR-13-BS04-0012-01 |
Keywords
- Covariation
- Entropy
- Fitness landscape
- Genomics
- HIV virus
- Hidden Markov models
- Statistical inference