On the Entropy of Protein Families

John P. Barton, Arup K. Chakraborty, Simona Cocco, Hugo Jacquin, Rémi Monasson

Research output: Contribution to journal › Article › peer-review

12 Scopus citations


Proteins are essential components of living systems, capable of performing a huge variety of tasks at the molecular level, such as recognition, signalling, copying, and transport. The protein sequences realizing a given function may vary widely across organisms, giving rise to a protein family. Here, we estimate the entropy of those families based on different approaches, including Hidden Markov Models used for protein databases and inferred statistical models reproducing the low-order (1- and 2-point) statistics of multiple sequence alignments. We also compute the entropic cost, that is, the loss in entropy resulting from a constraint acting on the protein, such as the mutation of one particular amino acid at a specific site, and relate this notion to the escape probability of HIV. The case of lattice proteins, for which the entropy can be computed exactly, allows us to provide another illustration of the concept of cost, here arising from the competition between different folds. The relevance of the entropy to directed evolution experiments is stressed.
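As a rough illustration of the two quantities named in the abstract, the sketch below computes the entropy of a site-independent model, i.e. one that reproduces only the 1-point statistics of an alignment, together with the entropic cost of fixing one site in that model. This is a simplified assumption made for illustration, not the paper's method: the Hidden Markov Models and 2-point models mentioned above also capture correlations between sites. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def site_entropies(alignment):
    """Return the Shannon entropy (in nats) of each alignment column."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")

    entropies = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column)
        # Empirical frequencies f_i(a); symbols with zero count contribute nothing.
        entropies.append(
            -sum((c / total) * math.log(c / total) for c in counts.values())
        )
    return entropies


def independent_entropy(alignment):
    """Entropy of the site-independent (1-point) model: the sum of column entropies."""
    return sum(site_entropies(alignment))


def entropic_cost_of_fixing_site(alignment, site):
    """Entropy lost when one site is forced to a single amino acid.

    In the site-independent model this is simply the entropy of that column,
    because the other sites are unaffected by the constraint (an assumption
    that no longer holds once pairwise couplings are included).
    """
    return site_entropies(alignment)[site]


# Hypothetical toy alignment: 4 sequences of length 3.
toy_alignment = ["ACD", "ACE", "GCD", "GCE"]
total_entropy = independent_entropy(toy_alignment)
cost_site_0 = entropic_cost_of_fixing_site(toy_alignment, 0)
```

For this toy alignment the first and third columns each split evenly between two amino acids and the second column is fully conserved, so the total entropy is 2 ln 2 and fixing the first site costs ln 2. Because ignoring correlations can only overestimate the entropy, the site-independent value is an upper bound on the entropy of a model that also reproduces the 2-point statistics.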

Original language: English
Pages (from-to): 1267-1293
Number of pages: 27
Journal: Journal of Statistical Physics
Issue number: 5
State: Published - 1 Mar 2016
Externally published: Yes

Bibliographical note

Funding Information:
S.C., H.J. and R.M. were partly funded by the Agence Nationale de la Recherche Coevstat project (ANR-13-BS04-0012-01).

Publisher Copyright:
© 2016, Springer Science+Business Media New York.


Keywords

  • Covariation
  • Entropy
  • Fitness landscape
  • Genomics
  • HIV virus
  • Hidden Markov models
  • Statistical inference

