TY - JOUR
T1 - Scaling law in sizes of protein sequence families
T2 - From super-families to orphan genes
AU - Unger, Ron
AU - Uliel, Shai
AU - Havlin, Shlomo
PY - 2003/6/1
Y1 - 2003/6/1
N2 - It has been observed that the size of protein sequence families is unevenly distributed, with few super families with a large number of members and many "orphan" proteins that do not belong to any family. Here it is shown that the distribution of sizes of protein families in different databases and classifications (Protomap, Prodom, Cog) follows a power-law behavior with similar scaling exponents, which is characteristic of self-organizing systems. Since large databases are used in this study, a more detailed analysis of the data than in previous studies was possible. Hence, it is shown that the size distribution is governed by two exponents, different for the super families and the orphan proteins. A simple model of protein evolution is proposed, in which proteins are dynamically generated and clustered into families. The model yields a scaling behavior very similar to the distribution observed in the actual sequence databases, including the two distinct regimes for the large and small families, and thus suggests that the existence of "super families" of proteins and "orphan" proteins are two manifestations of the same evolutionary process.
KW - Evolution
KW - Power-law
KW - Protein families
KW - Scaling
KW - Size distribution
UR - http://www.scopus.com/inward/record.url?scp=0037941120&partnerID=8YFLogxK
U2 - 10.1002/prot.10347
DO - 10.1002/prot.10347
M3 - Article
C2 - 12784216
AN - SCOPUS:0037941120
SN - 0887-3585
VL - 51
SP - 569
EP - 576
JO - Proteins: Structure, Function and Genetics
JF - Proteins: Structure, Function and Genetics
IS - 4
ER -