Non-covalent residue side-chain interactions occur in many different types of proteins and facilitate many biological functions. Are these functional differences manifested in the sequence compositions and/or the residue-residue contact preferences of the interfaces? Previous studies analysed small data sets and gave contradictory answers. Here, we introduced a new data-mining method that yielded the largest high-resolution data set of interactions analysed to date. We also introduced an information theory-based analysis method. On the basis of sequence features, we were able to differentiate six types of protein interfaces, each corresponding to a different functional or structural association between residues. In particular, we found significant differences in amino acid composition and residue-residue preferences between interactions of residues within the same structural domain and between different domains, between permanent and transient interfaces, and between interactions associating homo-oligomers and hetero-oligomers. The differences between the six types were so substantial that, using amino acid composition alone, we could statistically predict to which of the six types of interfaces a pool of 1000 residues belongs with 63-100% accuracy. All interfaces differed significantly from the background of all residues in SWISS-PROT, from the group of surface residues, and from internal residues that were not involved in non-trivial interactions. Overall, our results suggest that the interface type could be predicted from sequence and that interface-type specific mean-field potentials may be adequate for certain applications.
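The idea of assigning a pool of residues to an interface type from amino acid composition alone can be illustrated with a minimal sketch. The abstract does not give the actual procedure, so everything below is an assumption: the choice of Kullback-Leibler divergence as the information-theoretic measure, the pseudocount smoothing, the function names (`composition`, `kl_divergence`, `classify_pool`), and the two toy reference pools, which are invented strings rather than real interface data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(residues, pseudocount=1.0):
    """Return smoothed amino acid frequencies for a string of one-letter codes."""
    counts = Counter(r for r in residues if r in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits between two compositions."""
    return sum(p[aa] * math.log2(p[aa] / q[aa]) for aa in AMINO_ACIDS)

def classify_pool(pool, references):
    """Assign a residue pool to the reference type with the smallest divergence."""
    p = composition(pool)
    return min(references, key=lambda name: kl_divergence(p, references[name]))

# Hypothetical reference compositions built from made-up residue strings.
# In a real analysis these would come from the six curated interface sets.
toy_references = {
    "type_A": composition("LLLIIVVFFAAMMWLLIV" * 50),
    "type_B": composition("KKEEDDRRSSTTNNQQKE" * 50),
}

if __name__ == "__main__":
    pool = "LIVFLIVAML" * 100  # a toy pool of 1000 residues
    print(classify_pool(pool, toy_references))
```

With real data, the two toy entries would be replaced by one reference composition per interface type, and accuracy would be estimated by classifying many held-out pools of 1000 residues each.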
Bibliographical note
Funding Information:
Thanks to Lukasz Salwinski (UCLA) and Ioannis Xenarios (UCLA, Lausanne) for their help in obtaining homo-complexes from DIP; thanks to Jinfeng Liu (Columbia) for computer assistance and Henry Bigelow (Columbia) for invaluable comments on the manuscript. We are grateful for the invaluable comments from two anonymous referees, from Shoshana Wodak (Brussels), and from Barry Honig (Columbia). This work was supported by grants 1-P50-GM62413-01 and RO1-GM63029-01 from the National Institutes of Health. Last but not least, thanks to all those who deposit their experimental data in public databases, and to those who maintain these databases, in particular to Phil Bourne (UCSD), Amos Bairoch (Geneva), Rolf Apweiler (EBI) and their teams.
- Drug design
- Protein complexes
- Protein folding
- Protein interface
- Protein-protein interaction