TY - JOUR
T1 - A pair-to-pair amino acids substitution matrix and its applications for protein structure prediction
AU - Eyal, Eran
AU - Frenkel-Morgenstern, Milana
AU - Sobolev, Vladimir
AU - Pietrokovski, Shmuel
PY - 2007/4/1
Y1 - 2007/4/1
N2 - We present a new structurally derived pair-to-pair substitution matrix (P2PMAT). This matrix is constructed from a very large number of integrated high-quality multiple sequence alignments (Blocks) and protein structures. It evaluates the likelihoods of all 160,000 pair-to-pair substitutions. P2PMAT implicitly accounts for evolutionary conservation, correlated mutations, and residue-residue contact potentials. The usefulness of the matrix for structural predictions is shown in this article. Prediction of protein residue-residue contacts from sequence information alone by our method (P2PConPred) is particularly accurate in protein cores, where it performs better than other basic contact prediction methods (increasing accuracy by 25-60%). The method's mean accuracy for protein cores is 24% for 59 diverse families and 34% for a subset of proteins shorter than 100 residues. This is above the level that was recently shown to be sufficient to significantly improve ab initio protein structure prediction. We also demonstrate the ability of our approach to identify native structures within large sets (300-2000) of protein decoys. On the basis of evolutionary information alone, our method ranks the native structure in the top 0.3% of the decoys in 4/10 of the sets, and in 8/10 of the sets the native structure is ranked in the top 10% of the decoys. The method can thus be used to assist in filtering out wrong models, complementing traditional scoring functions.
AB - We present a new structurally derived pair-to-pair substitution matrix (P2PMAT). This matrix is constructed from a very large number of integrated high-quality multiple sequence alignments (Blocks) and protein structures. It evaluates the likelihoods of all 160,000 pair-to-pair substitutions. P2PMAT implicitly accounts for evolutionary conservation, correlated mutations, and residue-residue contact potentials. The usefulness of the matrix for structural predictions is shown in this article. Prediction of protein residue-residue contacts from sequence information alone by our method (P2PConPred) is particularly accurate in protein cores, where it performs better than other basic contact prediction methods (increasing accuracy by 25-60%). The method's mean accuracy for protein cores is 24% for 59 diverse families and 34% for a subset of proteins shorter than 100 residues. This is above the level that was recently shown to be sufficient to significantly improve ab initio protein structure prediction. We also demonstrate the ability of our approach to identify native structures within large sets (300-2000) of protein decoys. On the basis of evolutionary information alone, our method ranks the native structure in the top 0.3% of the decoys in 4/10 of the sets, and in 8/10 of the sets the native structure is ranked in the top 10% of the decoys. The method can thus be used to assist in filtering out wrong models, complementing traditional scoring functions.
KW - Contact prediction
KW - Correlated mutations
KW - ab initio
UR - http://www.scopus.com/inward/record.url?scp=33847356936&partnerID=8YFLogxK
U2 - 10.1002/prot.21223
DO - 10.1002/prot.21223
M3 - Article
C2 - 17243158
AN - SCOPUS:33847356936
SN - 0887-3585
VL - 67
SP - 142
EP - 153
JO - Proteins: Structure, Function, and Bioinformatics
JF - Proteins: Structure, Function, and Bioinformatics
IS - 1
ER -