Newly synthesized polypeptides must pass stringent quality controls in cells to ensure appropriate folding and function. However, mutations, environmental stresses and aging can reduce the efficiency of these controls, leading to accumulation of protein aggregates, amyloid fibrils and plaques. In vitro experiments have shown that even single amino acid substitutions can drastically enhance or mitigate protein aggregation kinetics. In this work, we have collected a dataset of 220 unique mutations in 25 proteins and classified them as enhancers or mitigators on the basis of their effect on protein aggregation rate. The data were analyzed via machine learning to identify features capable of distinguishing between aggregation rate enhancers and mitigators. Our initial Support Vector Machine (SVM) model separated such mutations with an overall accuracy of 69%. When local secondary structures at the mutation sites were considered, the accuracies further improved by 13–15%. The machine-learnt features are distinct for each secondary structure class at mutation sites. Protein stability and flexibility changes are important features for mutations in α-helices. β-strand propensity, polarity and charge become important when mutations occur in β-strands, whereas the ability to form secondary structure, helical tendency and aggregation propensity are important for mutations lying in coils. These results have been incorporated into a sequence-based algorithm (available at http://www.iitm.ac.in/bioinfo/aggrerate-disc/) capable of predicting whether a mutation will enhance or mitigate a protein's aggregation rate. This algorithm will find applications in understanding protein aggregation in human diseases, in-silico optimization of biopharmaceuticals and enzymes for improved biophysical attributes, and de novo design of bio-nanomaterials.
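The idea of training a separate enhancer/mitigator classifier for each secondary structure class can be sketched roughly as follows. This is a minimal illustration, not the published method: the abstract does not state the kernel, the feature encoding or the training protocol, so the sketch swaps in a plain linear SVM trained by stochastic subgradient descent on the hinge loss. The feature names, the synthetic data and the choice of one informative feature per class are all assumptions made for the example and do not reproduce the reported accuracies.

```python
import random

# Hypothetical feature names, loosely echoing the abstract; values are synthetic.
FEATURES = ["stability_change", "strand_propensity", "aggregation_propensity"]
# Assumption for this toy data: exactly one feature carries signal per class.
INFORMATIVE = {"helix": 0, "strand": 1, "coil": 2}

def make_toy_data(informative, n_per_label=20, rng=None):
    """Synthetic mutations: label +1 = enhancer, -1 = mitigator."""
    rng = rng or random.Random(0)
    X, y = [], []
    for label in (+1, -1):
        for _ in range(n_per_label):
            x = [rng.gauss(0.0, 0.5) for _ in FEATURES]  # noise on every feature
            x[informative] += 2.0 * label                # shift the informative one
            X.append(x)
            y.append(label)
    return X, y

def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=100, seed=0):
    """Linear SVM via stochastic subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # L2 shrinkage on the weights (the bias is not regularized).
            w = [wj * (1.0 - lr * lam) for wj in w]
            if y[i] * score < 1.0:  # inside the margin or misclassified
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def predict(model, x):
    """Return +1 (enhancer) or -1 (mitigator) for one feature vector."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def accuracy(model, X, y):
    return sum(predict(model, x) == t for x, t in zip(X, y)) / len(y)

# One model per secondary structure class at the mutation site.
rng = random.Random(42)
models, scores, top_feature = {}, {}, {}
for ss_class, idx in INFORMATIVE.items():
    X, y = make_toy_data(idx, rng=rng)
    models[ss_class] = train_linear_svm(X, y)
    scores[ss_class] = accuracy(models[ss_class], X, y)
    weights = models[ss_class][0]
    top_feature[ss_class] = max(range(len(FEATURES)), key=lambda j: abs(weights[j]))
    print(ss_class, "training accuracy:", scores[ss_class],
          "largest weight:", FEATURES[top_feature[ss_class]])
```

Inspecting which weight is largest in each per-class model is a crude stand-in for the feature analysis described above. With real data, accuracy would need to be estimated on held-out mutations rather than on the training set.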
|Number of pages||11|
|Journal||International Journal of Biological Macromolecules|
|State||Published - 15 Oct 2018|
Bibliographical note
Funding Information:
We thank the Bioinformatics Infrastructure Facility, Department of Biotechnology and Indian Institute of Technology Madras for computational facilities and the Ministry of Human Resource Development (MHRD) for an HTRA scholarship to PR. The work was partially supported by the Department of Biotechnology, Government of India to MMG (BT/PR16710/BID/7/680/2016).
© 2018 Elsevier B.V.
- Aggregation rate
- Machine learning
- Support vector machine