Abstract
Motivation: High-throughput protein screening is a critical technique for dissecting and designing protein function. Libraries for these assays can be created through a number of means, including targeted or random mutagenesis of a template protein sequence or direct DNA synthesis. However, mutagenic library construction methods often yield vastly more nonfunctional than functional variants and, despite advances in large-scale DNA synthesis, individual synthesis of each desired DNA template is often prohibitively expensive. Consequently, many protein-screening libraries rely on the use of degenerate codons (DCs), mixtures of DNA bases incorporated at specific positions during DNA synthesis, to generate highly diverse protein-variant pools from only a few low-cost synthesis reactions. However, selecting DCs for sets of sequences that covary at multiple positions dramatically increases the difficulty of designing a DC library and leads to the creation of many undesired variants that can quickly outstrip screening capacity.

Results: We introduce a novel algorithm for total DC library optimization, degenerate codon design (DeCoDe), based on integer linear programming. DeCoDe significantly outperforms state-of-the-art DC optimization algorithms and scales well to more than a hundred proteins sharing complex patterns of covariation (e.g. the lab-derived avGFP lineage). Moreover, DeCoDe is, to our knowledge, the first DC design algorithm with the capability to encode mixed-length protein libraries. We anticipate DeCoDe to be broadly useful for a variety of library generation problems, ranging from protein engineering attempts that leverage mutual information to the reconstruction of ancestral protein states.

Contact: [email protected]
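To make the degenerate-codon idea in the abstract concrete, the following is a minimal Python sketch, not DeCoDe itself and not taken from the paper. It expands a degenerate codon written in standard IUPAC nucleotide codes into the residues it encodes under the standard genetic code, then shows how choosing one DC per position for covarying sequences produces undesired variants. The function names (`expand_codon`, `encoded_residues`, `library_variants`) and the two-residue target library (`AK`, `VE`) are hypothetical and chosen only for illustration.

```python
from itertools import product

# IUPAC nucleotide codes: each symbol stands for a mixture of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def expand_codon(dc):
    """Return every concrete codon that a degenerate codon can produce."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in dc.upper()))]


def encoded_residues(dc):
    """Return the sorted set of residues (and stops) encoded by a DC."""
    return sorted({CODON_TABLE[c] for c in expand_codon(dc)})


def library_variants(dcs):
    """Return all protein variants encoded by one DC per position."""
    per_position = [encoded_residues(dc) for dc in dcs]
    return sorted({"".join(p) for p in product(*per_position)})


# NNK is a commonly used DC: 32 codons covering all 20 residues plus one stop.
print(len(expand_codon("NNK")), encoded_residues("NNK"))

# Hypothetical targets AK and VE covary at both positions.
# GYT encodes A/V and RAA encodes E/K, so a single DC pair also
# yields the undesired variants AE and VK.
print(library_variants(["GYT", "RAA"]))
```

In this toy case half of the encoded library is undesired. The fraction grows quickly as more covarying positions are added, which is the difficulty the abstract describes.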
Original language | English |
---|---|
Pages (from-to) | 3357-3364 |
Number of pages | 8 |
Journal | Bioinformatics |
Volume | 36 |
Issue number | 11 |
DOIs | |
State | Published - 1 Jun 2020 |
Externally published | Yes |
Bibliographical note
Publisher Copyright: © The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected].
Funding
T.C.S. acknowledges travel support from the Prof. Rahamimoff Travel Grant Program of the United States-Israel Binational Science Foundation (BSF). T.C.S. acknowledges the support of an NSF Graduate Research Fellowship. P.M.F. is a Chan Zuckerberg Biohub Investigator and acknowledges the support of an Alfred P. Sloan Foundation Fellowship. This work was supported by the National Institutes of Health [DP2-GM-123641 to P.M.F.].
Funders | Funder number |
---|---|
National Science Foundation | |
National Institutes of Health | |
National Institute of General Medical Sciences | DP2GM123641 |
Alfred P. Sloan Foundation | |
United States-Israel Binational Science Foundation | |