Optimized Sequence Library Design for Efficient In Vitro Interaction Mapping

Yaron Orenstein, Robert Puccinelli, Ryan Kim, Polly Fordyce, Bonnie Berger

Research output: Contribution to journal > Article > peer-review

Abstract

Sequence libraries that cover all k-mers enable universal, unbiased measurements of binding to both oligonucleotides and peptides. While the number of k-mers grows exponentially in k, space on all experimental platforms is limited. Here, we shrink k-mer library sizes by using joker characters, which represent all characters in the alphabet simultaneously. We present the JokerCAKE (joker covering all k-mers) algorithm for generating a short sequence such that each k-mer appears at least p times with at most one joker character per k-mer. By running our algorithm on a range of parameters and alphabets, we show that JokerCAKE produces near-optimal sequences. Moreover, through comparison with data from hundreds of DNA-protein binding experiments and with new experimental results for both standard and JokerCAKE libraries, we establish that accurate binding scores can be inferred for high-affinity k-mers using JokerCAKE libraries. JokerCAKE libraries allow researchers to search a significantly larger sequence space using the same number of experimental measurements and at the same cost. We present a new compact sequence design that covers all k-mers utilizing joker characters and develop an efficient algorithm to generate such designs. We show through simulations and experimental validation that these sequence designs are useful for identifying high-affinity binding sites at significantly reduced cost and space.
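The coverage property stated in the abstract (each k-mer appears at least p times, counting only windows with at most one joker character) can be made concrete with a small checker. The sketch below is illustrative only and is not the JokerCAKE algorithm. The joker symbol `N` and the function names `coverage_counts` and `covers_all_kmers` are assumptions introduced here, and the input sequence is assumed to contain only alphabet characters and the joker.

```python
from itertools import product

JOKER = "N"  # assumed joker symbol; the source does not specify one


def coverage_counts(seq, k, alphabet, joker=JOKER):
    """Count how many windows of seq cover each k-mer over alphabet.

    A window with no joker covers exactly one k-mer. A window with one
    joker covers one k-mer per alphabet character substituted at the
    joker position. Windows with two or more jokers are ignored.
    """
    counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        n_jokers = window.count(joker)
        if n_jokers > 1:
            continue  # more than one joker per k-mer is not allowed
        if n_jokers == 0:
            counts[window] += 1
        else:
            j = window.index(joker)
            for c in alphabet:
                counts[window[:j] + c + window[j + 1:]] += 1
    return counts


def covers_all_kmers(seq, k, alphabet, p=1, joker=JOKER):
    """Return True if every k-mer is covered at least p times."""
    counts = coverage_counts(seq, k, alphabet, joker)
    return all(n >= p for n in counts.values())


if __name__ == "__main__":
    # Toy two-letter alphabet, k = 2: four k-mers in total.
    print(covers_all_kmers("AACCA", 2, "AC"))  # joker-free sequence, length 5
    print(covers_all_kmers("ANCA", 2, "AC"))   # joker sequence, length 4
```

On this toy alphabet, the joker-free sequence `AACCA` needs five characters to cover all four 2-mers, while `ANCA` covers them with four, which illustrates on a small scale how joker characters can shorten a covering sequence.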

Original language: English
Pages (from-to): 230-236.e5
Journal: Cell Systems
Volume: 5
Issue number: 3
State: Published - 27 Sep 2017
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2017 The Authors

Funding

This work was supported by the NIH (grant R01GM081871 to B.B., grant R00GM09984804 to P.F.). Part of this work was done while Y.O. was visiting the Simons Institute for the Theory of Computing. Part of this work was done while R.K. was visiting the Research Science Institute and was supported by the Center for Excellence in Education and their sponsors. P.F. is a Chan Zuckerberg Biohub Investigator and also acknowledges the support of a Gabilan and McCormick Fellowship for this work. An early version of this paper was submitted to and peer reviewed at the 2017 Annual International Conference on Research in Computational Molecular Biology (RECOMB). The manuscript was revised and then independently further reviewed at Cell Systems.

Funders (funder number):
National Institutes of Health: R00GM09984804
National Institute of General Medical Sciences: R01GM081871

    Keywords

    • de Bruijn graph
    • microarray design
    • sequence libraries
