A systematic evaluation of data processing and problem formulation of CRISPR off-target site prediction

Ofir Yaish, Maor Asif, Yaron Orenstein

Research output: Contribution to journal › Review article › peer-review


Abstract

The CRISPR/Cas9 system is widely used in a broad range of gene-editing applications. While this editing technique is quite accurate in the target region, there may be many unplanned off-target sites (OTSs). Consequently, a plethora of computational methods have been developed to predict off-target cleavage sites given a guide RNA and a reference genome. However, these methods are based on small-scale datasets (only tens to hundreds of OTSs) produced by experimental OTS-detection techniques with a low signal-to-noise ratio. Recently, CHANGE-seq, a new in vitro experimental technique to detect OTSs, was used to produce a dataset of unprecedented scale and quality (>200 000 OTSs over 110 guide RNAs). In addition, the same study included in cellula GUIDE-seq experiments for 58 of the guide RNAs. Here, we fill the gap in previous computational methods by utilizing these data to systematically evaluate data processing and formulation of the CRISPR OTS prediction problem. Our evaluations show that data transformation as a pre-processing phase is critical prior to model training. Moreover, we demonstrate the improvement gained by adding potential inactive OTSs to the training datasets. Furthermore, our results point to the importance of adding the number of mismatches between guide RNAs and their OTSs as a feature. Finally, we present predictive off-target in cellula models based on both in vitro and in cellula data and compare them to state-of-the-art methods in predicting true OTSs. Our conclusions will be instrumental in any future development of an off-target predictor based on high-throughput datasets.
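
The three data-processing ideas named in the abstract (transforming read counts, adding potential inactive OTSs, and using the mismatch count as a feature) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the dictionary layout, and the choice of `log1p` as the transformation are all assumptions made for illustration, and the sketch ignores bulges and PAM handling.

```python
import math


def count_mismatches(guide: str, site: str) -> int:
    """Count positions where two equal-length sequences differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(1 for g, s in zip(guide.upper(), site.upper()) if g != s)


def log_transform(read_count: float) -> float:
    """Illustrative transformation (assumption): log(1 + reads)."""
    return math.log1p(read_count)


def build_examples(guide, active_sites, inactive_sites):
    """Build training rows for one guide.

    active_sites: dict mapping site sequence -> observed read count.
    inactive_sites: iterable of candidate sites with no detected cleavage;
        they are added as negatives with a read count of zero.
    """
    rows = []
    for site, reads in active_sites.items():
        rows.append({
            "site": site,
            "mismatches": count_mismatches(guide, site),
            "label": 1,
            "target": log_transform(reads),
        })
    for site in inactive_sites:
        rows.append({
            "site": site,
            "mismatches": count_mismatches(guide, site),
            "label": 0,
            "target": log_transform(0),
        })
    return rows


# Hypothetical toy sequences, far shorter than a real 20-nt guide plus PAM.
examples = build_examples(
    guide="ACGTAC",
    active_sites={"ACGTAC": 120, "ACGTAT": 7},
    inactive_sites=["TCGTAT"],
)
for row in examples:
    print(row)
```

The `label` column suits a classification formulation (active versus inactive), while `target` suits a regression formulation on the transformed read counts. Which formulation and which transformation work best is exactly what the article evaluates.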

Original language: English
Article number: bbac157
Journal: Briefings in Bioinformatics
Volume: 23
Issue number: 5
Early online date: 20 May 2022
State: Published - 20 Sep 2022
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2022 The Author(s).

Funding

Israel Innovation Authority through the CRISPR-IL consortium; Israel Science Foundation (grant No. 351/22).

Funder (funder number)
Israel Innovation Authority
Israel Science Foundation (351/22)

    Keywords

    • CHANGE-seq
    • CRISPR off-target
    • GUIDE-seq
    • machine learning
    • read count normalization
