Abstract
CRISPR/Cas9 technology has revolutionized gene editing. By designing a 20-nt single guide RNA (sgRNA), one can target any genomic locus that is followed by an NGG motif downstream. However, the endogenous gene-editing efficiency at the target site and the activity at unintended off-target sites (OTSs) are affected by various factors, among them the sequence of the site, its flanking sequences, and the epigenetic marks at the site. Since experimentally measuring endogenous efficiency and OTSs is time- and resource-consuming, researchers have developed computational methods to predict them. However, previous studies have not successfully incorporated the factors that affect on- or off-target activity, and were trained and evaluated on small datasets. Recently, Leenay et al. produced the most comprehensive dataset of endogenous on-target efficiencies, comprising nearly 1,600 measurements in T cells, and Yaish et al. reprocessed the largest dataset of 78 OTS experiments in T cells, providing a unique opportunity to develop a method to predict endogenous on-target efficiency and OTSs. Here, we developed EpiCRISPROn and EpiCRISPROff to improve on-target efficiency and OTS prediction, respectively, by combining various inputs: the site and its flanking sequences, multiple epigenetic marks, and a high-throughput-based prediction. In EpiCRISPROn and EpiCRISPROff, the additional non-sequence features improved prediction performance from an average Spearman correlation of 0.31 to 0.51 in 5-fold cross-validation and from an AUPRC of 0.436 to 0.441 on a held-out test set, respectively. Moreover, EpiCRISPROn and EpiCRISPROff were trained on one cell type and successfully generalized to other cell types. Furthermore, we shed light on on-target and OTS preferences by interrogating the trained EpiCRISPROn and EpiCRISPROff models.
We expect EpiCRISPROn and EpiCRISPROff to advance the field of gene editing by providing improved prediction of endogenous CRISPR/Cas9 on-target efficiency and off-target activity.
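The targeting rule stated in the abstract (a 20-nt site followed by NGG downstream) and the idea of feeding flanking sequences to a predictor can be illustrated with a minimal Python sketch. It is not the EpiCRISPROn or EpiCRISPROff implementation: the names `CandidateSite` and `find_candidate_sites` and the `flank` parameter are hypothetical, and the scan covers only the forward strand of an unambiguous A/C/G/T sequence.

```python
import re
from typing import Iterator, NamedTuple


class CandidateSite(NamedTuple):
    """Hypothetical record for one candidate target site (illustration only)."""
    start: int        # 0-based index of the first base of the 20-nt site
    protospacer: str  # 20-nt sequence matched by the sgRNA
    pam: str          # 3-nt NGG motif immediately downstream of the site
    upstream: str     # up to `flank` bases preceding the site
    downstream: str   # up to `flank` bases following the NGG motif


# A zero-width lookahead lets overlapping candidate sites all be reported.
_SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def find_candidate_sites(seq: str, flank: int = 5) -> Iterator[CandidateSite]:
    """Yield every forward-strand 20-nt site that is followed by NGG.

    Assumption: `flank` is an arbitrary illustrative width, not the
    flanking length used in the paper.
    """
    seq = seq.upper()
    for match in _SITE.finditer(seq):
        start = match.start()
        end = start + 23  # 20-nt site plus the 3-nt NGG motif
        yield CandidateSite(
            start=start,
            protospacer=match.group(1),
            pam=match.group(2),
            upstream=seq[max(0, start - flank):start],
            downstream=seq[end:end + flank],
        )
```

Calling `list(find_candidate_sites(genome_fragment))` on a DNA string returns one record per candidate site. A full scan would also search the reverse complement, and epigenetic marks would have to be joined in from separate annotation tracks.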
| Original language | English |
|---|---|
| Title of host publication | BCB 2025 - Proceedings of the 16th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics |
| Publisher | Association for Computing Machinery, Inc |
| ISBN (Electronic) | 9798400722004 |
| State | Published - 10 Dec 2025 |
| Event | 16th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2025 - Philadelphia, United States. Duration: 12 Oct 2025 → 15 Oct 2025 |
Publication series
| Name | BCB 2025 - Proceedings of the 16th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics |
|---|---|
Conference
| Conference | 16th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2025 |
|---|---|
| Country/Territory | United States |
| City | Philadelphia |
| Period | 12 Oct 2025 → 15 Oct 2025 |
Bibliographical note
Publisher Copyright: © 2025 Copyright held by the owner/author(s).
Keywords
- CRISPR/Cas9
- deep neural networks
- epigenetics
- flanking sequences
- off-target sites
- on-target efficiency