TY - JOUR
T1 - Scalable and robust DNA-based storage via coding theory and deep learning
AU - Bar-Lev, Daniella
AU - Orr, Itai
AU - Sabary, Omer
AU - Etzion, Tuvi
AU - Yaakobi, Eitan
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer Nature Limited 2025.
PY - 2025/4
Y1 - 2025/4
N2 - The global data sphere is expanding exponentially, projected to hit 180 zettabytes by 2025, whereas current technologies are not anticipated to scale at nearly the same rate. DNA-based storage emerges as a crucial solution to this gap, enabling digital information to be archived in DNA molecules. This method enjoys major advantages over magnetic and optical storage solutions such as exceptional information density, enhanced data durability and negligible power consumption to maintain data integrity. To access the data, an information retrieval process is employed, where some of the main bottlenecks are the scalability and accuracy, which have a natural tradeoff between the two. Here we show a modular and holistic approach that combines deep neural networks trained on simulated data, tensor product-based error-correcting codes and a safety margin mechanism into a single coherent pipeline. We demonstrated our solution on 3.1 MB of information using two different sequencing technologies. Our work improves upon the current leading solutions with a 3,200× increase in speed and a 40% improvement in accuracy and offers a code rate of 1.6 bits per base in a high-noise regime. In a broader sense, our work shows a viable path to commercial DNA storage solutions hindered by current information retrieval processes.
AB - The global data sphere is expanding exponentially, projected to hit 180 zettabytes by 2025, whereas current technologies are not anticipated to scale at nearly the same rate. DNA-based storage emerges as a crucial solution to this gap, enabling digital information to be archived in DNA molecules. This method enjoys major advantages over magnetic and optical storage solutions such as exceptional information density, enhanced data durability and negligible power consumption to maintain data integrity. To access the data, an information retrieval process is employed, where some of the main bottlenecks are the scalability and accuracy, which have a natural tradeoff between the two. Here we show a modular and holistic approach that combines deep neural networks trained on simulated data, tensor product-based error-correcting codes and a safety margin mechanism into a single coherent pipeline. We demonstrated our solution on 3.1 MB of information using two different sequencing technologies. Our work improves upon the current leading solutions with a 3,200× increase in speed and a 40% improvement in accuracy and offers a code rate of 1.6 bits per base in a high-noise regime. In a broader sense, our work shows a viable path to commercial DNA storage solutions hindered by current information retrieval processes.
UR - http://www.scopus.com/inward/record.url?scp=85218674517&partnerID=8YFLogxK
U2 - 10.1038/s42256-025-01003-z
DO - 10.1038/s42256-025-01003-z
M3 - ???researchoutput.researchoutputtypes.contributiontojournal.article???
AN - SCOPUS:85218674517
SN - 2522-5839
VL - 7
SP - 639
EP - 649
JO - Nature Machine Intelligence
JF - Nature Machine Intelligence
IS - 4
M1 - 352
ER -