Recent developments in DNA data storage systems have revealed the great potential to store large amounts of data at a very high density with extremely long persistence and low cost. However, despite recent contributions to robust data encoding, current DNA storage systems offer limited support for random access on DNA storage devices due to restrictive biochemical constraints. Moreover, state-of-the-art approaches do not support content-based filter queries on DNA storage. This paper introduces the first encoding for DNA that enables content-based searches on structured data like relational database tables. We provide the details of the methods for coding and decoding millions of directly accessible data objects on DNA. We evaluate the derived codes on real data sets and verify their robustness.
BTW
DNAContainer: An object-based storage architecture on DNA
Over the past decade, DNA has emerged as a new storage medium with intriguing data volume and durability capabilities. Despite its advantages, DNA storage also has crucial limitations, such as intricate data access interfaces and restricted random accessibility. To overcome these limitations, DNAContainer has been introduced with a novel storage interface for DNA that spans a very large virtual address space on objects and allows random access to DNA at scale. In this paper, we substantially improve the first version of DNAContainer, focusing on the update capabilities of its data structures and optimizing its memory footprint. In addition, we extend the previous set of experiments on DNAContainer with new ones whose results reveal the impact of essential parameters on the performance and memory footprint.
2022
NARGAB
High-scale random access on DNA storage systems
Alex El-Shaikh, Marius Welzel, Dominik Heider, and 1 more author
Owing to the rapidly declining cost of synthesizing and sequencing deoxyribonucleic acid (DNA), its high information density, and its durability of up to centuries, DNA has attracted the attention of many scientists as an information storage medium. State-of-the-art DNA storage systems exploit the high capacity of DNA and enable random access (predominantly random reads) through primers, which serve as unique identifiers for directly accessing data. However, primers come with a significant limitation: the number of distinct primers available within a single DNA library is typically very small (e.g., ≈10). We propose a method to overcome this deficiency and present a general-purpose technique for addressing and directly accessing thousands to potentially millions of different data objects within the same DNA pool. Our approach utilizes a fountain code, sophisticated probe design, and microarray technologies. A key component is locality-sensitive hashing, which makes checking for dissimilarity among such a large number of probes and data objects feasible.
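To illustrate why locality-sensitive hashing makes such dissimilarity checks tractable, the sketch below uses a generic MinHash scheme with banding over k-mer sets. It is not the authors' implementation: the abstract does not specify the hash family, similarity measure, or parameters, so `kmers`, `minhash_signature`, `candidate_pairs`, `jaccard`, and all default values are hypothetical choices made for illustration only. The idea is that only sequences sharing a bucket need an exact comparison, which avoids comparing every pair in a large pool.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def kmers(seq, k=4):
    """Return the set of overlapping k-mers of a DNA string (requires len(seq) >= k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(seed, kmer):
    """Deterministic 64-bit hash of a k-mer, parameterized by a seed."""
    digest = hashlib.blake2b(f"{seed}:{kmer}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(seq, num_hashes=32, k=4):
    """MinHash signature: for each seed, the minimum hash over the k-mer set."""
    ks = kmers(seq, k)
    return tuple(min(_hash(seed, km) for km in ks) for seed in range(num_hashes))


def candidate_pairs(seqs, num_hashes=32, bands=8, k=4):
    """Return index pairs whose signatures agree on at least one band.

    Only these pairs are likely to be similar; all other pairs are
    treated as dissimilar without an exact comparison.
    """
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for idx, seq in enumerate(seqs):
        sig = minhash_signature(seq, num_hashes, k)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


def jaccard(a, b, k=4):
    """Exact Jaccard similarity of two k-mer sets, used to verify candidates."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka | kb)


if __name__ == "__main__":
    # Hypothetical probe sequences; the first two are identical on purpose.
    probes = ["ACGTACGTTGCAAGCT", "ACGTACGTTGCAAGCT", "TTTTGGGGCCCCAAAA"]
    for i, j in sorted(candidate_pairs(probes)):
        print(i, j, jaccard(probes[i], probes[j]))
```

In this sketch, a new probe would be accepted only if none of its bucket neighbors exceeds a chosen similarity threshold. The number of bands and rows trades off missed similar pairs against unnecessary exact comparisons.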