Publications | Dr. Alex El-Shaikh

2023

Sci_Rep
Content-based filter queries on DNA data storage systems

Alex El-Shaikh, and Bernhard Seeger

Scientific Reports, Apr 2023

Recent developments in DNA data storage systems have revealed the great potential to store large amounts of data at a very high density with extremely long persistence and low cost. However, despite recent contributions to robust data encoding, current DNA storage systems offer limited support for random access on DNA storage devices due to restrictive biochemical constraints. Moreover, state-of-the-art approaches do not support content-based filter queries on DNA storage. This paper introduces the first encoding for DNA that enables content-based searches on structured data like relational database tables. We provide the details of the methods for coding and decoding millions of directly accessible data objects on DNA. We evaluate the derived codes on real data sets and verify their robustness.
@article{El-Shaikh2023,
  author = {El-Shaikh, Alex and Seeger, Bernhard},
  title = {Content-based filter queries on DNA data storage systems},
  journal = {Scientific Reports},
  year = {2023},
  month = apr,
  day = {29},
  volume = {13},
  number = {1},
  pages = {7053},
  issn = {2045-2322},
  doi = {10.1038/s41598-023-34160-5},
}

BTW

DNAContainer: An object-based storage architecture on DNA

Alex El-Shaikh, and Bernhard Seeger

In BTW 2023, Apr 2023

@incollection{incollection,
  author = {El-Shaikh, Alex and Seeger, Bernhard},
  title = {DNAContainer: An object-based storage architecture on DNA},
  year = {2023},
  booktitle = {BTW 2023},
  publisher = {Gesellschaft für Informatik e.V.},
  address = {Bonn},
  isbn = {978-3-88579-725-8},
  pages = {773--795},
  doi = {10.18420/BTW2023-50},
}

DB-Spektrum
An Extension of DNAContainer with a Small Memory Footprint

Alex El-Shaikh, and Bernhard Seeger

Datenbank-Spektrum, Nov 2023

Over the past decade, DNA has emerged as a new storage medium with intriguing data volume and durability capabilities. Despite its advantages, DNA storage also has crucial limitations, such as intricate data access interfaces and restricted random accessibility. To overcome these limitations, DNAContainer has been introduced with a novel storage interface for DNA that spans a very large virtual address space on objects and allows random access to DNA at scale. In this paper, we substantially improve the first version of DNAContainer, focusing on the update capabilities of its data structures and optimizing its memory footprint. In addition, we extend the previous set of experiments on DNAContainer with new ones whose results reveal the impact of essential parameters on the performance and memory footprint.
@article{El-Shaikh2024,
  author = {El-Shaikh, Alex and Seeger, Bernhard},
  title = {An Extension of DNAContainer with a Small Memory Footprint},
  journal = {Datenbank-Spektrum},
  year = {2023},
  month = nov,
  day = {01},
  volume = {23},
  number = {3},
  pages = {211--220},
  issn = {1610-1995},
  doi = {10.1007/s13222-023-00460-3},
}

2022

NARGAB
High-scale random access on DNA storage systems

Alex El-Shaikh, Marius Welzel, Dominik Heider, and Bernhard Seeger

NAR Genomics and Bioinformatics, Jan 2022

Due to the rapid cost decline of synthesizing and sequencing deoxyribonucleic acid (DNA), high information density, and its durability of up to centuries, utilizing DNA as an information storage medium has received the attention of many scientists. State-of-the-art DNA storage systems exploit the high capacity of DNA and enable random access (predominantly random reads) by primers, which serve as unique identifiers for directly accessing data. However, primers come with a significant limitation regarding the maximum available number per DNA library. The number of different primers within a library is typically very small (e.g., ≈10). We propose a method to overcome this deficiency and present a general-purpose technique for addressing and directly accessing thousands to potentially millions of different data objects within the same DNA pool. Our approach utilizes a fountain code, sophisticated probe design, and microarray technologies. A key component is locality-sensitive hashing, making checks for dissimilarity among such a large number of probes and data objects feasible.
@article{10.1093/nargab/lqab126,
  author = {El-Shaikh, Alex and Welzel, Marius and Heider, Dominik and Seeger, Bernhard},
  title = {High-scale random access on DNA storage systems},
  journal = {NAR Genomics and Bioinformatics},
  volume = {4},
  number = {1},
  pages = {lqab126},
  year = {2022},
  month = jan,
  issn = {2631-9268},
  doi = {10.1093/nargab/lqab126},
}
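The abstract names locality-sensitive hashing (LSH) as the component that makes dissimilarity checks among very many probes feasible. The following is only a minimal illustrative sketch of the general idea, not the method from the paper. It assumes that similarity between probe sequences is measured as Jaccard similarity over their k-mer sets and uses MinHash with banding, which is one standard LSH family. Every function name, the k-mer length `k`, the `bands`/`rows` split and the `threshold` are hypothetical choices made for this example.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """Set of all length-k substrings (k-mers) of a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity of two k-mer sets."""
    return len(a & b) / len(a | b) if a | b else 1.0


def _h(seed: int, kmer: str) -> int:
    # Seeded 64-bit hash; each seed stands in for one random permutation.
    digest = hashlib.blake2b(f"{seed}:{kmer}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(shingles: set[str], num_hashes: int) -> list[int]:
    # For every hash function, keep the minimum hash value over the k-mer set.
    return [min(_h(i, s) for s in shingles) for i in range(num_hashes)]


def lsh_candidate_pairs(seqs: dict[str, str], k: int = 4,
                        bands: int = 16, rows: int = 2) -> set[tuple[str, str]]:
    """Pairs of sequence names whose MinHash signatures agree on at least one band."""
    buckets: dict = defaultdict(set)
    for name, seq in seqs.items():
        shingles = kmers(seq, k)
        if not shingles:  # sequence shorter than k: nothing to hash
            continue
        sig = minhash(shingles, bands * rows)
        for b in range(bands):
            # Sequences sharing an identical band land in the same bucket.
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(name)
    pairs: set[tuple[str, str]] = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


def too_similar(seqs: dict[str, str], threshold: float = 0.5,
                k: int = 4) -> set[tuple[str, str]]:
    """Verify LSH candidates with exact Jaccard; keep pairs at or above threshold."""
    return {(a, b) for a, b in lsh_candidate_pairs(seqs, k)
            if jaccard(kmers(seqs[a], k), kmers(seqs[b], k)) >= threshold}
```

In this sketch, LSH only reduces the quadratic all-pairs comparison to comparisons within shared buckets. The exact Jaccard check then discards false positives. With `bands=16, rows=2`, pairs whose similarity is well above roughly (1/16)^(1/2) = 0.25 are very likely to become candidates. For example, `too_similar({"p1": "ACGTACGTTGCAACGT", "p2": "ACGTACGTTGCAACGA", "p3": "TTTTGGGGCCCCAAAA"})` flags the nearly identical probes `p1` and `p2` and leaves `p3` alone.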