Paper ID | M.6.2 | ||
Paper Title | Constrained Coding with Error Control for DNA-Based Data Storage | ||
Authors | Tuan Thanh Nguyen, Kui Cai, Singapore University of Technology and Design, Singapore; Kees A. Schouhamer Immink, Turing Machines Inc, Netherlands; Han Mao Kiah, Nanyang Technological University, Singapore | ||
Session | M.6: Coding for Storage and Memories II | ||
Presentation | Lecture | ||
Track | Coding for Storage and Memories | ||
Abstract | In this paper, we first propose coding techniques for DNA-based data storage that account for the maximum homopolymer runlength and the GC-content. In particular, for arbitrary $\ell, \epsilon > 0$, we propose simple and efficient $(\epsilon, \ell)$-constrained encoders that transform binary sequences into DNA base sequences (codewords) satisfying the following properties: • Runlength constraint: the maximum homopolymer run in each codeword is at most $\ell$; • GC-content constraint: the GC-content of each codeword is within $[0.5 - \epsilon, 0.5 + \epsilon]$. For practical values of $\ell$ and $\epsilon$, our codes achieve higher rates than the existing results in the literature. We further design efficient $(\epsilon, \ell)$-constrained codes with error-correction capability. Specifically, the designed codes satisfy the runlength constraint and the GC-content constraint, and can correct a single edit (i.e., a single deletion, insertion, or substitution) and its variants. To the best of our knowledge, no such codes had been constructed prior to this work. |
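The two constraints stated in the abstract can be illustrated with a small checker. This is only a sketch of the constraint definitions, not the encoders proposed in the paper; the function names `max_homopolymer_run`, `gc_content`, and `is_constrained` are hypothetical and chosen for illustration, and sequences are assumed to be strings over the alphabet A, C, G, T.

```python
from itertools import groupby


def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of identical consecutive bases (0 if empty)."""
    return max((sum(1 for _ in group) for _, group in groupby(seq)), default=0)


def gc_content(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    if not seq:
        raise ValueError("GC-content is undefined for an empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def is_constrained(seq: str, ell: int, eps: float) -> bool:
    """Check both constraints from the abstract:
    maximum homopolymer run at most ell, and GC-content in [0.5 - eps, 0.5 + eps].
    """
    return max_homopolymer_run(seq) <= ell and abs(gc_content(seq) - 0.5) <= eps
```

For example, under these assumptions `is_constrained("ACGTACGT", 1, 0.1)` holds, since every run has length 1 and exactly half of the bases are G or C, while `is_constrained("AAAACGT", 3, 0.5)` fails because of the run of four A's.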
2021 IEEE International Symposium on Information Theory
11-16 July 2021 | Melbourne, Victoria, Australia