Technical Program

Paper Detail

Paper IDM.7.3
Paper Title Error-correcting Codes for Short Tandem Duplication and Substitution Errors
Authors Yuanyuan Tang, Farzad Farnoud, The University of Virginia, United States
Session M.7: Insertion Deletion Substitution Codes I
Presentation Lecture
Track Coding for Storage and Memories
Manuscript  Click here to download the manuscript
Virtual Presentation  Click here to watch in the Virtual Symposium
Abstract Due to its high data density and longevity, DNA is considered a promising storage medium for satisfying ever-increasing data storage needs. However, the diversity of errors that occur in DNA sequences makes efficient error-correction a challenging task. This paper aims to address simultaneously correcting two types of errors, namely, short tandem duplication and substitution errors. We focus on tandem repeats of length at most 3 and design codes for correcting an arbitrary number of duplication errors and one substitution error. Because a substituted symbol can be duplicated many times (possibly as part of longer substrings), a single substitution can affect an unbounded substring of the retrieved word. However, we show that with appropriate preprocessing, the effect may be limited to a substring of finite length, thus making efficient error-correction possible. We construct a code for correcting the aforementioned errors and provide lower bounds for its rate. In particular, compared to optimal codes correcting only duplication errors, numerical results show that the asymptotic cost of protecting against an additional substitution is only 0.003 bits/symbol when the alphabet has size 4, an important case corresponding to data storage in DNA.

Plan Ahead

IEEE ISIT 2021

2021 IEEE International Symposium on Information Theory

11-16 July 2021 | Melbourne, Victoria, Australia

Visit Website!