Technical Program

Paper Detail

Paper IDO.1.6
Paper Title Data Deduplication with Random Substitutions
Authors Hao Lou, Farzad Farnoud, University of Virginia, United States
Session O.1: Data Compression
Presentation Lecture
Track Source Coding
Manuscript  Click here to download the manuscript
Virtual Presentation  Click here to watch in the Virtual Symposium
Abstract Data deduplication saves storage space by identifying and removing repeats in the data stream. In this paper, we provide information-theoretical analysis on the performance of deduplication algorithms with data streams where repeats are not exact. We introduce a source model in which probabilistic substitutions are considered. Two modified versions of fixed-length deduplication are studied and proven to have performance within a constant factor of optimal with the knowledge of repeat length. We also study the variable-length scheme and show that as entropy becomes smaller, the size of the compressed string vanishes relative to the length of the uncompressed string.

Plan Ahead

IEEE ISIT 2021

2021 IEEE International Symposium on Information Theory

11-16 July 2021 | Melbourne, Victoria, Australia

Visit Website!