Paper ID | O.1.6 | ||
Paper Title | Data Deduplication with Random Substitutions | ||
Authors | Hao Lou, Farzad Farnoud, University of Virginia, United States | ||
Session | O.1: Data Compression | ||
Presentation | Lecture | ||
Track | Source Coding | ||
Manuscript | Click here to download the manuscript | ||
Virtual Presentation | Click here to watch in the Virtual Symposium | ||
Abstract | Data deduplication saves storage space by identifying and removing repeats in the data stream. In this paper, we provide information-theoretical analysis on the performance of deduplication algorithms with data streams where repeats are not exact. We introduce a source model in which probabilistic substitutions are considered. Two modified versions of fixed-length deduplication are studied and proven to have performance within a constant factor of optimal with the knowledge of repeat length. We also study the variable-length scheme and show that as entropy becomes smaller, the size of the compressed string vanishes relative to the length of the uncompressed string. |
Plan Ahead
2021 IEEE International Symposium on Information Theory
11-16 July 2021 | Melbourne, Victoria, Australia