IEEE ISIT 2020 || Los Angeles, California, USA || 21-26 June 2020

Paper Detail

Paper ID

E.3.5

Paper Title

Missing Mass of Markov Chains

Authors

Prafulla Chandra, Andrew Thangaraj, Indian Institute of Technology Madras, India; Nived Rajaraman, University of California Berkeley, United States

Session

E.3: Estimation Theory

Presentation

Lecture

Track

Detection and Estimation

Abstract

Estimating the missing mass, that is, the total probability of letters not seen in a random sample, is an important problem with several applications. The Good-Turing (GT) estimator is one of the most popular estimators of missing mass. When the samples are independent and identically distributed, the bias of the GT estimator is known to fall inversely with the number of samples. When the samples form a Markov chain, very little is known about the convergence of the GT estimator, even though it is widely used for smoothing in language models, which are mostly Markovian. In this work, we initiate the characterization of the bias of the GT estimator for missing mass in Markov chains and make progress towards it. We develop a useful 'multi-letter' characterization of the bias, which leads to sufficient conditions on the transition probability matrix for the bias of the GT estimator to converge to zero.
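
To make the quantities in the abstract concrete, the following is a minimal Python sketch and is not taken from the paper. It assumes the standard form of the GT missing-mass estimate, namely the number of letters that appear exactly once in the sample divided by the sample size. The function names `good_turing_missing_mass` and `true_missing_mass` are hypothetical, as is the assumption that the reference distribution is supplied as a dictionary (for a Markov chain one would presumably pass its stationary distribution, which is an assumption here, not a statement of the paper's definition).

```python
from collections import Counter


def good_turing_missing_mass(sample):
    """Standard Good-Turing estimate of missing mass.

    Returns (number of letters seen exactly once) / (sample size).
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


def true_missing_mass(sample, distribution):
    """Total probability of letters that do not appear in the sample.

    `distribution` maps each letter to its probability (an assumed
    input format; the estimator itself never sees this).
    """
    seen = set(sample)
    return sum(p for letter, p in distribution.items() if letter not in seen)


if __name__ == "__main__":
    # Hypothetical toy alphabet and sample, for illustration only.
    dist = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
    sample = list("aabac")
    print("GT estimate:      ", good_turing_missing_mass(sample))
    print("true missing mass:", true_missing_mass(sample, dist))
```

The estimator uses only the sample, while the true missing mass needs the underlying distribution, so comparing the two over many simulated samples is one way to observe the bias that the abstract discusses.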
