|Detecting an Odd Restless Markov Arm with a Trembling Hand
|PN Karthik, Rajesh Sundaresan, Indian Institute of Science, India
|L.12: Multi-Arm Bandits
|Statistics and Learning Theory
|Click here to download the manuscript
|Click here to watch in the Virtual Symposium
|Consider a multi-armed bandit whose arms are independent Markov processes on a common underlying state space. The transition probability matrix of one of the arms (the odd arm) is different from the common transition probability matrix of all the other arms. The goal is to identify the odd arm as quickly as possible while keeping the probability of decision error small. We study the case of restless Markov observations and identify an asymptotic lower bound on the expected stopping time for a decision with vanishing error probability. We then propose a sequential test and show that the asymptotic behaviour of its expected stopping time comes arbitrarily close to that of the lower bound. Prior works dealt with iid arms and rested Markov arms, whereas our work deals with restless Markov arms.