Technical Program

Paper Detail

Paper IDL.13.5
Paper Title What is the Value of Data? on Mathematical Methods for Data Quality Estimation
Authors Netanel Raviv, Washington University in Saint Louis, United States; Siddharth Jain, Jehoshua Bruck, California Institute of Technology, United States
Session L.13: Online, Active, and Transfer Learning
Presentation Lecture
Track Statistics and Learning Theory
Manuscript  Click here to download the manuscript
Virtual Presentation  Click here to watch in the Virtual Symposium
Abstract Data is one of the most important assets of the information age, and its societal impact is undisputed. Yet, rigorous methods of assessing the quality of data are lacking. In this paper, we propose a formal definition for the quality of a given dataset. We assess a dataset's quality by a quantity we call the expected diameter, which measures the expected disagreement between two randomly chosen hypotheses that explain it, and has recently found applications in active learning. We focus on Boolean hyperplanes, and utilize a collection of Fourier analytic, algebraic, and probabilistic methods to come up with theoretical guarantees and practical solutions for the computation of the expected diameter. We also study the behaviour of the expected diameter on algebraically structured datasets, conduct experiments that validate this notion of quality, and demonstrate the feasibility of our techniques.

Plan Ahead


2021 IEEE International Symposium on Information Theory

11-16 July 2021 | Melbourne, Victoria, Australia

Visit Website!