Aleksander Mądry (MIT)
Norbert Wiener Center Colloquium Speaker
Time: 3:30 pm on Friday, October 7th, 2022
Datamodels: Predicting Predictions with Training Data
Machine learning models tend to rely on an abundance of training data.
Yet, understanding the underlying structure of this data---and models'
exact dependence on it---remains a challenge.
In this talk, we will present a new framework---called datamodeling---for
directly modeling predictions as functions of training data. This
datamodeling framework, given a dataset and a learning algorithm,
pinpoints---at varying levels of granularity---the relationships between
pairs of training and test points through the lens of the corresponding model
class. Even in their most basic form, datamodels enable many applications,
including discovering subpopulations, quantifying model brittleness via
counterfactuals, and identifying train-test leakage.
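To make "modeling predictions as functions of training data" concrete, here is a minimal sketch. It assumes the simplest kind of datamodel: a linear map from a 0/1 vector that marks which training points were used, to the model's output on one fixed test point. That map is fit by least squares over many randomly sampled training subsets. The toy data, the kernel-weighted "learning algorithm", and the names `train_and_predict`, `fit_linear_datamodel`, `x_test`, `n_subsets` and `alpha` are all hypothetical stand-ins, not the speaker's code or method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy training set: 1-D inputs labeled by their sign.
n_train = 20
x_train = rng.normal(size=n_train)
y_train = np.sign(x_train)
x_test = 1.5  # one hypothetical test point


def train_and_predict(mask):
    """Hypothetical learning algorithm: a Gaussian-kernel-weighted vote over
    the included training points, evaluated at x_test."""
    w = np.exp(-(x_train[mask] - x_test) ** 2)
    if w.sum() == 0.0:  # empty subset: no evidence either way
        return 0.0
    return float(np.dot(w, y_train[mask]) / w.sum())


def fit_linear_datamodel(n_subsets=2000, alpha=0.5):
    """Fit output ~ theta . 1_S + bias by least squares, where 1_S marks
    which training points were in a random subset S (assumed linear form)."""
    # Each training point is kept independently with probability alpha.
    masks = rng.random((n_subsets, n_train)) < alpha
    # "Retrain" on every subset and record the output at x_test.
    outputs = np.array([train_and_predict(m) for m in masks])
    # Design matrix: subset indicators plus a constant column for the bias.
    X = np.hstack([masks.astype(float), np.ones((n_subsets, 1))])
    coef, *_ = np.linalg.lstsq(X, outputs, rcond=None)
    return coef[:-1], coef[-1], X, outputs


theta, bias, X, outputs = fit_linear_datamodel()

# Training points whose inclusion moves the test prediction the most.
top = np.argsort(-np.abs(theta))[:3]
for i in top:
    print(f"train point {i}: x={x_train[i]:+.2f}, "
          f"label={y_train[i]:+.0f}, weight={theta[i]:+.4f}")
```

In this toy setup, the weights with the largest magnitude flag the training points that the test prediction depends on most. The fitted map can also estimate the output for an unseen subset S as `theta . 1_S + bias` without retraining. This counterfactual use is the kind of query the abstract alludes to.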