Learning in High Dimension Always Amounts to Extrapolation
By Sharon C Betts, on 24 November 2021
By Laura Ruis, PhD Candidate
Recently, Randall Balestriero, Jerome Pesenti, and Yann LeCun dropped a paper on arXiv that clarifies certain terms that are often used when people talk about generalization in machine learning. In machine learning, we often formulate a differentiable objective function for our problem that we can optimize with gradient-based methods. We tune the model parameters given data such that this objective function is optimized. However, what differentiates machine learning from optimization is that we do not just want our model to optimize the objective function for the data we used to learn the parameters (the training data); we also want it to generalize to unseen data points. Modern machine learning methods have become very good at this for many applications, like speech recognition, machine translation, and image classification. However, some people claim (here, here, and here) that these methods are simply interpolating the data they see during training, and that they would fail whenever classifying a new data point requires extrapolation. The paper by Balestriero et al. shows that, for a specific definition of interpolation and extrapolation, this is not the case. They come to the following conclusion:
> We shouldn’t use interpolation/extrapolation in the way the terms are defined in the paper when talking about generalization, because for high-dimensional data, deep learning models always have to extrapolate, regardless of the dimension of the underlying data manifold.
In this post I’ll attempt to shed some light on this conclusion. It’s drawn in part from the first figure in the paper, which we will reproduce from scratch. In the process of doing that, we’ll encounter all the relevant background material that’s necessary to understand the paper. I’ll go through all the code and maths required to reproduce it. Below you can see the figure I’m talking about, which, without any explanation, won’t illuminate much yet. If you want to understand it, read on!
By the end of this post, we will know more about the following terms:
- The curse of dimensionality
- Convex hull
- Ambient dimension
- Intrinsic dimension / data manifold dimension
- Interpolation / extrapolation
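To preview the central geometric idea, here is a minimal sketch (my own illustration, not code from the paper) of the definition Balestriero et al. work with: a new point is said to be interpolated if it lies inside the convex hull of the training samples, and extrapolated otherwise. The membership test below is phrased as a linear-programming feasibility problem; the function name, the toy Gaussian data, and the chosen dimensions are my own assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, X):
    """Check whether point x lies in the convex hull of the rows of X.

    x is in conv{x_1, ..., x_n} iff there exist lambda_i >= 0 with
    sum(lambda_i) = 1 and sum_i lambda_i * x_i = x. We test feasibility
    of that linear program with scipy's linprog.
    """
    n, d = X.shape
    c = np.zeros(n)  # objective is irrelevant; we only check feasibility
    # Equality constraints: X^T @ lambda = x and 1^T @ lambda = 1.
    A_eq = np.vstack([X.T, np.ones((1, n))])
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.success

# Toy experiment: with a fixed-size training set, how often does a fresh
# sample from the same distribution land inside the training convex hull?
rng = np.random.default_rng(0)
n_train, n_test = 100, 200
for d in [1, 2, 4, 8, 16, 32]:
    X_train = rng.standard_normal((n_train, d))
    X_test = rng.standard_normal((n_test, d))
    frac = np.mean([in_convex_hull(x, X_train) for x in X_test])
    print(f"d={d:2d}: fraction of test points inside hull = {frac:.2f}")
```

If you run this, the fraction of test points falling inside the hull should collapse towards zero as the dimension grows, which is essentially the phenomenon that the figure from the paper (and our reproduction of it later in this post) makes precise.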