SKM 2023 – Scientific Programme
DY 1.3: Tutorial
Sunday, 26 March 2023, 17:30–18:15, HSZ 01
Computing learning curves for large machine learning models using the replica approach — •Manfred Opper — Inst. für Softwaretechnik und Theor. Informatik, TU Berlin — Centre for Systems Modelling and Quantitative Biomedicine, University of Birmingham, UK
Methods of statistical physics have long been used to analyse mathematically the typical performance of machine learning models in the limit where both the number of data points and the number of parameters (such as network weights) are large. By defining Boltzmann-Gibbs probability distributions over parameters, where the cost function of the machine learning problem plays the role of a Hamiltonian, one can derive analytical expressions for training errors and generalisation errors from the corresponding partition functions and free energies, expressed in terms of a usually small number of order parameters.
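As a minimal sketch of this construction, the quantities involved can be written as follows. The symbols are illustrative assumptions and do not come from the abstract: w denotes the parameters, D the training data, E(w; D) the cost function, β an inverse temperature and μ(w) a prior measure on the parameters.

```latex
% Gibbs distribution over parameters, with the training cost E as Hamiltonian
P(w \mid D) = \frac{1}{Z(D)}\, \mu(w)\, e^{-\beta E(w; D)},
\qquad
Z(D) = \int \mathrm{d}\mu(w)\, e^{-\beta E(w; D)}

% Free energy and the thermal average of the training cost
F(D) = -\frac{1}{\beta} \ln Z(D),
\qquad
\langle E \rangle = -\frac{\partial}{\partial \beta} \ln Z(D)
```

In this notation the limit β → ∞ concentrates the distribution on minimisers of the training cost, while finite β corresponds to stochastic training.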
Since the models depend on a set of random data to be learnt, additional appropriate statistical (so-called quenched) averages of free energies over this 'disorder' have to be performed. The replica approach is a prominent analytical tool from the statistical physics of disordered systems to solve this nontrivial technical challenge.
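The standard identity behind the replica approach can be sketched as below. The notation is an assumption made for illustration: Z(D) is the partition function for a data set D, β the inverse temperature, E the cost function, μ a prior measure on the parameters, ⟨·⟩_D the average over data sets, and a = 1, ..., n labels the replicas.

```latex
% Quenched average of the free energy via the replica identity
\langle \ln Z \rangle_D
  = \lim_{n \to 0} \frac{\langle Z^n \rangle_D - 1}{n}
  = \lim_{n \to 0} \frac{\partial}{\partial n} \ln \langle Z^n \rangle_D

% For integer n, Z^n is the partition function of n copies (replicas)
% of the system that all share the same data set D
Z^n(D) = \int \prod_{a=1}^{n} \mathrm{d}\mu(w_a)\,
         \exp\!\Big( -\beta \sum_{a=1}^{n} E(w_a; D) \Big)

% Example order parameters for a perceptron with N weights
% (w^* is a hypothetical teacher vector that generates the labels)
q_{ab} = \frac{1}{N}\, w_a \cdot w_b,
\qquad
R_a = \frac{1}{N}\, w_a \cdot w^*
```

The average over D is carried out for integer n, where it couples the replicas through overlaps such as q_ab and R_a, and the result is then continued analytically to n → 0.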
In this tutorial I will give an introduction to this approach. Starting with an explicit calculation for simple single-layer perceptrons, I will then outline how the method can be applied to more complex problems such as kernel machines (support vector machines and Gaussian processes) and multilayer networks.