DY: Dynamics and Statistical Physics Division
DY 45: Focus Session: Physics of AI – Part I (joint session SOE/DY)
DY 45.6: Talk
Thursday, March 12, 2026, 11:00–11:15, GÖR/0226
Testing generalization through tiny task switching frameworks — •Daniel Henrik Nevermann and Claudius Gros — Institut für Theoretische Physik, Goethe-Universität Frankfurt, Germany
With ever-growing interest in advancing the performance and efficiency of large language models (LLMs), particularly the transformer architecture, the need for tiny testing frameworks is pressing, as many researchers cannot afford to train models on large GPU clusters. Here we propose a tiny testing framework, extending the recently published IARC task switching framework, that, despite being trivial to implement, is sufficiently complex to be non-trivial to learn for small-scale transformer models with a few million parameters or less. Beyond model benchmarking, the framework is also suitable for probing phenomena relevant to the physics of AI, where controlled, interpretable testbeds are essential. The proposed training and evaluation scheme relies on integer sequences that the model must predict. These integer sequences are generated by simple deterministic tasks designed to abstract typical challenges arising in natural language processing, such as short- and long-range correlations, or context awareness. Within the sequences, tasks are switched at random, with each switch indicated by a control token. An important quality of LLMs is the ability to generalize at inference time. We extend the existing task switching framework with new tasks able to probe models' generalization capacities in a tiny, yet meaningful manner.
Keywords: transformer; task switching framework; generalization; tiny testing frameworks; physics of AI
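
To illustrate the data-generation scheme described above, the following Python sketch produces task-switching integer streams of this kind. It is a minimal, hypothetical illustration: the two example tasks (increment and copy-first) and all names and parameters are stand-ins chosen to mimic short- and long-range correlations, not the actual IARC task definitions.

import random

VOCAB = 16            # integer tokens 0 .. VOCAB - 1 (assumed vocabulary size)
CONTROL = VOCAB       # dedicated control token marking a task switch

def increment(segment):
    # Short-range correlation: next token = previous token + 1 (mod VOCAB).
    return (segment[-1] + 1) % VOCAB

def copy_first(segment):
    # Long-range correlation: always repeat the current segment's first token.
    return segment[0]

TASKS = [increment, copy_first]   # illustrative tasks, not the IARC set

def generate(length, p_switch=0.1, seed=0):
    """Emit a task-switching integer sequence of the given length."""
    rng = random.Random(seed)
    task = rng.choice(TASKS)
    segment = [rng.randrange(VOCAB)]   # fresh seed token for the current task
    stream = list(segment)
    while len(stream) < length:
        if rng.random() < p_switch:
            stream.append(CONTROL)            # announce the switch ...
            task = rng.choice(TASKS)          # ... draw a new task at random
            segment = [rng.randrange(VOCAB)]  # ... and restart the segment
            stream.append(segment[0])
        else:
            nxt = task(segment)
            segment.append(nxt)
            stream.append(nxt)
    return stream[:length]

print(generate(40))

A small transformer would then be trained in the usual next-token-prediction fashion on such streams, with the control token included in the vocabulary, so that the model must infer the active task from context after every switch.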