
Dresden 2020 – scientific programme

The DPG Spring Meeting in Dresden had to be cancelled.


SOE: Fachverband Physik sozio-ökonomischer Systeme (Physics of Socio-Economic Systems Division)

SOE 16: Evolutionary Game Theory and Networks (joint SOE/DY/BP)

SOE 16.3: Talk

Thursday, March 19, 2020, 15:30–15:45, GÖR 226

Reinforcement learning dynamics in the infinite memory limit — •Wolfram Barfuss — Max Planck Institute for Mathematics in the Sciences, Leipzig

Reinforcement learning algorithms have been shown to converge to the classic replicator dynamics of evolutionary game theory, which describe the evolutionary process in the limit of an infinite population. However, it is not clear how to interpret these dynamics from the perspective of a learning agent. In this work we propose a data-inefficient batch-learning algorithm for temporal-difference Q-learning and show that it converges to a recently proposed deterministic limit of temporal-difference reinforcement learning. In a second step, we present a data-efficient learning algorithm that uses a form of experience replay and show that it retains the core features of the batch-learning algorithm. This suggests an agent-centric interpretation of the learning dynamics: what the infinite-population limit is to evolutionary dynamics, the infinite-memory limit is to learning dynamics.
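The core idea of the batch-learning construction can be illustrated with a minimal sketch (this is not the authors' exact algorithm; the two-armed bandit, the reward means, and all parameter values are illustrative assumptions): an agent collects a batch of K samples under a frozen policy, averages the temporal-difference errors over the batch, and only then applies a single update. As K grows, the stochastic update approaches its expected, deterministic direction, which is the sense in which a large "memory" plays the role that a large population plays in replicator dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-armed bandit: expected rewards per action (illustrative).
true_means = np.array([1.0, 0.5])

def softmax(q, beta=1.0):
    """Softmax policy over Q-values (numerically stabilized)."""
    z = np.exp(beta * (q - q.max()))
    return z / z.sum()

def batch_q_learning(K, steps=200, alpha=0.1):
    """Batch Q-learning sketch: freeze the policy, sample a batch of K
    (action, reward) pairs, average the TD errors per action, update once."""
    q = np.zeros(2)
    for _ in range(steps):
        pi = softmax(q)
        acts = rng.choice(2, size=K, p=pi)            # K actions from frozen policy
        rews = rng.normal(true_means[acts], 0.1)      # noisy rewards
        for a in range(2):
            mask = acts == a
            if mask.any():
                # Averaging over the batch shrinks the noise in the update;
                # as K -> infinity the update becomes deterministic.
                q[a] += alpha * (rews[mask].mean() - q[a])
    return q

q_small = batch_q_learning(K=1)      # ordinary sample-by-sample learning
q_large = batch_q_learning(K=1000)   # large-memory limit: near-deterministic
print(q_small, q_large)
```

With a large batch the Q-value estimates concentrate tightly around the expected rewards, while K = 1 recovers the usual noisy online update, matching the intuition that the deterministic learning dynamics emerge in the infinite-memory limit.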
