Timur Akhunov
Wabash College
Title
Hypoelliptic PDE-based characterization of Thompson sampling
Abstract
The multi-armed bandit is a foundational problem in reinforcement learning with extensive industrial applications. Named after a gambler seeking the most rewarding slot machine, it epitomizes the exploration-exploitation tradeoff. Thompson sampling is a classic strategy for navigating this tradeoff and minimizing the cost of selecting suboptimal arms (regret) over the long run. In classical applications, the expected rewards of the arms differ by a fixed, if unknown, amount. In recent problems motivated by the large samples used in machine learning applications, however, the gaps between these rewards diminish over time, and few foundational results are known in this regime.
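For readers unfamiliar with the algorithm, the following is a minimal sketch of Thompson sampling for Bernoulli-reward arms (an illustrative textbook setting, not the diminishing-gap regime the abstract studies; all names and parameters here are for illustration only):

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Illustrative Thompson sampling for a Bernoulli bandit.

    Each arm keeps a Beta(successes + 1, failures + 1) posterior over
    its unknown mean reward.  At every round we draw one sample from
    each posterior and play the arm with the largest sample, so arms
    are chosen with probability proportional to how plausible it is
    that they are optimal.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its posterior.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# Example: with a fixed gap between arms, the best arm (mean 0.7)
# accumulates the overwhelming majority of the pulls.
counts = thompson_sampling([0.3, 0.5, 0.7], horizon=2000)
```

In the fixed-gap setting above, concentration of the posteriors drives the per-round probability of a suboptimal pull to zero; the abstract's diminishing-gap regime is precisely where this classical picture breaks down.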
While bandit games are inherently discrete, in joint work with Vlad Kobzar, we have characterized the continuous-time limit of this problem by a degenerate parabolic PDE. We verified that this PDE satisfies Hörmander's bracket condition and is therefore hypoelliptic. Is hypoellipticity enough to guarantee regret bounds for our gambler? We will share known results and open questions from our work in progress.
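For context, Hörmander's theorem (stated here in its general form; the specific vector fields arising from the bandit PDE are not given in this abstract) concerns second-order operators built from smooth vector fields $X_0, X_1, \dots, X_m$:

```latex
L \;=\; \sum_{i=1}^{m} X_i^{2} \;+\; X_0 .
```

The bracket condition asks that the Lie algebra generated by $X_0, \dots, X_m$ (the fields together with their iterated commutators $[X_i, X_j]$, $[X_i,[X_j,X_k]]$, \dots) span the tangent space at every point. When it holds, $L$ is hypoelliptic: any distributional solution of $Lu = f$ with $f$ smooth is itself smooth, even though $L$ may be degenerate (non-elliptic) in some directions.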