- Talk by Philippe Rigollet [youtube]
- ResNet layer (discrete depth): $x_{k+1} = x_k + \sigma(W_k x_k + b_k)$
- Continuous-depth limit (neural ODE): $\dot{x}(t) = \sigma(W_t x(t) + b_t)$
- NN = flow map $x(0) \mapsto x(T)$ of the ODE
- Transformers: maps between probability measures over tokens
- $f_\theta: \mathcal{P}(\mathbb{R}^d) \to \mathcal{P}(\mathbb{R}^d)$, $\mu(0) \mapsto \mu(T)$
- Input: empirical measure of the $n$ tokens, $\mu(0) = \frac{1}{n} \sum_{i=1}^n \delta_{x_i}$
- Mean-field interacting particle system
- $\dot{x}_i(t) = X_t[\mu(t)](x_i(t)), \quad i = 1, \dots, n$
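
To make the notes above concrete, here is a minimal sketch in plain Python. It shows two things: the residual update as a forward-Euler step of the ODE (step size $h = 1$ recovers the ResNet layer), and an interacting particle system in which each token's velocity depends on the whole token cloud, i.e. on the empirical measure. Everything not in the notes is an assumption made for illustration: $\sigma = \tanh$, time-independent `W` and `b`, the softmax-weighted `attention_velocity` field, the parameter `beta`, and all function names.

```python
import math


def residual_step(x, W, b, h=1.0):
    """Return x + h * tanh(W x + b); h = 1 is the discrete residual update."""
    pre = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [xi + h * math.tanh(p) for xi, p in zip(x, pre)]


def flow_map(x0, W, b, T=1.0, steps=100):
    """Forward-Euler approximation of x(T) for dx/dt = tanh(W x + b).

    Assumption: W and b are constant in time, unlike W_t, b_t in the notes.
    """
    h = T / steps
    x = list(x0)
    for _ in range(steps):
        x = residual_step(x, W, b, h)
    return x


def attention_velocity(x, particles, beta=1.0):
    """Hypothetical mean-field velocity: softmax-weighted mean of the particles.

    The weights depend on x and on the whole particle cloud, i.e. on the
    empirical measure (1/n) * sum_j delta_{x_j}.
    """
    scores = [beta * sum(a * c for a, c in zip(x, y)) for y in particles]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w * y[k] for w, y in zip(weights, particles)) / total
        for k in range(len(x))
    ]


def particle_flow(particles, T=1.0, steps=100, beta=1.0):
    """Forward-Euler approximation of the interacting particle system on [0, T]."""
    h = T / steps
    xs = [list(p) for p in particles]
    for _ in range(steps):
        # Compute all velocities from the current cloud before moving anyone.
        vels = [attention_velocity(x, xs, beta) for x in xs]
        xs = [[xi + h * vi for xi, vi in zip(x, v)] for x, v in zip(xs, vels)]
    return xs


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    print(particle_flow(tokens, T=1.0, steps=10))
```

Reordering the input tokens only reorders the output of `particle_flow` in the same way, which is why the map can be read as acting on the empirical measure rather than on an ordered list of tokens.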