• Talk by Philippe Rigollet [youtube]
  • Residual layer: $x_{k+1} = x_k + \sigma(W_k x_k + b_k)$
  • Continuous-depth limit: $\dot{x}(t) = \sigma(W_t x(t) + b_t)$
  • NN = flow map $x(0) \mapsto x(T)$ of this ODE
  • Transformers: maps from probability measures on tokens to probability measures on tokens
  • $f_\theta: \mathcal{P}(\mathbb{R}^d) \to \mathcal{P}(\mathbb{R}^d)$, $\mu(0) \mapsto \mu(T)$
  • $\mu(0) = \frac{1}{n} \sum_{i=1}^n \delta_{x_i}$
  • Mean-field interacting particle system
  • $\dot{x}_i(t) = X_t[\mu(t)](x_i(t)), \quad i = 1, \dots, n$, where $X_t[\mu]$ is a vector field that depends on the measure $\mu$
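
The notes above can be illustrated with a minimal NumPy sketch. It is not taken from the talk. The first half treats the residual update as a forward-Euler step of the ODE, so that stacking layers approximates the flow map. The second half evolves $n$ particles whose velocity depends on their empirical measure. The function names, the choice $\sigma = \tanh$, the time-constant weights, and the softmax-weighted average used as the velocity field (attention with identity query, key, and value matrices, inverse temperature `beta`) are all assumptions made for illustration.

```python
import numpy as np


def residual_step(x, W, b, h=1.0):
    """One residual update x + h * tanh(W x + b); h = 1 gives the discrete layer."""
    return x + h * np.tanh(W @ x + b)


def flow_map(x0, W, b, T=1.0, steps=100):
    """Forward-Euler approximation of x' = tanh(W x + b) on [0, T].

    Assumption: W and b are held constant in time to keep the sketch short.
    """
    x = np.array(x0, dtype=float)
    h = T / steps
    for _ in range(steps):
        x = residual_step(x, W, b, h)
    return x


def attention_velocity(X, beta=1.0):
    """Hypothetical mean-field velocity evaluated at every particle.

    X has shape (n, d); row i is particle x_i. The velocity of x_i is the
    softmax(beta * <x_i, x_j>)-weighted average of all x_j, which depends on
    the particles only through their empirical measure.
    """
    logits = beta * (X @ X.T)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ X


def evolve_particles(X0, T=1.0, steps=100, beta=1.0):
    """Forward-Euler integration of the interacting particle system on [0, T]."""
    X = np.array(X0, dtype=float)
    h = T / steps
    for _ in range(steps):
        X = X + h * attention_velocity(X, beta)
    return X


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(5, 3))        # n = 5 tokens in R^3
    print(evolve_particles(tokens, T=1.0))  # support points of mu(T)
```

Because the velocity depends on the tokens only through their empirical measure, permuting the input rows permutes the output rows in the same way, which is what lets the map be read as acting on $\mu(0)$ rather than on an ordered sequence.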