Review different samplers, e.g. Gibbs sampling, collapsed Gibbs sampling, and other MCMC methods, for parameter inference.

LDA

The Model - The Generative Story

$$ \begin{align*} & p(\beta_{1:K}, \theta_{1:D}, z_{1:D}, w_{1:D}) \\ =& \underbrace{\prod_{i=1}^{K} p(\beta_i)}_{\text{topic priors}} \prod_{d=1}^{D} p(\theta_d) \left( \prod_{n=1}^{N} \underbrace{p(z_{d,n} \vert \theta_d)}_{\text{topic assignment}} \, p(w_{d,n} \vert \underbrace{\beta_{1:K}, z_{d,n}}_{z_{d,n}\text{ picks a topic from }\beta_{1:K}}) \right) \end{align*} $$

“Notice that this distribution specifies a number of dependencies. For example, the topic assignment $z_{d,n}$ depends on the per-document topic proportions $\theta_d$. As another example, the observed word $w_{d,n}$ depends on the topic assignment $z_{d,n}$ and all of the topics $\beta_{1:K}$. (Operationally, that term is defined by looking up as to which topic $z_{d,n}$ refers to and looking up the probability of the word $w_{d,n}$ within that topic.)”
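To make the generative story concrete, here is a minimal sketch in Python that samples a toy corpus from this model. The symmetric Dirichlet hyperparameters `alpha` and `eta`, and the sizes `K`, `D`, `N`, `V`, are illustrative assumptions, not values from [PTM2012].

```python
# Minimal sketch of the LDA generative story, assuming symmetric Dirichlet
# priors; K, D, N, V, alpha, eta are illustrative choices, not from [PTM2012].
import numpy as np

rng = np.random.default_rng(0)
K, D, N, V = 3, 5, 20, 50          # topics, documents, words per document, vocabulary size
alpha, eta = 0.1, 0.01             # Dirichlet hyperparameters (assumed symmetric)

# Draw each topic beta_k: a distribution over the vocabulary.
beta = rng.dirichlet(np.full(V, eta), size=K)          # shape (K, V)

docs = []
for d in range(D):
    theta_d = rng.dirichlet(np.full(K, alpha))         # per-document topic proportions theta_d
    words = []
    for n in range(N):
        z_dn = rng.choice(K, p=theta_d)                # topic assignment z_{d,n} ~ theta_d
        w_dn = rng.choice(V, p=beta[z_dn])             # word w_{d,n} ~ beta_{z_{d,n}}
        words.append(w_dn)
    docs.append(words)
```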

Posterior Computation

$$ p(\beta_{1:K}, \theta_{1:D}, z_{1:D} \vert w_{1:D}) = \frac{p(\beta_{1:K}, \theta_{1:D}, z_{1:D}, w_{1:D})}{\underbrace{p(w_{1:D})}_{\text{evidence}}} $$

“Topic modelling algorithms form an approximation of [the above equation] by adapting an alternative distribution over the latent topic structure to be close to the true posterior”
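The evidence $p(w_{1:D})$ requires summing over every possible topic structure, which makes the exact posterior intractable; sampling-based methods such as collapsed Gibbs sampling are one family of approximations, alongside the variational methods discussed in [PTM2012]. Below is a minimal sketch of a collapsed Gibbs sampler, not the paper's algorithm, using the same assumed hyperparameters and `docs` representation (a list of lists of word ids) as the sketch above.

```python
# Minimal sketch of collapsed Gibbs sampling for LDA: beta and theta are
# integrated out, and each assignment z_{d,n} is resampled from its conditional
# given all other assignments. alpha, eta, and the `docs` format are assumptions.
import numpy as np

def collapsed_gibbs(docs, K, V, alpha=0.1, eta=0.01, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))        # document-topic counts
    n_kv = np.zeros((K, V))                # topic-word counts
    n_k = np.zeros(K)                      # total words assigned to each topic
    z = []                                 # current topic assignments

    # Random initialization of topic assignments.
    for d, words in enumerate(docs):
        z_d = rng.integers(K, size=len(words))
        z.append(z_d)
        for w, k in zip(words, z_d):
            n_dk[d, k] += 1; n_kv[k, w] += 1; n_k[k] += 1

    for _ in range(iters):
        for d, words in enumerate(docs):
            for n, w in enumerate(words):
                k = z[d][n]
                # Remove the current assignment from the counts.
                n_dk[d, k] -= 1; n_kv[k, w] -= 1; n_k[k] -= 1
                # p(z_{d,n}=k | rest) ∝ (n_dk + alpha) * (n_kw + eta) / (n_k + V*eta)
                p = (n_dk[d] + alpha) * (n_kv[:, w] + eta) / (n_k + V * eta)
                k = rng.choice(K, p=p / p.sum())
                z[d][n] = k
                n_dk[d, k] += 1; n_kv[k, w] += 1; n_k[k] += 1

    # Posterior mean estimates of theta and beta from the final counts.
    theta = (n_dk + alpha) / (n_dk + alpha).sum(axis=1, keepdims=True)
    beta = (n_kv + eta) / (n_kv + eta).sum(axis=1, keepdims=True)
    return theta, beta, z

# Example usage with the toy corpus sampled in the previous sketch:
# theta_hat, beta_hat, z_hat = collapsed_gibbs(docs, K, V, alpha, eta)
```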

Reference

[PTM2012] David M. Blei. "Probabilistic Topic Models." Communications of the ACM, 55(4):77–84, 2012.

The quoted passages above are from [PTM2012].