Structure analysis and learning; Unsupervised Dialogue State Tracking; Unsupervised end-to-end dialogue, etc.
Some abbreviations:
- ToD: Task-oriented Dialogue.
In this part, I will survey several papers and classify them into sub-topics. The summaries are brief and focus mainly on the methods.
Intent Clustering and Intent Transition Induction
Motivations & Intro.
“We propose the first unsupervised approach to the problem of modeling dialogue acts in an open domain”
“Automatic detection of dialogue structure is an important first step toward deep understanding of human conversations”
“Dialogue act tagging has traditionally followed an annotate-train-test paradigm, which begins with the design of annotation guidelines, followed by collection and labeling of corpora”
Research subject: dialogue act (speech act).
Clustering raw utterances from noisy Twitter posts; the learned model can be used to induce discourse structure, especially for dialogues.
Evaluation: 1) visualisation; 2) an intrinsic task of conversation ordering.
Methods.
The authors propose a generative model of conversations.
“Our base model structure is inspired by the content model proposed by Barzilay and Lee (2004) for multi-document summarization.”
Proposed models: 0) EM Conversation Model; 1) Conversation+Topic Model; 2) Bayesian Conversation Model.
0) Conversation Model
Sentence-level HMM, “Each conversation $C$ is a sequence of acts $a$, and each act produces a post, represented by a bag of words shown using the $W$ plates”.
$$ \begin{align*} P(W_0, W_1, W_2) &= \sum_{a_0, a_1, a_2} P(a_0, a_1, a_2) P(W_0 \vert a_0) P(W_1 \vert a_1) P(W_2 \vert a_2) \\ &= \sum_{\mathbf{a}} \prod_{t=0}^{2} P(a_t \vert a_{t-1}) P(W_t \vert a_t) \end{align*} $$
where each post $W_t$ is generated independently conditioned on its dialogue act $a_t$ (with $P(a_0 \vert a_{-1})$ understood as the initial act distribution $P(a_0)$). The parameters of this model are: 1) the dialogue act transition matrix $\theta_{i, j}$, giving the probability of transitioning from act $i$ to act $j$; 2) $n$ per-act language models, either conditional bigram models $\eta_{w \vert w'; a}$ or unigram (bag-of-words) models $\eta_{w; a}$, where $n$ is the number of dialogue acts.
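To make the setup concrete, here is a minimal NumPy sketch of this conversation HMM with unigram emissions (not the authors' code; the names `theta`, `eta`, `pi` and the toy dimensions are my own illustrative assumptions). The conversation likelihood above is computed with the forward algorithm in log space:

```python
import numpy as np

rng = np.random.default_rng(0)
n_acts, vocab_size = 3, 10  # toy sizes, purely illustrative

# Row-stochastic act transition matrix, per-act unigram emission models,
# and an initial act distribution.
theta = rng.dirichlet(np.ones(n_acts), size=n_acts)    # theta[i, j] = P(a_t = j | a_{t-1} = i)
eta = rng.dirichlet(np.ones(vocab_size), size=n_acts)  # eta[a, w]   = P(w | a)
pi = rng.dirichlet(np.ones(n_acts))                    # P(a_0)

def log_post_prob(words, a):
    """Bag-of-words log-likelihood of one post (array of word ids) under act a."""
    return np.log(eta[a, words]).sum()

def conversation_log_likelihood(posts):
    """Forward algorithm: log P(W_0, ..., W_T), marginalizing over act sequences."""
    alpha = np.log(pi) + np.array([log_post_prob(posts[0], a) for a in range(n_acts)])
    for W in posts[1:]:
        emit = np.array([log_post_prob(W, a) for a in range(n_acts)])
        # log-sum-exp over the previous act, for each current act
        alpha = emit + np.logaddexp.reduce(alpha[:, None] + np.log(theta), axis=0)
    return np.logaddexp.reduce(alpha)

# Example: a 3-post conversation, each post a bag of word ids.
posts = [np.array([1, 4, 4]), np.array([0, 2]), np.array([7, 7, 3])]
print(conversation_log_likelihood(posts))
```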
The EM algorithm is used to learn this model: “Starting with a random assignment of acts, we train our conversation model using EM, with forward-backward providing act distributions during the expectation step.”
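Continuing the toy setup above, here is a minimal sketch of one EM iteration: forward-backward computes the act posteriors in the E-step, and the M-step renormalizes expected counts. Again, this is an illustration under my assumptions, not the paper's implementation:

```python
def em_step(conversations):
    """One EM iteration over a list of conversations (each a list of posts)."""
    # Expected-count accumulators with small smoothing pseudo-counts.
    new_theta = np.full((n_acts, n_acts), 1e-3)
    new_eta = np.full((n_acts, vocab_size), 1e-3)
    new_pi = np.full(n_acts, 1e-3)
    log_theta = np.log(theta)
    for posts in conversations:
        T = len(posts)
        emit = np.array([[log_post_prob(W, a) for a in range(n_acts)] for W in posts])
        # Forward pass: alpha[t, i] = log P(W_0..W_t, a_t = i).
        alpha = np.zeros((T, n_acts))
        alpha[0] = np.log(pi) + emit[0]
        for t in range(1, T):
            alpha[t] = emit[t] + np.logaddexp.reduce(alpha[t-1][:, None] + log_theta, axis=0)
        # Backward pass: beta[t, i] = log P(W_{t+1}..W_{T-1} | a_t = i).
        beta = np.zeros((T, n_acts))
        for t in range(T - 2, -1, -1):
            beta[t] = np.logaddexp.reduce(log_theta + emit[t+1] + beta[t+1], axis=1)
        log_Z = np.logaddexp.reduce(alpha[-1])
        gamma = np.exp(alpha + beta - log_Z)  # gamma[t, i] = P(a_t = i | all posts)
        new_pi += gamma[0]
        for t, W in enumerate(posts):
            counts = np.bincount(W, minlength=vocab_size)   # word counts in this post
            new_eta += gamma[t][:, None] * counts[None, :]  # expected word counts per act
        for t in range(T - 1):
            # Expected transition counts: xi[i, j] = P(a_t = i, a_{t+1} = j | posts).
            xi = np.exp(alpha[t][:, None] + log_theta + emit[t+1] + beta[t+1] - log_Z)
            new_theta += xi
    # M-step: renormalize expected counts into probabilities.
    return (new_theta / new_theta.sum(axis=1, keepdims=True),
            new_eta / new_eta.sum(axis=1, keepdims=True),
            new_pi / new_pi.sum())

theta, eta, pi = em_step([posts])  # one iteration on the toy conversation
```

In practice one would start from a random act assignment (as the quote says) and repeat `em_step` until the likelihood converges.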