This page covers classic and modern results of self-training and co-training.

Classic Theory

Blum and Mitchell’s seminal paper: Combining labeled and unlabeled data with co-training, COLT 1998.

Modern Theory

Theoretical assumptions.

(Expansion) Informal: Data distribution has good continuity within each class; letting $P_i$ be the distribution of data conditioned on class $i$, expansion states that for small subsets $S$ of class $i$, $P_i(\text{neighborhood of }S) \geq c P_i (S)$, where $c > 1$ is the expansion factor.

(Allowable Transformation around $x$)
$\mathcal{B}(x) = \{ x': \exists T \in \mathcal{T} \text{ s.t. } \| x' - T(x) \| \leq r \}$, where $\mathcal{T}$ is the set of possible transformations.

(Neighborhood of $x$) $\mathcal{N}(x) = \{ x': \mathcal{B}(x) \cap \mathcal{B}(x') \neq \emptyset \}$.

(Neighborhood of set $S$) $\mathcal{N}(S) = \cup_{x \in S} \mathcal{N}(x)$.
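The three definitions above compose mechanically: $\mathcal{B}(x)$ from a transformation set, $\mathcal{N}(x)$ from overlapping $\mathcal{B}$-sets, and $\mathcal{N}(S)$ as a union. A toy sketch on discrete 1-D data (the transformation set of small shifts, the radius $r$, and the grid are all illustrative choices, not from the source):

```python
# Toy sketch of B(x), N(x), N(S) on a 1-D grid. The transformation set
# T = {identity, shift by +/- 0.5} and radius r = 0.25 are hypothetical.
transforms = [lambda x: x, lambda x: x + 0.5, lambda x: x - 0.5]
r = 0.25

def B(x, grid):
    """Allowable transformations around x: points within r of some T(x)."""
    return {xp for xp in grid for T in transforms if abs(xp - T(x)) <= r}

def N(x, grid):
    """Neighborhood of x: points x' whose B(x') intersects B(x)."""
    bx = B(x, grid)
    return {xp for xp in grid if bx & B(xp, grid)}

def N_set(S, grid):
    """Neighborhood of a set S: union of neighborhoods of its points."""
    return set().union(*(N(x, grid) for x in S))

grid = [i * 0.25 for i in range(9)]  # 0.0, 0.25, ..., 2.0
print(sorted(N_set({0.0}, grid)))
```

Note that $\mathcal{N}(x)$ can reach well beyond $\mathcal{B}(x)$ itself: any $x'$ whose transformed copies land near a transformed copy of $x$ counts as a neighbor.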

($(a, c)$-expansion) Formal: Class-conditional distribution $P_i$ satisfies $(a, c)$-expansion if for all $V \subseteq \mathcal{X}$ with $P_i(V) \leq a$, the following holds: $P_i(\mathcal{N}(V)) \geq \min \{ c P_i(V), 1 \}$; if $P_i$ satisfies $(a, c)$-expansion for all $i \in [K]$, then we say $P$ satisfies $(a, c)$-expansion.
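On a finite support the definition can be checked by brute force: enumerate every subset $V$ with $P_i(V) \leq a$ and test $P_i(\mathcal{N}(V)) \geq \min\{c P_i(V), 1\}$. A minimal sketch, assuming a hypothetical discrete class-conditional distribution given as a point-to-probability dict and a precomputed neighborhood map (as would come from the $\mathcal{B}$/$\mathcal{N}$ definitions above):

```python
import itertools

def satisfies_expansion(P, neighbors, a, c):
    """Brute-force (a, c)-expansion check for a finite distribution P.

    P: dict mapping point -> probability; neighbors: dict mapping
    point x -> N(x) as a set of points.
    """
    points = list(P)
    for k in range(1, len(points) + 1):
        for V in itertools.combinations(points, k):
            pV = sum(P[x] for x in V)
            if pV > a:
                continue  # definition only constrains sets with P(V) <= a
            NV = set().union(*(neighbors[x] for x in V))  # N(V)
            pNV = sum(P[x] for x in NV)
            if pNV < min(c * pV, 1.0):
                return False
    return True

# Hypothetical example: a uniform 4-point "chain" where each point
# neighbors itself and its adjacent points.
P = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
neighbors = {0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2, 3}, 3: {2, 3}}
print(satisfies_expansion(P, neighbors, a=0.5, c=1.5))  # prints True
```

Removing point 3's links to the rest of the chain (so $\mathcal{N}(\{3\}) = \{3\}$) breaks expansion, since $P(\mathcal{N}(\{3\})) = 0.25 < 1.5 \cdot 0.25$: the check captures exactly the "no small isolated region" intuition.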

“Expansion says that the manifold of each class has sufficient connectivity, as every subset $S$ has a neighborhood larger than $S$”.

(Assumption) $P$ expands on sets smaller than $\mathcal{M}(G_{pl})$, where $\mathcal{M}(G_{pl})$ denotes the set of examples mislabeled by the pseudo-labeler $G_{pl}$; that is, $(a, c)$-expansion is only required to hold at a scale $a$ comparable to the pseudo-labeler's error mass.