Assume that points belonging to the same class cluster together. By using unlabeled data alongside labeled data, you can then draw better decision boundaries around the clusters belonging to the various classes: probability density is important in classification.
Generative approaches
\tbc
Label propagation on graphs
Form the similarity graph G=(V, E) of the n points, \(\set{X_i}\), with edge weights \(W_{i, j}\). Some nodes \(X_T\) have labels, others don't; let \(k\) be the indicator vector with \(k_i = [i \in T]\), and take K = diag(k).
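A minimal sketch of one common construction (the RBF kernel and bandwidth `sigma` are illustrative assumptions, not fixed by these notes):

```python
import numpy as np

def similarity_graph(X, sigma=1.0):
    # Pairwise squared Euclidean distances, then RBF weights.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    W = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)  # no self-edges
    return W

n, T = 6, [0, 3]                              # T: labeled indices (illustrative)
X = np.random.default_rng(0).normal(size=(n, 2))
W = similarity_graph(X)
k = np.zeros(n); k[T] = 1.0                   # k_i = [i in T]
K = np.diag(k)
```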
Quadratic criterion
Binary labels
Denote the proposed labels of the n points as bits in a binary vector \(f\). Let \(y\) be the label vector, with \(y_T = c(X_T)\) and \(y_{V - T}\) taking arbitrary values.
\(\min_{f \in \set{0, 1}^{n}} \sum_{(i, j) \in E}W_{i, j}(f_i - f_j)^{2} + l(f-y)^{T}K(f-y)\), where \(l\) weighs the label-fit term. This finds an \(f\) which is smooth, in the sense that \(f_i\) and \(f_j\) are similar whenever \(i\) and \(j\) are connected by an edge in G, but which also agrees with \(y_T\) as much as possible.
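A numeric sketch of this objective (`lam` stands in for \(l\); the factor 1/2 compensates for the symmetric double-count when summing over all i, j rather than over each undirected edge once):

```python
import numpy as np

def binary_objective(f, W, K, y, lam):
    # Smoothness: sum over edges of W_ij (f_i - f_j)^2.
    smoothness = 0.5 * np.sum(W * (f[:, None] - f[None, :])**2)
    # Fit: agreement with y on the labeled points (K zeroes out the rest).
    fit = lam * (f - y) @ K @ (f - y)
    return smoothness + fit
```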
Discrete labels
Denote the \(L\) possible labels by the L-dim indicator vectors \(\set{e_i}\). Collect the proposed labels of the n points as the rows of a binary matrix \(F \in \set{0, 1}^{n \times L}\), with \(\forall i : F_{i,:} \in \set{e_j}\); column \(F_{:, m}\) is then the bit vector marking which points carry label \(m\). Let Y be the label matrix of the same dimensions, with rows \(Y_{T,:} = c(X_T)\).
\(\min_{F_{i,:} \in \set{e_j}} \sum_{(i, j) \in E}W_{i, j}\|F_{i,:} - F_{j,:}\|^{2} + l\sum_{m} (F_{:,m}-Y_{:,m})^{T}K(F_{:,m}-Y_{:,m})\). This is akin to solving \(L\) binary label propagation problems simultaneously, propagating 1 label bit (1 column of F) at a time; a sketch follows below. +++(Note that the binary case is a special case of this.)+++
+++(Combinatorial hardness in both cases!)+++
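A sketch of the discrete objective, with \(F\) stored as an \(n \times L\) matrix of one-hot rows as adopted above:

```python
import numpy as np

def discrete_objective(F, W, K, Y, lam):
    # F, Y: n x L with one-hot rows; smoothness decomposes over the L columns.
    diff = F[:, None, :] - F[None, :, :]                 # n x n x L row differences
    smoothness = 0.5 * np.sum(W * np.sum(diff**2, axis=2))
    fit = lam * np.trace((F - Y).T @ K @ (F - Y))        # sums the per-column K-terms
    return smoothness + fit
```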
Rewriting using Graph Laplacians
The objectives are equivalent to \(\min f^{T}Lf + l (f-y)^{T}K(f-y)\) and \(\min tr(F^{T}LF) + l\, tr((F-Y)^{T}K(F-Y))\), where \(L\) here is the graph Laplacian of G (overloading the symbol used for the number of labels). That \(\sum_{(i, j) \in E}W_{i, j}(f_i - f_j)^{2} = f^{T}Lf\) is shown in graph theory ref; the latter equivalence follows by applying this to each column of F.
Using the normalized graph Laplacian \(N\) instead, get the objective \(\min tr(F^{T}NF) + l\, tr((F-Y)^{T}K(F-Y))\). In graph theory ref, see that \(f^{T}Nf\) is a normalized smoothness measure.
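A sketch of both Laplacians, with a numeric check of the smoothness identity:

```python
import numpy as np

def laplacians(W):
    d = W.sum(axis=1)                      # degrees
    L = np.diag(d) - W                     # unnormalized Laplacian
    dis = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    N = np.eye(len(d)) - dis[:, None] * W * dis[None, :]   # I - D^{-1/2} W D^{-1/2}
    return L, N

rng = np.random.default_rng(1)
W = rng.random((5, 5)); W = 0.5 * (W + W.T); np.fill_diagonal(W, 0.0)
f = rng.random(5)
L, N = laplacians(W)
# f^T L f equals the edge-wise smoothness sum (each undirected edge once):
assert np.isclose(f @ L @ f, 0.5 * np.sum(W * (f[:, None] - f[None, :])**2))
```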
Real relaxation: solve linear system of eqns
In both the binary and the general discrete case, drop the integrality constraints \(f \in \set{0, 1}^{n}\) and \(F_{i,:} \in \set{e_j}\). Allow any real value; solve this relaxed problem to get f or F, then round to the closest binary \(f'\) and \(F'\) (per-row argmax in the discrete case).
From the optimality conditions (set the gradient to zero: \(2LF + 2lK(F-Y) = 0\)), the solution satisfies \((L+lK)F = lKY\).
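A sketch of the relaxed solve plus the rounding step (`lam` for \(l\); note that \(L + lK\) can be singular if the graph has an entirely unlabeled connected component, so this is illustrative):

```python
import numpy as np

def propagate(L, k, Y, lam):
    K = np.diag(k)
    F = np.linalg.solve(L + lam * K, lam * K @ Y)   # (L + lK) F = lKY
    labels = F.argmax(axis=1)                       # round each row to nearest e_j
    F_rounded = np.eye(Y.shape[1])[labels]
    return F, F_rounded
```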
Low rank approximation for fast solution
\((L+lK)\) is \(n \times n\), and solving this system of equations is \(O(n^{3})\) naively. Instead, can use a low rank factorization of the Laplacian, \(L \approx TT^{*}\) where \(T \in R^{n \times k}\), to solve this in \(O(nk^{2})\).
But finding the low rank approximation requires finding singular vectors (the same as finding eigenvectors, for symmetric matrices), which is \(O(kn^{2})\). Can surpass this using eigenfunctions. \why
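A sketch of the low rank solve, assuming the bottom-k eigenpairs \(L \approx U\,diag(\lambda)\,U^{T}\) (U orthonormal) are already available — obtaining them cheaply is exactly the eigenfunction question above. Restricting \(F = UA\) shrinks the optimality condition to a \(k \times k\) system:

```python
import numpy as np

def propagate_lowrank(U, lams, k_ind, Y, lam):
    # Restricting F = U A turns (L + lK)F = lKY into
    # (diag(lams) + lam * U^T K U) A = lam * U^T K Y, a k x k system.
    UK = U.T * k_ind[None, :]            # U^T K with K = diag(k_ind): O(nk)
    S = np.diag(lams) + lam * (UK @ U)   # forming U^T K U dominates: O(n k^2)
    A = np.linalg.solve(S, lam * (UK @ Y))
    return U @ A                         # relaxed F; round rows as before
```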