Problems
In all of the following, we suppose that we are given n i.i.d. samples drawn from an unknown distribution over d features, Markov with respect to some unknown graph.
Model Estimation
The most ambitious goal is to estimate the parameters associated with the distribution. This can often be accomplished by minimizing the log loss associated with the conditional distribution of each node given the remaining nodes.
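In symbols, with samples x^(1), ..., x^(n) and model p_theta (a standard formulation, not taken verbatim from these notes):

```latex
\hat{\theta} \in \arg\min_{\theta} \; -\frac{1}{n} \sum_{i=1}^{n} \sum_{u=1}^{d} \log p_{\theta}\!\left( x^{(i)}_{u} \,\middle|\, x^{(i)}_{-u} \right)
```

Replacing the inner sum over nodes by the joint term \log p_{\theta}(x^{(i)}) recovers ordinary maximum likelihood; the per-node (pseudo-likelihood) form is often more tractable because each conditional is simple.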
Edge recovery
Deduce the graph encoding the conditional independence relationships among features.
Ising models: Signed edge recovery
Take the Ising model. Deduce not just the edges, but also the sign of the interaction factor between the nodes. E.g., we may want to deduce relationships among the voting patterns of legislators: do two legislators tend to vote alike or opposite?
High dimensional case
We often have only n = f(d) samples from high-dimensional (d-dimensional) data, with f growing slowly in d, so the classical asymptotics in which d is fixed and n grows no longer describe the regime of interest.
Measuring performance
Redefine statistical consistency: let both the sample size n and the dimension d tend to infinity (with d possibly growing as a function of n), and require that the probability of correct recovery tend to one along this sequence.
Approaches
Learn closest tree
(Chow, Liu).
Aim
Here, the aim is to learn a tree-structured graphical model that is closest to the underlying distribution in terms of KL divergence.
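Why a spanning-tree computation suffices: for the projection P_T of the distribution P onto a tree T (built from P's own pairwise marginals), the KL divergence decomposes as

```latex
\mathrm{KL}(P \,\|\, P_T) \;=\; \sum_{u} H(X_u) \;-\; H(X_1,\dots,X_d) \;-\; \sum_{(u,v)\in T} I(X_u; X_v)
```

Only the last sum depends on T, so minimizing the divergence over trees is equivalent to maximizing the total mutual information along the tree's edges.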
Algorithm
Construct a complete graph in which each edge is weighted by the mutual information between its endpoints, estimated from the given sample. Then run a maximum-weight spanning tree algorithm over this graph (equivalently, a minimum spanning tree on the negated weights).
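A minimal sketch of this procedure for binary 0/1 data, using numpy and networkx (the function names here are illustrative, not from the notes):

```python
import numpy as np
import networkx as nx

def estimate_mi(x, y):
    """Plug-in estimate of mutual information between two binary 0/1 columns."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))
            p_a, p_b = np.mean(x == a), np.mean(y == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_tree(samples):
    """samples: (n, d) array of 0/1 values; returns the learned tree."""
    d = samples.shape[1]
    g = nx.Graph()
    for u in range(d):
        for v in range(u + 1, d):
            g.add_edge(u, v, weight=estimate_mi(samples[:, u], samples[:, v]))
    # Maximum-weight spanning tree: the sum of edge MIs is what we maximize.
    return nx.maximum_spanning_tree(g)
```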
Learn neighborhoods
For Ising models
Consider, for each node, the conditional probability distribution of that node given all of the others. For an Ising model this conditional is a logistic function of the remaining variables, so each node's signed neighborhood can be estimated by l1-regularized logistic regression of that node on the rest.
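A minimal sketch of this per-node regression, assuming samples in {-1, +1} and using scikit-learn (the function name and the value of C are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def signed_neighborhood(samples, node, C=0.1):
    """samples: (n, d) array in {-1, +1}. Returns {neighbor: sign} for one node."""
    y = samples[:, node]
    X = np.delete(samples, node, axis=1)
    # l1 penalty drives most coefficients to exactly zero (sparse neighborhood).
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    clf.fit(X, y)
    others = [j for j in range(samples.shape[1]) if j != node]
    coefs = clf.coef_.ravel()
    return {j: int(np.sign(w)) for j, w in zip(others, coefs) if w != 0.0}
```

The per-node estimates can then be combined into a single graph, e.g. by keeping an edge when both endpoints (or either endpoint) include it in their neighborhoods.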
Results
For signed edge recovery, if the distribution satisfies certain special conditions (such as incoherence between the relevant and irrelevant variables), signed edge recovery and good parameter estimation are guaranteed. The l1 regularization ensures that the estimated coefficient vector is sparse, so that only a small set of candidate neighbors receives nonzero weight.
Otherwise, these guarantees need not hold, and the recovered graph may contain spurious or missing edges.
Caveats
However, experiments suggest that picking the right choices of the regularization parameter and other tuning constants is delicate in practice.
Analysis technique
To prove the guarantees, one considers the optimality (KKT) criteria of the convex optimization problem mentioned earlier, which say that the zero vector must lie in the subdifferential of the regularized objective at any optimum.
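For an l1-regularized program min_theta L_n(theta) + lambda_n * ||theta||_1 (with L_n the empirical loss), this is the standard subgradient stationarity condition:

```latex
\nabla L_n(\hat{\theta}) + \lambda_n \hat{z} = 0,
\qquad
\hat{z}_j = \operatorname{sign}(\hat{\theta}_j) \ \text{if}\ \hat{\theta}_j \neq 0,
\qquad
|\hat{z}_j| \le 1 \ \text{otherwise}.
```

The proof then exhibits, with high probability, a solution satisfying these conditions whose support and signs match the true neighborhood.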
For discrete graphical models
In this case, each edge is associated with a whole block of parameters, one entry per pair of states of its endpoints, rather than a single scalar.
This is multi-class logistic regression with l1/l2 (group) regularization, which zeroes out an edge's entire parameter block at once; it is still a convex program, and the same caveats and advantages as earlier apply.
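The group penalty acts on each edge's parameter block through its proximal operator. A minimal numpy sketch (the block partition `groups` and step size `t` are assumptions for illustration):

```python
import numpy as np

def group_soft_threshold(theta, groups, t):
    """Prox of t * sum_g ||theta_g||_2: shrink each block toward zero,
    removing entire blocks (edges) whose norm falls below t."""
    out = theta.copy()
    for g in groups:  # g: index array selecting one edge's parameter block
        norm = np.linalg.norm(theta[g])
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        out[g] = scale * theta[g]  # scale == 0.0 deletes the edge (group sparsity)
    return out
```

Iterating this prox step with gradient steps on the multi-class logistic loss (proximal gradient descent) is one simple way to solve the resulting convex program.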