Decision trees

Decision lists

A decision list is fully specified by a sequence of \(k\) variables (or literals) \(x_1, \dots, x_k\) and corresponding outputs \(r(x_1), \dots, r(x_k)\), plus a default output. It is like an ‘if .. elseif .. elseif .. else’ statement. It can be visualised as a chain of variables, each with one outgoing edge representing an output.
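
A minimal sketch of this as code, under an assumed encoding where each item is a (literal, output) pair and a literal is an (index, required value) pair; the names are illustrative, not from any particular library:

```python
def eval_decision_list(items, default, x):
    """Return the output attached to the first literal that fires on x."""
    for (index, value), output in items:
        if x[index] == value:
            return output
    return default

# Hypothetical example: "if x0 then 1, elif not x1 then 0, else 1".
items = [((0, 1), 1), ((1, 0), 0)]
print(eval_decision_list(items, default=1, x=[0, 1]))  # -> 1 (falls through to the default)
```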

In a d-decision list, conjunctions of at most d literals are used in place of the single literals \(x_i\) (this is how d-DNFs and d-CNFs get subsumed).

Generality

This can be written as a halfspace: \(\mathrm{sign}\bigl(\sum_{i=1}^{k} 2^{\,k-i}\, x_i\, r(x_i)\bigr)\), treating the literal \(x_i\) as 0/1 and the outputs \(r(x_i)\) as \(\pm 1\): the weight of the first satisfied literal strictly exceeds the sum of all later weights, so it determines the sign.
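
A small brute-force check of this claim, sketched with the same illustrative item encoding as above; the small bias term handling the final ‘else’ output is an addition of this sketch, not part of the formula above:

```python
from itertools import product

def eval_dl(items, default, x):
    for (idx, val), out in items:
        if x[idx] == val:
            return out
    return default

def eval_halfspace(items, default, x):
    # The i-th literal (i = 1..k) gets weight 2**(k - i); outputs are in {-1, +1}.
    k = len(items)
    total = sum((2 ** (k - i)) * out
                for i, ((idx, val), out) in enumerate(items, start=1)
                if x[idx] == val)
    total += 0.5 * default  # bias for the "else" case, smaller than any literal weight
    return 1 if total > 0 else -1

items = [((0, 1), -1), ((1, 1), 1), ((2, 0), -1)]  # an arbitrary 3-item list
for x in product([0, 1], repeat=3):
    assert eval_dl(items, -1, x) == eval_halfspace(items, -1, x)
print("halfspace form agrees with the decision list on all 8 inputs")
```

The point of the check: the weight \(2^{k-i}\) of the earliest satisfied literal exceeds the total weight of everything after it, so only the first firing item determines the sign.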

We can write conjunctions and disjunctions as decision lists.
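
For example, \(x_1 \wedge x_2 \wedge x_3\) is the decision list ‘if \(\bar{x}_1\) output 0; elif \(\bar{x}_2\) output 0; elif \(\bar{x}_3\) output 0; else output 1’, and \(x_1 \vee x_2 \vee x_3\) is ‘if \(x_1\) output 1; elif \(x_2\) output 1; elif \(x_3\) output 1; else output 0’.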

Decision trees

A rooted binary tree with internal nodes labelled with variables, the two outgoing edges of a node corresponding to the two values of its variable; leaves are labelled 0 or 1 (the range of the ‘label’ variable to be predicted). A decision tree is the same as a nested ‘if .. else ..’ statement.
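
A minimal sketch of decision-tree evaluation, assuming an illustrative encoding where an internal node is a tuple (var, subtree_if_0, subtree_if_1) and a leaf is a bare 0/1 label:

```python
def eval_tree(t, x):
    """Follow the path selected by x -- exactly a nested if/else."""
    while isinstance(t, tuple):
        var, if_false, if_true = t
        t = if_true if x[var] == 1 else if_false
    return t

# Hypothetical example: "if x0 then 1 else (if x1 then 1 else 0)", i.e. x0 OR x1.
or_tree = (0, (1, 0, 1), 1)
print([eval_tree(or_tree, x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 1]
```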

Power

They include decision lists (a decision list is a path-shaped tree), but are strictly more powerful: \(x_1 \oplus x_2\) can be represented as a small decision tree but not as a decision list with single-literal tests, since whichever literal the first item tests, satisfying that literal forces a fixed output even though the XOR still depends on the other variable.
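
A quick check of the XOR example, under the same illustrative tuple encoding (eval_tree restated so the snippet stands alone):

```python
def eval_tree(t, x):
    while isinstance(t, tuple):
        var, if_false, if_true = t
        t = if_true if x[var] == 1 else if_false
    return t

# On the x0 = 0 branch output x1; on the x0 = 1 branch output NOT x1.
xor_tree = (0, (1, 0, 1), (1, 1, 0))
assert all(eval_tree(xor_tree, x) == (x[0] ^ x[1])
           for x in [(0, 0), (0, 1), (1, 0), (1, 1)])
```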

Evaluation and interpretation

Apart from the overall classification error, one can consider classification errors particular to various decision paths. The leaves of the decision tree, each of which represents a decision path, can be labelled with the error rate observed for that particular path.
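
A sketch of tabulating per-path error rates on a labelled sample, under the same illustrative tuple encoding; leaf_error_rates and the toy sample are hypothetical:

```python
from collections import defaultdict

def leaf_error_rates(tree, samples):
    """Map each root-to-leaf path to (observed error rate, sample count)."""
    stats = defaultdict(lambda: [0, 0])          # path -> [errors, total]
    for x, y in samples:
        t, path = tree, []
        while isinstance(t, tuple):
            var, if_false, if_true = t
            bit = x[var]
            path.append((var, bit))
            t = if_true if bit == 1 else if_false
        entry = stats[tuple(path)]
        entry[0] += int(t != y)                  # leaf label t vs. true label y
        entry[1] += 1
    return {path: (errs / n, n) for path, (errs, n) in stats.items()}

or_tree = (0, (1, 0, 1), 1)
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 0), ((1, 1), 1)]
print(leaf_error_rates(or_tree, samples))        # the x0 = 1 leaf gets error rate 0.5 here
```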

t-augmented decision trees

A t-augmented decision tree is a decision tree with t-DNFs at the leaves. A t-augmented decision tree of rank r is reducible to a \((t + r + 1)\)-decision list: each item's test is a path conjunction of at most \(r + 1\) literals ANDed with a DNF term of at most \(t\) literals.

Rank of decision trees

The rank of a leaf is 1. The rank of a decision tree (not its max depth) is the maximum of its children's ranks if those ranks differ, and the common child rank plus 1 otherwise. A rank-\(r\) tree therefore has at least \(2^{r-1}\) leaves, so \(\mathrm{rank} \leq \log_2(\#\text{leaves}) + 1\), i.e. logarithmic in the number of nodes.
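
The rank recurrence transcribed directly into code, under the same illustrative tuple encoding:

```python
def rank(t):
    if not isinstance(t, tuple):
        return 1                     # leaf rank, per the convention above
    r0, r1 = rank(t[1]), rank(t[2])
    return max(r0, r1) if r0 != r1 else r0 + 1

# A chain-shaped (decision-list-like) tree has rank 2 under this convention,
# however long the chain; a complete tree of depth d has rank d + 1.
chain = (0, 1, (1, 0, (2, 1, 0)))
print(rank(chain))  # -> 2
```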

Number of decision trees

The number of decision trees (counted as the Boolean functions they compute) is \(2^{2^n}\): every function on \(n\) bits has a decision tree. The number of poly-size decision trees is probably comparable to the number of poly-size DNFs, to the extent that the two representations are interchangeable \chk (the tree-to-DNF direction is shown below).

Rank-r decision tree to (r+1)-decision list

Base case: a rank-1 subtree easily gives a decision list. Inductive step: take the root node \(x\) with child subtrees T1 (the \(x = 0\) branch) and T2 (the \(x = 1\) branch); add \(\bar{x}\) to the conjunctions of T1's decision list and \(x\) to the conjunctions of T2's decision list, and join the two lists. (To bound the conjunction lengths by the rank rather than the depth, add the extra literal only to the lower-rank child's list and append the other child's list unchanged: inputs taking the other branch simply fall through.)
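
A sketch of the conversion under the same illustrative tuple encoding; it implements the refinement noted above, adding the extra literal only to the lower-rank child's items so that conjunction lengths stay bounded by the rank rather than the depth:

```python
from itertools import product

def rank(t):
    if not isinstance(t, tuple):
        return 1
    r0, r1 = rank(t[1]), rank(t[2])
    return max(r0, r1) if r0 != r1 else r0 + 1

def tree_to_dl(t):
    """Return items (conjunction, output); a conjunction is a tuple of (var, value) literals."""
    if not isinstance(t, tuple):
        return [((), t)]                 # a leaf becomes a bare default item
    var, t0, t1 = t
    # Only the lower-rank child's items get the extra literal; inputs taking
    # the other branch fall through to that child's items, appended unchanged.
    if rank(t0) <= rank(t1):
        low, low_bit, high = t0, 0, t1
    else:
        low, low_bit, high = t1, 1, t0
    prefixed = [(((var, low_bit),) + conj, out) for conj, out in tree_to_dl(low)]
    return prefixed + tree_to_dl(high)

def eval_tree(t, x):
    while isinstance(t, tuple):
        t = t[2] if x[t[0]] == 1 else t[1]
    return t

def eval_dl(items, x):
    return next(out for conj, out in items if all(x[v] == b for v, b in conj))

xor_tree = (0, (1, 0, 1), (1, 1, 0))     # x0 XOR x1; rank 3 under the leaf-rank-1 convention
items = tree_to_dl(xor_tree)
assert all(eval_dl(items, x) == eval_tree(xor_tree, x) for x in product([0, 1], repeat=2))
print(max(len(conj) for conj, _ in items))   # -> 2, i.e. rank(xor_tree) - 1
```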

Decision tree to poly-size DNF f

Take the ANDs corresponding to the paths to leaves labelled 1; since the paths are disjoint, at most one of these terms can be true at a time. So f is the sum of these AND terms, each with \(\norm{AND(..)}{1} \leq 1\), giving \(\norm{f}{1}\leq t\) for a tree with \(t\) leaves.
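
A sketch of reading off the DNF, under the same illustrative tuple encoding; tree_to_dnf is a hypothetical helper that collects the path literals leading to each 1-leaf:

```python
def tree_to_dnf(t, path=()):
    """Return one term per 1-leaf: the tuple of (var, required value) literals on its path."""
    if not isinstance(t, tuple):
        return [path] if t == 1 else []
    var, t0, t1 = t
    return (tree_to_dnf(t0, path + ((var, 0),))
            + tree_to_dnf(t1, path + ((var, 1),)))

xor_tree = (0, (1, 0, 1), (1, 1, 0))
print(tree_to_dnf(xor_tree))
# -> [((0, 0), (1, 1)), ((0, 1), (1, 0))], i.e. (NOT x0 AND x1) OR (x0 AND NOT x1);
#    the terms are mutually exclusive because the paths are disjoint.
```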

Sparsity

Since \(\norm{f}{1}\leq t\), a decision tree with \(t\) leaves has a sparse Fourier approximator: an \(\epsilon\)-approximator of sparsity at most \(t^2/\epsilon\), i.e. polynomial in \(t\) for constant accuracy.
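
To see this (the standard \(\ell_1\)-concentration argument): the coefficients with \(|\hat{f}(S)| < \epsilon/t\) together contribute at most \(\max_S |\hat{f}(S)| \cdot \norm{f}{1} \leq (\epsilon/t)\cdot t = \epsilon\) to the squared Fourier weight, and at most \(\norm{f}{1}/(\epsilon/t) \leq t^2/\epsilon\) coefficients can have magnitude at least \(\epsilon/t\); keeping only those large coefficients gives the \(\epsilon\)-approximator.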