Sparse signal detection

Problem

Generating process

Suppose that the $p$-dimensional ‘observation vector’ $Y$ is generated coordinate-wise as $Y_{i} \sim N (θ_{i}, σ^{2})$, that is, $Y_{i} = θ_{i} + ε_{i}$ with $ε_{i} \sim N (0, σ^{2})$. The noise level $σ$ is known.

In addition, suppose that $θ$ is sparse. The set $K = \{i : θ_{i} \neq 0\}$ is called the signal set.

$n$ observations $\{Y_{i} = y_{i}\}$ are made. Usually $n \ll p$.
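The generating process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simulate_observations`, the uniformly random choice of signal set, and the common signal size are all hypothetical and not part of the model above, which only requires $θ$ to be sparse. The sketch draws a single observation vector $Y$.

```python
import random

def simulate_observations(p, num_signals, signal_size, sigma, seed=0):
    """Draw a sparse mean vector theta and one observation vector Y."""
    rng = random.Random(seed)
    # Choose the signal set K uniformly at random (illustrative assumption).
    signal_set = set(rng.sample(range(p), num_signals))
    # Signals share one nonzero mean (assumption); all other means are zero.
    theta = [signal_size if i in signal_set else 0.0 for i in range(p)]
    # Y_i = theta_i + N(0, sigma^2), independently across coordinates.
    y = [t + rng.gauss(0.0, sigma) for t in theta]
    return theta, y, signal_set

theta, y, signal_set = simulate_observations(
    p=1000, num_signals=10, signal_size=4.0, sigma=1.0
)
```

Here `signal_set` plays the role of $K$, and `theta` is nonzero exactly on that set.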

Decision rule sought

Given the observation $Y_{i}$, the decision rule must estimate $θ_{i}$. More simply, one might seek decision rules that estimate the indicator variable $I [θ_{i} \neq 0]$ given $Y_{i}$.

As a classification problem

This is essentially a classification problem with some peculiarities. It is a transduction problem (the test points are known beforehand), and no labeled training set is provided.

Framing it as a classification problem is a good way to state the final goal, but solution ideas typical of classifiers do not apply naturally. So, this view is not very informative.

Peculiarities

If the number of signals, $| K |$, were known beforehand, the problem would be trivial: one would just select the top $| K |$ elements of $\hat{E} [Y]$ (ranked by magnitude, since signals may have either sign).
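A minimal sketch of this selection rule follows. It assumes that the observation vector itself serves as the estimate of $E [Y]$ and that "top" means largest in absolute value (signals may be negative); the function name `top_k_by_magnitude` and the example values are hypothetical.

```python
def top_k_by_magnitude(y, k):
    """Return the indices of the k entries of y with largest absolute value."""
    # Rank all indices by |y_i|, largest first, and keep the first k.
    ranked = sorted(range(len(y)), key=lambda i: abs(y[i]), reverse=True)
    return set(ranked[:k])

# Toy observation vector; the number of signals is assumed known (k = 2).
y = [0.3, -5.1, 0.8, 4.2, -0.1, 0.5]
selected = top_k_by_magnitude(y, k=2)
```

In this toy example the rule selects indices 1 and 3, the two entries farthest from zero.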

Risk

Identifying non-signals as signals often carries an especially high penalty. E.g., with gene-expression data, the expression (i.e., signalness) of each gene identified as a signal in response to certain conditions is verified using laborious wet-lab experiments.

So, it is often hard to write down a formula for the actual risk of a decision procedure, though one can make qualitative statements about it. One can, however, define a simpler risk function and show that a decision procedure chosen using a certain process will have low risk. \tbc
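As one possible example of such a simpler risk, the sketch below counts false positives and false negatives with different costs. This is an assumption for illustration, not a risk function specified in these notes; the name `weighted_error_count` and the cost values are hypothetical, with the false-positive cost set higher to reflect the penalty described above.

```python
def weighted_error_count(true_signal, declared_signal,
                         false_positive_cost=10.0, false_negative_cost=1.0):
    """Weighted count of misclassified coordinates (illustrative loss)."""
    pairs = list(zip(true_signal, declared_signal))
    # False positive: a non-signal declared to be a signal.
    false_positives = sum(1 for t, d in pairs if d and not t)
    # False negative: a true signal that was missed.
    false_negatives = sum(1 for t, d in pairs if t and not d)
    return (false_positive_cost * false_positives
            + false_negative_cost * false_negatives)

true_signal = [True, False, False, True, False]
declared_signal = [True, True, False, False, False]
loss = weighted_error_count(true_signal, declared_signal)
```

Here there is one false positive (index 1) and one false negative (index 3), so with the assumed costs the loss is 11.0.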

Hypothesis classes

Desired qualities

Sparsity

The main point in modeling $θ_{i}$ is to ensure that the model results in sparse $θ$: that is, $θ_{i}$ should often be close to 0.

Adaptability to different sparsity levels

The hypothesis class should include $θ$ of different sparsity levels.

Robustness to large signals

The hypothesis class should include $θ$ with arbitrarily large components.

Probabilistic models

\tbc

Scale mixture models

Scale mixture models for $θ_{i}$ say: $θ_{i} | λ_{i} \sim N (0, λ_{i}^{2})$.
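To make the two-stage structure concrete, the sketch below first draws a scale $λ_{i}$ and then draws $θ_{i}$ from a normal with that scale. The notes do not specify a mixing distribution for $λ_{i}$; the half-Cauchy used here (the choice made by the horseshoe prior) is an assumption, as are the function name `sample_scale_mixture` and the `global_scale` parameter.

```python
import math
import random

def sample_scale_mixture(p, global_scale=1.0, seed=0):
    """Draw (lambda_i, theta_i) pairs with theta_i | lambda_i ~ N(0, lambda_i^2)."""
    rng = random.Random(seed)
    # Assumed mixing distribution: half-Cauchy, sampled as |tan(pi * (U - 1/2))|.
    lambdas = [
        global_scale * abs(math.tan(math.pi * (rng.random() - 0.5)))
        for _ in range(p)
    ]
    # Conditional on its scale, each theta_i is a centered normal draw.
    thetas = [rng.gauss(0.0, lam) for lam in lambdas]
    return lambdas, thetas

lambdas, thetas = sample_scale_mixture(p=1000)
```

With a heavy-tailed mixing distribution of this kind, small scales produce $θ_{i}$ near 0 while occasional very large scales allow large components, which is the behaviour the desired qualities above ask for.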