Hypothesis testing

Model selection given 2 models

Aka confirmatory data analysis (test hypotheses), as against exploratory data analysis (find hypotheses worth testing).

Which process is more likely to have generated the data? Which model is better at explaining the observations? Model selection, with only 2 models.

Hypotheses

Null hypothesis

$H_0: t = t_0$ or $t \le t_0$.

Alternate hypothesis

$H_a$; can be 1-sided, like $t > t_a$, or 2-sided: $t \ne t_0$ or $|t - t_0| \ge k$.

The decision

So, you decide whether the parameter $t \in T_1$ or $t \in T_2$.

Experiment/ Test

Pick a sample; find the value of the test statistic $\hat{t}$ (an estimate of $t$); accept $H_a$ / reject $H_0$ if $|\hat{t} - t_0| > |t' - t_0|$; fail to reject $H_0$ otherwise. The critical value $t'$ defines the $H_0$ rejection region. Visualize it as a shaded area under the pdf curve of $\hat{t}$. That pdf is the one $\hat{t}$ has under $H_0$: you always do hypothesis testing assuming $H_0$ is true.
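The procedure above can be sketched for one concrete case: a 1-sided test of a normal mean with known standard deviation. This is only an illustration under assumptions not in the notes: the function name `one_sided_z_test`, the made-up sample, and the choice of the sample mean as the statistic are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sided_z_test(sample, mu0, sigma, alpha=0.05):
    """Test H0: mu = mu0 against Ha: mu > mu0, assuming sigma is known."""
    n = len(sample)
    t_hat = mean(sample)                      # test statistic: the sample mean
    z_crit = NormalDist().inv_cdf(1 - alpha)  # standard normal quantile for level alpha
    t_crit = mu0 + z_crit * sigma / sqrt(n)   # critical value t' under H0
    return t_hat, t_crit, t_hat > t_crit      # last item True means "reject H0"

# Made-up data, purely for illustration.
sample = [5.3, 4.9, 5.8, 5.6, 5.1, 5.7, 5.4, 5.2, 5.9, 5.5]
t_hat, t_crit, reject = one_sided_z_test(sample, mu0=5.0, sigma=0.5)
print(f"t_hat={t_hat:.3f}, t'={t_crit:.3f}, reject H0: {reject}")
```

Note that the critical value is computed from the distribution of the statistic under $H_0$ only; $H_a$ enters only through the direction of the rejection region.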

Errors

Type 1

Erroneously accept $H_a$ (reject a true $H_0$): $\alpha = \Pr(\hat{t} > t' \mid t = t_0)$. $\alpha$ (commonly 0.05) is called the level of significance.

Type 2

Erroneously fail to reject $H_0$: $\beta = \Pr(\hat{t} \le t' \mid t = t_a)$.

Tradeoff

Trying to decrease the type 1 error rate involves increasing $t'$, but that increases the type 2 error rate. Visualize the error zones as regions under 2 bell curves with means slightly apart.

To simultaneously decrease both, must increase sample size.
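A small numeric sketch of this tradeoff, assuming a 1-sided test on a normal mean with known standard deviation. The function `error_rates` and all the numbers are hypothetical, chosen only to make the effect visible.

```python
from math import sqrt
from statistics import NormalDist

def error_rates(t_crit, mu0, mua, sigma, n):
    """Type 1 and type 2 error rates for the rule 'reject H0 if t_hat > t_crit'."""
    se = sigma / sqrt(n)                         # standard error of the sample mean
    alpha = 1 - NormalDist(mu0, se).cdf(t_crit)  # Pr(t_hat > t' | t = t0)
    beta = NormalDist(mua, se).cdf(t_crit)       # Pr(t_hat <= t' | t = ta)
    return alpha, beta

# Illustrative values: mu0 = 0, mua = 1, sigma = 2.
a1, b1 = error_rates(t_crit=0.5, mu0=0.0, mua=1.0, sigma=2.0, n=16)
a2, b2 = error_rates(t_crit=0.8, mu0=0.0, mua=1.0, sigma=2.0, n=16)  # larger t'
a3, b3 = error_rates(t_crit=0.5, mu0=0.0, mua=1.0, sigma=2.0, n=64)  # larger n
for label, a, b in [("base", a1, b1), ("larger t'", a2, b2), ("larger n", a3, b3)]:
    print(f"{label:10s} alpha={a:.4f} beta={b:.4f}")
```

Raising the critical value trades one error for the other; only the larger sample shrinks both, because it narrows both bell curves.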

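In the case of $X \sim N(\mu, \sigma^2)$, $t = \mu$, can write $\hat{t} > t'$ as $z = \frac{\hat{t} - \mu}{\sigma/\sqrt{n}} > \frac{t' - \mu}{\sigma/\sqrt{n}}$; $z \sim N(0, 1)$. Given $\mu$ under each hypothesis, $\sigma$, $\alpha$, $\beta$, can solve for $t'$, $n$ using the $N(0, 1)$ table. For a small sample size with $\sigma$ estimated from the sample, can use the t distribution.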
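As a sketch of that calculation for the 1-sided case: writing $\mu_0 < \mu_a$ for the means under $H_0$ and $H_a$ (names assumed here), the critical value must satisfy both $t' = \mu_0 + z_{1-\alpha}\,\sigma/\sqrt{n}$ and $t' = \mu_a - z_{1-\beta}\,\sigma/\sqrt{n}$, which gives $\sqrt{n} = (z_{1-\alpha} + z_{1-\beta})\,\sigma / (\mu_a - \mu_0)$. The function name and the numbers below are hypothetical; the standard library's `NormalDist.inv_cdf` stands in for the $N(0,1)$ table.

```python
from math import ceil, sqrt
from statistics import NormalDist

def design_one_sided_test(mu0, mua, sigma, alpha, beta):
    """Smallest n and matching critical value t' for H0: mu = mu0 vs Ha: mu = mua > mu0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # quantile bounding the type 1 error
    z_beta = NormalDist().inv_cdf(1 - beta)    # quantile bounding the type 2 error
    n = ceil(((z_alpha + z_beta) * sigma / (mua - mu0)) ** 2)  # round up to an integer
    t_crit = mu0 + z_alpha * sigma / sqrt(n)   # keeps the type 1 error at alpha
    return n, t_crit

n, t_crit = design_one_sided_test(mu0=0.0, mua=1.0, sigma=2.0, alpha=0.05, beta=0.10)
# Because n was rounded up, the achieved type 2 error is at most the requested beta.
achieved_beta = NormalDist(1.0, 2.0 / sqrt(n)).cdf(t_crit)
print(n, round(t_crit, 4), round(achieved_beta, 4))
```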

p-value of the statistic

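Given a sample, got $\hat{t}$: what is the most stringent critical value $t'$ (ie min $\alpha$) at which we would still reject $H_0$ based on it? The corresponding $\alpha$ is the p-value, ie the probability under $H_0$ of a statistic at least as extreme as the observed $\hat{t}$.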
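A minimal sketch of a p-value computation, assuming the 1-sided (upper tail) test of a normal mean with known standard deviation. The function name and the input numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def p_value_upper(t_hat, mu0, sigma, n):
    """Pr(statistic >= observed t_hat | H0: mu = mu0) for an upper-tail z-test."""
    z = (t_hat - mu0) / (sigma / sqrt(n))  # standardize the observed statistic under H0
    return 1 - NormalDist().cdf(z)         # upper tail area beyond the observed value

# Illustrative numbers: observed sample mean 5.44 from n = 10 draws.
p = p_value_upper(t_hat=5.44, mu0=5.0, sigma=0.5, n=10)
print(f"p-value = {p:.4f}")
```

$H_0$ is rejected at level $\alpha$ exactly when the p-value is below $\alpha$, so reporting the p-value lets the reader apply their own level of significance.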

Power of a test

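Take $H_0: t = t_0$; $H_a: t = t_a$, fix $t'$. $\mathrm{power}(t) = 1 - \beta(t)$: the ability to detect that $H_a$ is true when the parameter is $t$. $\mathrm{power}(t_0) = \alpha$, as both are the probability of rejecting $H_0$ when $t = t_0$. So, for a symmetric 2-sided test (as with the normal mean), the power curve has a minimum at $t_0$; for a 1-sided test it instead rises monotonically as $t$ moves into the $H_a$ region.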
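The shape of the power curve can be checked numerically. The sketch below assumes a 2-sided z-test on a normal mean with known standard deviation; the function name `two_sided_power`, the grid, and the parameter values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_power(t, mu0, sigma, n, alpha=0.05):
    """Pr(reject H0: mu = mu0) when the true mean is t, for a 2-sided z-test."""
    se = sigma / sqrt(n)
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * se  # acceptance region is mu0 +/- this
    sampling = NormalDist(t, se)                     # distribution of t_hat when the mean is t
    lower_tail = sampling.cdf(mu0 - half_width)      # t_hat falls below the region
    upper_tail = 1 - sampling.cdf(mu0 + half_width)  # t_hat falls above the region
    return lower_tail + upper_tail

grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
curve = [two_sided_power(t, mu0=0.0, sigma=2.0, n=16) for t in grid]
for t, pw in zip(grid, curve):
    print(f"t={t:+.1f} power={pw:.4f}")
```

On this grid the power equals $\alpha$ at $t_0 = 0$ and grows symmetrically on either side.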

Test design

Judge the goodness of a test by $\alpha$, $\beta$, $\mathrm{power}(t)$.

Best test for given α

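(Neyman-Pearson): Testing $H_0: t = t_0$, $H_a: t = t_a$. Likelihood ratio test: reject $H_0$ if $\Lambda = \frac{L(t_0 \mid \hat{t})}{L(t_a \mid \hat{t})} \le h$, with $h$ chosen so that $\Pr(\Lambda \le h \mid t = t_0) = \alpha$. This is the most powerful test of level $\alpha$ for these two simple hypotheses.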
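A sketch of the likelihood ratio for one concrete case, assuming i.i.d. normal data with known standard deviation and $\mu_0 < \mu_a$ (the function names and data are hypothetical, and the likelihood is taken over the whole sample). In this case $\log \Lambda = n(\mu_0 - \mu_a)(\bar{x} - \frac{\mu_0 + \mu_a}{2})/\sigma^2$, which decreases in the sample mean $\bar{x}$, so thresholding the ratio is the same as thresholding $\bar{x}$.

```python
from math import log, sqrt
from statistics import NormalDist, mean

def log_likelihood_ratio(sample, mu0, mua, sigma):
    """log of L(mu0 | sample) / L(mua | sample) for i.i.d. N(mu, sigma^2) data."""
    null, alt = NormalDist(mu0, sigma), NormalDist(mua, sigma)
    return sum(log(null.pdf(x)) - log(alt.pdf(x)) for x in sample)

def log_ratio_from_mean(x_bar, n, mu0, mua, sigma):
    """Closed form of the same quantity, written in terms of the sample mean."""
    return n * (mu0 - mua) * (x_bar - (mu0 + mua) / 2) / sigma ** 2

sample = [5.3, 4.9, 5.8, 5.6, 5.1, 5.7, 5.4, 5.2, 5.9, 5.5]  # made-up data
mu0, mua, sigma, alpha = 5.0, 5.5, 0.5, 0.05
n = len(sample)

llr = log_likelihood_ratio(sample, mu0, mua, sigma)
# Threshold log(h) matching the level-alpha critical value for the sample mean.
t_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * sigma / sqrt(n)
log_h = log_ratio_from_mean(t_crit, n, mu0, mua, sigma)
reject = llr <= log_h
print(f"log ratio={llr:.3f}, log h={log_h:.3f}, reject H0: {reject}")
```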

Difference in differences

Suppose that in experiment A, where one compares hypotheses H1 and N (the null hypothesis), it is determined that N cannot be dismissed. In experiment B, one compares H2 and N and observes that N can be dismissed. From this, one cannot conclude that, when comparing H2 and H1, H1 can be dismissed: it is possible that the difference in evidence supporting H1 and H2 is small.

One should instead conduct an experiment comparing H1 and H2 directly. This has been a common mistake in medical research as of 2011!
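A numeric illustration of the pitfall, with made-up effect estimates and standard errors. It assumes the two estimates are independent and approximately normal, and uses 2-sided z-tests; the names `two_sided_p`, `effect1`, `effect2` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(estimate, std_err):
    """2-sided p-value for H0: true effect = 0, given a normal estimate."""
    z = estimate / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))

effect1, se1 = 0.15, 0.10  # experiment A: H1 vs N
effect2, se2 = 0.25, 0.10  # experiment B: H2 vs N

p1 = two_sided_p(effect1, se1)  # N cannot be dismissed at the 0.05 level
p2 = two_sided_p(effect2, se2)  # N can be dismissed at the 0.05 level
# Direct comparison of the two effects: variances of independent estimates add.
p_direct = two_sided_p(effect2 - effect1, sqrt(se1 ** 2 + se2 ** 2))
print(f"A vs N: p={p1:.3f}; B vs N: p={p2:.3f}; A vs B: p={p_direct:.3f}")
```

With these numbers, one result is significant and the other is not, yet the direct comparison between them is far from significant.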