Carrying out a hypothesis test often causes confusion. Here’s how it works.

Some hypothesis tests start with a known fact, such as “25% of patients treated for a particular disease will suffer side effects.” A drug company may then claim that “a new treatment reduces the number of patients suffering side effects.” The original figure, the status quo, is known as the “null hypothesis” and given the symbol H0. The new claim is called the “alternative hypothesis” and given the symbol H1. So,

H0: P(side effects) = 0.25
H1: P(side effects) < 0.25

The test is then performed by trying the new treatment on a sample of patients and seeing how many suffer side effects. The conclusions which can be drawn depend on the numbers. For example, suppose the new treatment is tested on 100 patients and 24 suffer side effects – do you think that’s enough evidence to justify the claim? Presumably not. Where you draw the line – “the critical value” – depends on how sure you want to be. We’ll do the sums in a moment.

Here are other situations where hypothesis testing is valid:

A candidate for the Orange Party claims that he has 35% support. A rival thinks it is less than that.

A group of Year 8 children has a mean IQ of 112. It is claimed that a new diet can increase IQ within a month.

A paint manufacturer claims that a new quick drying paint will dry completely in 30 minutes. A customer doesn’t believe this is true.

There are many different types of hypothesis test, but the key part of the test is generally a probability calculation. Let’s go back to the original problem, where we are dealing with a binomial probability distribution. What we are looking for is the “unlikely region”, the part of the distribution which would be very unlikely to occur by chance alone if H0 were true. Because H1 claims a lower rate of side effects, this region sits at the low end of the distribution. A “5% significance level” leads us to find the range of results with less than a 5% chance of happening. In other words, if H0 were true and you took 100 samples, you would expect only about 5 of them to show a result in this region. Here’s part of a table of cumulative binomial probabilities, where X ~ B(100, 0.25).
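If you don’t have a printed table to hand, the cumulative probabilities can be worked out directly. Here is a minimal sketch in Python using only the standard library; the function name binom_cdf and the range of rows printed are my own choices for illustration, not part of any particular package.

```python
from math import comb

def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ B(n, p) by summing the individual terms."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Print a few rows of the cumulative table for X ~ B(100, 0.25)
for k in range(13, 21):
    print(f"P(X <= {k}) = {binom_cdf(k, 100, 0.25):.4f}")
```

Each row is the chance of seeing that many patients or fewer with side effects if the old 25% figure still holds.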

So we can see that there is a 6.3% probability of a result in the range 0 – 18, but only 3.8% for a result between 0 and 17. Since 17 is the largest value whose cumulative probability is below 5%, 17 is called the “critical value”, and 0 – 17 the “critical region.” If, let’s say, only 15 patients in the sample showed side effects, we would say that this is a “significant” result, and there is evidence to reject H0: it looks as if the new treatment is effective. But if 19 patients showed side effects, then a result at this level could easily have happened by chance alone, so we do not reject H0. That does not prove H0 is true; it only means there is not enough evidence for the new claim. We could be more certain before rejecting H0 by going for a 1% significance level, and then the critical value would be 14. But the lower the significance level, the higher the chance that we will fail to reject H0 when the treatment is in fact effective.
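The same sums can be sketched in Python. The code below searches for the critical value at each significance level and then shows the trade-off described above. To do that it has to assume a “true” side-effect rate for the new treatment; the figure of 0.15 is purely hypothetical and chosen only for illustration, as are the function names.

```python
from math import comb

def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def lower_critical_value(n, p0, alpha):
    """Largest k with P(X <= k) < alpha under H0, or None if there is none."""
    critical = None
    for k in range(n + 1):
        if binom_cdf(k, n, p0) < alpha:
            critical = k
        else:
            break  # the cumulative probability only grows from here
    return critical

n, p0 = 100, 0.25
true_p = 0.15  # hypothetical true rate under H1, an assumption for illustration

for alpha in (0.05, 0.01):
    c = lower_critical_value(n, p0, alpha)
    # Chance the result lands outside the critical region even though
    # the treatment really does work (at the assumed rate true_p)
    miss = 1 - binom_cdf(c, n, true_p)
    print(f"{alpha:.0%} level: critical value {c}, "
          f"chance of not rejecting H0 = {miss:.3f}")
```

Whatever true rate you assume, moving from the 5% level to the 1% level shrinks the critical region, so the chance of missing a treatment that really works can only go up.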

Other hypothesis tests may use normal distributions, chi-squared testing, t-tests, paired sample tests… but the general principle is always the same.