I'm just in the middle of an excellent book called *Beyond Significance Testing*. It's one of an increasing number of works to point out the flaws in both the fundamental theory and conventional usage of the "classic" significance test.

Let's have a quick look at how significance testing works. Imagine we want to see if there is a significant difference between the Satisfaction Index for men and women in the situation below:

How do we know if it's significant? The first thing is to work out what question we're asking, which is **"does this difference exist in the population as well as in the sample?"** Then we create a *null hypothesis*, which is that there is no difference between men and women in the population. Finally we conduct a hypothesis test, a t-test in this case, which tells us the chance of finding a difference at least as big as this in the sample if the null hypothesis were true. This value, the *p-value*, conventionally needs to be less than 5% for us to reject the null hypothesis, which we take as evidence that there is a difference between men and women in the population (note that it doesn't tell us *how big* the difference is, or how likely the null hypothesis is to be true!). Clear as mud? I thought so.
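To make those steps concrete, here is a minimal sketch in Python using only the standard library. The scores are made up for illustration (they are not the data behind the chart), the function name `welch_test` is my own, and the p-value uses a normal approximation rather than the exact t distribution, which is only reasonable for fairly large samples.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_test(sample_a, sample_b):
    """Return (difference in means, test statistic, two-sided p-value).

    Assumption: the p-value comes from a normal approximation to the
    t distribution, so treat it as rough for small samples.
    """
    diff = mean(sample_a) - mean(sample_b)
    # Standard error of the difference, without assuming equal variances.
    se = sqrt(variance(sample_a) / len(sample_a)
              + variance(sample_b) / len(sample_b))
    statistic = diff / se
    # Chance of a difference at least this large, in either direction,
    # if the null hypothesis of no difference were true.
    p_value = 2 * (1 - NormalDist().cdf(abs(statistic)))
    return diff, statistic, p_value


# Hypothetical satisfaction scores, repeated to give 50 respondents per group.
men = [82, 79, 85, 88, 80, 84, 86, 81, 83, 87] * 5
women = [80, 76, 83, 84, 78, 81, 82, 79, 80, 85] * 5

diff, statistic, p_value = welch_test(men, women)
print(f"difference = {diff:.2f}, statistic = {statistic:.2f}, p = {p_value:.4f}")
print("reject the null" if p_value < 0.05 else "fail to reject the null")
```

Notice that the only thing the final line reports is a yes/no verdict. The size of the difference is sitting right there in `diff`, but the test itself does nothing with it.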

So if this approach has flaws, which it does, what should we do instead? The alternative is to look at **effect size** and **margin of error**. In other words, instead of asking our stats package for an obscure p-value, let's ask it how big the difference is likely to be in the population...which is what we're really interested in.

One quick way to do this is to plot the margin of error, or confidence interval, of the Satisfaction Indexes straight onto the chart. This has a number of advantages: it is much easier to explain and understand, it illustrates the uncertainty of the measurement, and it captures the size of the difference rather than a simple "significant" or "not significant".

A better test of a specific question like this is to work out the margin of error of the difference, which in this case is 3.1 ± 1.8. In other words, we can conclude with 95% confidence that the difference between men and women in the population is **between 1.3 and 4.9**.
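For anyone who wants to see where a figure like that comes from, here is a rough sketch of the usual large-sample calculation: the difference in means, plus or minus a multiplier times the standard error of the difference. The means, standard deviations and sample sizes below are invented so that the result lands near 3.1 ± 1.8; the post's underlying data isn't shown, and `difference_interval` is just an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def difference_interval(mean_a, sd_a, n_a, mean_b, sd_b, n_b, confidence=0.95):
    """Return (difference, margin of error, lower bound, upper bound).

    Assumption: samples are large enough to use the normal multiplier
    (about 1.96 for 95%) rather than a t multiplier.
    """
    diff = mean_a - mean_b
    # Standard error of the difference between two independent means.
    se = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    # Two-sided multiplier for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * se
    return diff, margin, diff - margin, diff + margin


# Hypothetical summary statistics for men and women (200 respondents each).
diff, margin, low, high = difference_interval(83.1, 9.0, 200, 80.0, 9.5, 200)
print(f"difference = {diff:.1f} ± {margin:.1f} (from {low:.1f} to {high:.1f})")
```

The output is a size and a range rather than a verdict, which is the whole point.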

Much more powerful, and much easier to understand, than null hypothesis significance testing, isn't it?
