Testing for Similarity

If we carry out a sensory difference test and obtain a non-significant result, we have not demonstrated that the tested samples are similar.  We have only failed to demonstrate that they are different – and these conclusions are not equivalent!  (The result is not entirely useless, though.  Had we obtained a significant difference we would have concluded that the samples are not identical, so a non-significant result at least leaves that possibility open, even though it does not show it to be true.) If we want reassurance that samples are similar, the usual kind of significance test will not do the job.

One approach to similarity testing is to take account not only of alpha risk – the risk of mistakenly concluding that there is a difference when really there is none – but also of beta risk – the risk of failing to conclude that there is a difference when there really is one.  This is not just a simple change in statistical procedure – it brings in new problems that are not statistical.  Sufficiently sensitive detection methods will almost always show some difference between samples, so before we can calculate beta risk we must decide how big a difference is needed to be considered 'real'.

In sensory testing, the smallest difference that is not considered negligible is often expressed in terms of the proportion of detectors – assessors who detect the difference.  For instance, some companies treat a difference as negligible if fewer than 20% of consumers detect it.

Whether this (or any other) value is reasonable or not depends on the nature of the difference that is perceived.  A flavour difference that might be interpreted as toxic contamination of a food product would have to be much less detectable than that to be tolerable!  But even 90% detection of a difference that will be interpreted only as random fluctuation in the flavour of a natural product might be entirely acceptable.

There is no purely statistical prescription for an acceptable level of detectability.  It must come from knowledge of the product and its market.
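To make the idea of a proportion of detectors concrete, here is a minimal sketch of how it is commonly related to the proportion of correct responses in a forced-choice difference test. It assumes the usual guessing model, in which detectors always answer correctly and everyone else guesses, and it assumes a chance level of 1 in 3 (as in a triangle test). Neither assumption comes from this page, and the function name is purely illustrative.

```python
def proportion_correct(p_detect: float, p_guess: float = 1 / 3) -> float:
    """Expected proportion of correct responses under a simple guessing model.

    p_detect: assumed true proportion of detectors (0 to 1).
    p_guess:  assumed chance of a correct answer by guessing
              (1/3 here, as in a triangle test; an assumption).
    """
    if not 0.0 <= p_detect <= 1.0:
        raise ValueError("p_detect must lie between 0 and 1")
    # Detectors are always correct; non-detectors are correct by chance.
    return p_detect + (1.0 - p_detect) * p_guess


# A 20% detector threshold expressed as a proportion of correct responses.
pc_20 = proportion_correct(0.20)
print(f"20% detectors -> {pc_20:.4f} correct")
```

Under these assumptions, a 20% threshold corresponds to an expected proportion correct only modestly above the chance level, which is why a similarity test needs enough trials to tell the two apart.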

The approach of specifying beta risk appeals to statisticians, but they already have an excellent intuitive grasp of the inter-relationships among the quantities that have to be taken into account: the alpha risk, the beta risk, the largest negligible difference and the power of the difference test that is used.  (The last of these is affected by the nature of the task and also by the amount of data – usually, the number of trials.)  Most practical users of statistical methods are less familiar with these relationships and want answers that can more easily be interpreted in their own terms.

Using beta risk also makes it possible to draw up (rather complex) statistical tables that allow the analyst to pre-specify both alpha and beta risk.  However, these tables have a severe disadvantage: if the data do not meet the pre-specified levels, the user is unlikely to have much idea of how to modify the pre-defined targets, since there are five variables that might be manipulated in a quest for improvement.
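The way alpha risk, beta risk, the proportion of detectors and the number of trials depend on one another can be explored numerically. The following is a rough sketch using exact binomial probabilities. It assumes a forced-choice test with a 1-in-3 chance of guessing correctly and the simple guessing model for detectors; the function names are hypothetical, and this is not taken from any published table or from DiffTest itself.

```python
from math import comb


def upper_tail(x: int, n: int, p: float) -> float:
    """Exact P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(max(x, 0), n + 1))


def alpha_risk(n: int, criterion: int, p_guess: float = 1 / 3) -> float:
    """Chance of reaching the criterion when nobody detects a difference."""
    return upper_tail(criterion, n, p_guess)


def beta_risk(n: int, criterion: int, p_detect: float,
              p_guess: float = 1 / 3) -> float:
    """Chance of falling short of the criterion when a proportion
    p_detect of assessors really do detect the difference."""
    p_correct = p_detect + (1 - p_detect) * p_guess  # assumed guessing model
    return 1.0 - upper_tail(criterion, n, p_correct)


def smallest_criterion(n: int, alpha_target: float,
                       p_guess: float = 1 / 3) -> int:
    """Smallest number of correct responses whose alpha risk is on target."""
    for c in range(n + 2):  # c = n + 1 gives alpha risk 0, so this terminates
        if alpha_risk(n, c, p_guess) <= alpha_target:
            return c
    raise AssertionError("unreachable")


# Hold alpha at 0.05 and 20% detectors fixed; vary the number of trials.
for n in (30, 60, 120):
    c = smallest_criterion(n, 0.05)
    print(n, c, round(alpha_risk(n, c), 4), round(beta_risk(n, c, 0.20), 4))
```

Running the loop with different values of `n`, the alpha target or the detector proportion shows how changing any one quantity shifts the others, which is the difficulty a fixed table hides from the user.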

DiffTest aims to provide answers in a form that is more directly related to the way users characterize their data.  It also provides exact probability values rather than just specifying criterion frequencies that the data must meet.  This makes it possible for a user to decide that a risk of 0.051, say, is acceptable, even though it does not meet the 0.05 level, which is all that a table allows.  Of course, using DiffTest does not conflict with controlling beta risk – it just expresses the answers differently.  On the Examples page there is an illustration of how DiffTest can be used in a flexible way to seek reassurance.
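As an illustration of what an exact probability value can look like in a similarity setting, the sketch below computes the chance of seeing as few correct responses as were observed if the true proportion of detectors were exactly at the chosen limit. This is only one plausible formulation, under the same assumed guessing model and 1-in-3 chance level; it is not a description of DiffTest's own calculations, and the data values are invented for the example.

```python
from math import comb


def lower_tail(x: int, n: int, p: float) -> float:
    """Exact P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(0, min(x, n) + 1))


def similarity_risk(correct: int, n: int, p_detect_limit: float,
                    p_guess: float = 1 / 3) -> float:
    """Probability of `correct` or fewer correct responses in n trials
    if the proportion of detectors were exactly p_detect_limit.

    A small value suggests the true proportion of detectors is below
    the limit. The guessing model used here is an assumption.
    """
    p_correct = p_detect_limit + (1 - p_detect_limit) * p_guess
    return lower_tail(correct, n, p_correct)


# Hypothetical data: 20 correct responses out of 60, limit of 20% detectors.
risk = similarity_risk(correct=20, n=60, p_detect_limit=0.20)
print(f"risk = {risk:.3f}")
```

Because the result is a probability rather than a pass/fail verdict against a tabulated criterion, the user can judge for themselves whether a value slightly above a conventional level is still acceptable.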