Jun 4, 2008
EDITORIAL

Statistical significance, power and sample size – what does it all mean?

Statistical tests are commonly performed and reported in veterinary research. A review of 14 issues of JSAP (January 2007 to February 2008) shows that approximately 75 per cent of papers included P-values when reporting the “significance” or otherwise of results obtained. Therefore, it is important that readers are able to interpret P-values, understand the concept of statistical significance and appreciate the role that sample size and statistical power play in the interpretation of the results of a study.

It sometimes seems that demonstration of “statistical significance” has become paramount in the minds of many researchers; indeed this is sometimes deemed by authors, reviewers and editors alike as a prerequisite to publication. As a corollary, some readers may perceive that demonstration of a statistically significant difference, for example in the effect of two drugs, is the same as demonstrating the existence of an important difference, and that failure to demonstrate a significant difference indicates that no difference exists. Such conclusions are misleading and often incorrect.

Let us consider the example of an experiment comparing the effect of two anaesthetic agents on systolic arterial blood pressure (SABP). To do this, drug A was administered to one group of dogs and drug B to the same number of dogs in a different group. Even in a group of normal animals, we would expect results to vary from dog to dog. Hence, if the animals are divided into two groups, we may also expect to find that the average blood pressure is a little different between the two groups. For example, we may find that, by chance, more of the dogs with a higher measurement were put into group A.
When trying to examine the effects of the drugs, the question then arises: is the difference in the average value between the groups due to the different treatment, or could it be due to chance uneven allocation of the dogs to the two groups? When considering the results of the study, we should ask:

(a) Is the observed difference between the groups real or could it be a chance finding?
(b) If the difference is real, is it large enough to be important?

Statistical tests can only help directly with the first of these questions; interpretation of the clinical importance of the results (question b) requires clinical/biological judgement, and cannot be assessed solely through statistical testing.

Generally, scientists frame their research questions in terms of a null hypothesis (i.e. the hypothesis that there is no difference between the groups or treatments). When making decisions about whether to reject a null hypothesis, we can make two types of error (Table 1):

• declaring that there is a difference between the groups, when in fact there is no difference (rejecting the null hypothesis when it is in fact true; type I error), and
• deciding there is no difference, when in fact a difference exists (failing to reject the null hypothesis when it is in fact false; type II error).
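The two types of error described above can be made concrete with a small simulation. The sketch below is illustrative only and is not part of the original editorial: the baseline SABP of 120 mmHg, the dog-to-dog standard deviation of 20 mmHg, the group sizes, and the helper names `two_sided_p_value` and `rejection_rate` are all assumptions, and a simple z-test with a known standard deviation stands in for whatever test a real study would use. It repeats the two-group experiment many times, first with no true difference between drugs (so every "significant" result is a type I error) and then with a true difference of 15 mmHg (so every "non-significant" result is a type II error).

```python
import math
import random
from statistics import NormalDist, mean

# Hypothetical values chosen only for illustration; apart from the 15 mmHg
# difference used further down, none of them come from the editorial.
BASELINE_SABP = 120.0   # assumed mean SABP in mmHg
SD_SABP = 20.0          # assumed dog-to-dog standard deviation in mmHg
ALPHA = 0.05            # conventional significance level


def two_sided_p_value(group_a, group_b, sd=SD_SABP):
    """P-value from a two-sample z-test, treating the SD as known (a simplification)."""
    n_a, n_b = len(group_a), len(group_b)
    standard_error = sd * math.sqrt(1.0 / n_a + 1.0 / n_b)
    z = (mean(group_a) - mean(group_b)) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def rejection_rate(true_difference, dogs_per_group, n_experiments=2000, seed=1):
    """Fraction of simulated experiments in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        group_a = [rng.gauss(BASELINE_SABP + true_difference, SD_SABP)
                   for _ in range(dogs_per_group)]
        group_b = [rng.gauss(BASELINE_SABP, SD_SABP)
                   for _ in range(dogs_per_group)]
        if two_sided_p_value(group_a, group_b) <= ALPHA:
            rejections += 1
    return rejections / n_experiments


# Null hypothesis true: every rejection is a type I error.
type_i_rate = rejection_rate(true_difference=0.0, dogs_per_group=10)

# Null hypothesis false (true difference of 15 mmHg): the rejection rate is
# the power, and 1 - power is the type II error rate.
power_small = rejection_rate(true_difference=15.0, dogs_per_group=10)
power_large = rejection_rate(true_difference=15.0, dogs_per_group=40)

print(f"Type I error rate (no true difference): {type_i_rate:.3f}")
print(f"Power with 10 dogs per group: {power_small:.3f}")
print(f"Power with 40 dogs per group: {power_large:.3f}")
```

Under these assumed values, the type I error rate should sit close to the 5 per cent significance level whatever the group size, whereas the power should be roughly 0.4 with 10 dogs per group and roughly 0.9 with 40. In other words, with small groups a real 15 mmHg difference would be missed more often than it is detected.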

Table 1. Possible outcomes from testing experimental hypotheses

Result of statistical test    | Reality: null hypothesis is true                  | Reality: null hypothesis is false
------------------------------|---------------------------------------------------|--------------------------------------------------
Reject null hypothesis        | Type I error (probability = significance level)   | Correct conclusion (probability = power)
Do not reject null hypothesis | Correct conclusion                                | Type II error (probability = 1 - power)

In our study of anaesthetic agents, the null hypothesis would be that “drugs A and B produce no difference in average SABP” (i.e. the average SABP for group A − average SABP for group B = 0 mmHg). Statistical tests are then used to determine the probability of finding a difference at least as great as that observed in the study, assuming this null hypothesis is true. This probability is expressed as the P-value (sometimes called the observed significance level).

For example, in our experiment the difference in average SABP between groups receiving drug A and B was 15 mmHg and the P-value was 0.05. This tells us that, if there truly is no difference between the effects of drugs A and B (i.e. the null hypothesis is true) and the experiment was repeated 100 times, we would expect to find a difference of 15 mmHg or more in about five of the experiments; the other 95 times the difference would be less than 15 mmHg. As a convention, scientists often consider a probability of 5 per cent (P = 0.05) sufficiently low to conclude that the observed results would be quite unusual if the null hypothesis were true. Hence in this case it is reasonable to believe that the treatments are NOT the same.

If, instead, the P-value was 0.2, then we would expect to find a difference at least as large as that observed in 20 per cent of trials; that is, the observed difference is not particularly unusual and we would typically conclude that there is no reason to reject the null hypothesis. Note that this is not the same as concluding that the treatments ARE the same. Failure to reject the null hypothesis can occur because the treatments really are the same, or because our experiment has simply failed to detect a real difference. The latter often occurs when a study has a small sample size, and hence insufficient statistical power.

The next editorial in this series will look at the P-value and its use in interpretation of study data.

R. M. Christley
Epidemiology Group, Faculty of Veterinary Science, University of Liverpool

Rob Christley graduated from the University of Sydney in 1991 and completed an internship there the following year.
After a period in practice in Australia and the UK he returned to the University of Sydney where he completed a residency in Equine Medicine, a Masters degree and a PhD in the Epidemiology of Equine Airway Disease. In 1999 he joined the University of Glasgow and since 2002 has worked in the Epidemiology group at the University of Liverpool.

Journal of Small Animal Practice • Vol 49 • June 2008 • © 2008 British Small Animal Veterinary Association