Math 19b, Linear Algebra, Probability and Statistics, Spring, 2011

On the use of Error bars

The following article by Geoff Cumming, Fiona Fidler, and David L. Vaux, "Error bars in experimental biology", Journal of Cell Biology, 177 (1), 2007, 7-11, currently has the second-highest readership on Mendeley. It addresses the meaning of error bars in scientific papers and states several rules for their use.

Important terms which appear and are explained: the standard deviation SD and the standard error SE, which is SD divided by the square root of the number of experiments (a short Mathematica illustration of both follows below). Another important quantity in scientific experiments is the P-value: assume we have a random variable X over some probability space and we measure X=c. The question is whether this result is significant, assuming a null hypothesis (which stands for the setup of our probability space). The P-value of the experiment is the probability, under the null hypothesis, of observing a value at least as extreme as c. For a measurement above the mean this is
  p = P[ X >= c ],
and for a measurement below the mean it is p = P[ X <= c ].
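Here is the promised illustration of SD and SE, a minimal Mathematica sketch with made-up measurement data (the numbers are hypothetical and only serve to show the two formulas):

data = {4.1, 3.9, 4.3, 4.0, 4.2}     (* five hypothetical measurements *)
sd = StandardDeviation[data]         (* standard deviation SD of the sample *)
se = sd/Sqrt[Length[data]]           (* standard error SE = SD/sqrt(n) *)

With five experiments, the standard error is the standard deviation shrunk by a factor Sqrt[5].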
By convention (note this is arbitrary and therefore a bit controversial), one calls p smaller than 0.05 a statistically significant result and a P-value smaller than 0.01 a highly significant result. For example, suppose you see heads 10 times when throwing a coin 30 times. What is the P-value? If X is the number of heads, then p = P[ X <= 10 ] = F[10], where F is the cumulative distribution function, here computed with Mathematica
f = CDF[BinomialDistribution[30, 0.5]]; f[10]   (* F[10] = P[X <= 10] for a fair coin *)
is 0.0493. This is considered statistically significant: under these assumptions, the null hypothesis (the coin is fair and different coin flips are independent) is rejected. However, if we see 11 heads, then the P-value is slightly larger than 0.1 and the result is not significant. You see how easy it is to cheat here: just repeat the coin-flipping experiment until you hit an instance with a statistically significant deviation. This will eventually happen. Suppress the other experiments as "test trials" and publish the result.
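To see how quickly this cheat pays off, here is a minimal Mathematica simulation (a sketch, not part of the article): it repeats the 30-flip experiment with a fair coin until a run with at most 10 heads appears and reports how many repetitions were needed.

heads := Total[RandomVariate[BernoulliDistribution[1/2], 30]]  (* one 30-flip experiment *)
n = 1; While[heads > 10, n++]; n     (* repetitions until a "significant" run shows up *)

Since a single run is "significant" with probability about 0.0494, on average about 20 repetitions are enough.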