# Thanksgiving

This year we give thanks for a concept that has been particularly useful in recent times: the error bar. (We’ve previously given thanks for the Standard Model Lagrangian, Hubble’s Law, the Spin-Statistics Theorem, conservation of momentum, and effective field theory.)

Error bars are a simple and convenient way to characterize the expected uncertainty in a measurement, or for that matter the expected accuracy of a prediction. In a wide variety of circumstances (though certainly not always), we can characterize uncertainties by a normal distribution — the bell curve made famous by Gauss. Sometimes the measurements are a little bigger than the true value, sometimes they’re a little smaller. The nice thing about a normal distribution is that it is fully specified by just two numbers — the central value, which tells you where it peaks, and the standard deviation, which tells you how wide it is. The simplest way of thinking about an error bar is as our best guess at the standard deviation of what the underlying distribution of our measurement would be if everything were going right. Things might go wrong, of course, and your neutrinos might arrive early; but that’s not the error bar’s fault.

Now, there’s much more going on beneath the hood, as any scientist (or statistician!) worth their salt would be happy to explain. Sometimes the underlying distribution is not expected to be normal. Sometimes there are systematic errors. Are you sure you want the standard deviation, or perhaps the standard error? What are the error bars on your error bars?

While these are important issues, we’re in a holiday mood and aren’t trying to be so picky. What we’re celebrating is not the concept of statistical uncertainty, but the elegant shortcut provided by the concept of the error bar. Sure, many things can be going on, and ultimately we want to be more careful; nevertheless, there’s no question that the ability to sum up our rough degree of precision in a single number is enormously useful. That’s the genius of the error bar: it lets you decide at a glance whether a result is possibly worth believing or not. The power spectrum of the cosmic microwave background is a pretty plot, but it only becomes convincing when we see the error bars. Then you have a right to go, “Aha, I see three peaks there!”

And the error bar isn’t just pretty, it provides some quantitative oomph. An error bar is basically the standard deviation — “sigma,” as the scientists like to call it. So if your distribution really is normal you know that an individual measurement should be within one sigma of the expected value about 68% of the time; within two sigma 95% of the time, and within three sigma 99.7% of the time. So if you’re not within three sigma, you begin to think your expectation was wrong — something fishy is going on. (Like maybe a Nobel-prize-worthy discovery?) Once you’re out at five sigma, you’re outside the 99.9999% range — in normal human experience, that’s pretty unlikely.
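For anyone who wants to check those percentages, here is a minimal sketch using only the Python standard library. It assumes the distribution really is normal; the function names `inside` and `outside` are just illustrative choices, not anything standard.

```python
import math

def inside(n_sigma: float) -> float:
    """Fraction of a normal distribution within n_sigma of the mean (two-sided)."""
    return math.erf(n_sigma / math.sqrt(2.0))

def outside(n_sigma: float) -> float:
    """Fraction beyond n_sigma (two-sided); erfc keeps precision in the far tails."""
    return math.erfc(n_sigma / math.sqrt(2.0))

for n in (1, 2, 3, 5):
    print(f"{n} sigma: {inside(n):.5%} inside, {outside(n):.2e} outside")
```

The two-sided tail at five sigma comes out a bit under one in a million, which is why it sits outside the 99.9999% range.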

Error bars aren’t the last word on statistical significance, they’re the first word. But we can all be thankful that so much meaning can be compressed into one little quantity.


### 15 Responses to Thanksgiving

1. Lab Lemming says:

“(Like maybe a Nobel-prize-worthy discovery?)”

Or, for those of us in the 99%, a stupid technical mistake…

2. Mike says:

I’m giving thanks for the Enlightenment. It’s only been several hundred years, but the world has changed greatly for the better. Let’s all hope and work for it continuing. It’s the only hope we have — short of ET dropping in and making it all right. 😉

3. Chris says:

While we scientists are thankful for the error bar and love everything it tells us, the students we teach despise it and believe it was created just to torment them.

Although I am tempted to put the standard model Lagrangian on their final general physics equation sheet. We have to torment them a little 😛

4. Thomas says:

Dammit Sean, I was about to lazily just plot my means without error bars, now I feel obliged to comply.

5. Georg says:

Hell, what a falsification of science history!
Gauß did not “make it popular”; he invented it, and he did that on purpose.
So it is not “The nice thing about a normal distribution is that it is fully specified by just two numbers”; rather, it is a property he looked for when he was told to supervise the triangulation results of the Hannoverian country.
So the thanks could be directed to Gauß for not just demanding that one or another length or angle be measured once more; he thought about the problem fundamentally, and as ingeniously as usual.
Or thank someone in the Hannoverian ministry who gave him the order mentioned, or thank the one who saw to it that young Gauß was sent to school when he showed signs of mathematical genius as a boy.

6. What is not obvious to many is the reason why the normal distribution is such a common error distribution. Answer: a histogram of the sums of random numbers approaches the normal distribution as the number of random numbers per sum approaches infinity. For processes in which the total error is the sum of many uncorrelated errors, this is applicable.
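A small simulation makes this concrete. The sketch below is only an illustration under stated assumptions: it sums uniform random numbers (any finite-variance distribution would do), and the helper names `sums_of_uniforms` and `fraction_within` are made up for the example. It then checks how much of each sample lies within one and two standard deviations of the mean.

```python
import random
import statistics

def sums_of_uniforms(n_terms, n_sums, seed=0):
    """Draw n_sums values, each the sum of n_terms uniform(0, 1) random numbers."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n_terms)) for _ in range(n_sums)]

def fraction_within(samples, n_sigma):
    """Fraction of samples within n_sigma sample standard deviations of the sample mean."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)
    return sum(abs(x - mu) <= n_sigma * sigma for x in samples) / len(samples)

for n_terms in (1, 2, 12):
    samples = sums_of_uniforms(n_terms, 50_000)
    one = fraction_within(samples, 1)
    two = fraction_within(samples, 2)
    print(f"{n_terms:2d} terms per sum: {one:.3f} within 1 sigma, {two:.3f} within 2 sigma")
```

A single uniform number puts only about 58% of its values within one sigma (and all of them within two), while sums of a dozen terms should already land near the normal values of 68% and 95%.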

7. JW Mason says:

“An error bar is basically the standard deviation”

I’m curious about the “basically” here. In what sense is the error bar not the (estimated) standard deviation?

“a histogram of the sums of random numbers approaches the normal distribution as the number of random numbers per sum approaches infinity. For processes in which the total error is the sum of many uncorrelated errors, this is applicable.”

If the underlying distribution has a finite variance, right? Which is not the case for all natural processes.

In the social sciences, the assumption of uncorrelated errors is also a major problem. For you guys in the physical sciences, I guess not so much.

8. Massimo says:

“If the underlying distribution has a finite variance, right?”

This is one of the conditions upon which the central limit theorem, in the form that most of us have learned, indeed applies. At the time, the person teaching it told me that it is possible to relax the conditions, but the proof becomes rather unwieldy.
Interestingly, one can come to the conclusion that the underlying distribution function is a Gaussian, assuming that mean value and standard deviation are known, without invoking the central limit theorem at all. If one simply adopts a Bayesian viewpoint, the Gaussian is the distribution that maximizes the entropy — see, for instance, D. Sivia, “Data Analysis: A Bayesian Tutorial”.

9. David Brown says:

A group of scientists sitting at a Thanksgiving feast table and saying aloud, “Let us give thanks to whatever there is for error bars, Hubble’s Law, and the Spin Statistics Theorem” might be a plausible cartoon for Gary Larson’s “Far Side”.

10. Mr D says:

I’ll have to disagree.

In X-ray astronomy, I’d go so far as to characterize error bars and simple uncertainties as being a big problem. The underlying distributions are often very covariant or even completely non-Gaussian, but most authors will simply quote symmetrical uncertainties and be done with it, as if that made any sense. Some will be fancy and include asymmetrical uncertainties, as if that made it any better… And the reviewers are apparently fine with this.

I’m done with simple uncertainties, it’s marginalized likelihood contours that we should be publishing, with the underlying MCMC data available somewhere for use in future studies.

11. Indeed. One needs the likelihood as a function of the parameters. These days, there is no excuse for not sticking it on the web: http://www.astro.multivax.de:8000/ceres/data_from_papers/papers.html .

For contours, I plot the smallest contour which encloses, say, 95% of the integrated likelihood, not some percentage of the peak likelihood. (I’ve seen the latter done, with the percentage chosen so that it corresponds to 95% c.l. for a Gaussian distribution, even if the data are not Gaussian.)
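
The "smallest contour enclosing 95% of the integrated likelihood" recipe can be sketched in a few lines. This is a hedged illustration, not anyone's actual pipeline: it assumes the likelihood has been evaluated on a grid of equal-area cells, and the names `hdr_threshold` and `gaussian_grid` are hypothetical. Sorting the cells from highest to lowest and accumulating until the target fraction is reached gives the contour level.

```python
import math

def hdr_threshold(likelihood, fraction=0.95):
    """Likelihood level whose contour is the smallest region holding `fraction`
    of the total gridded likelihood (assumes equal-area grid cells)."""
    values = sorted((v for row in likelihood for v in row), reverse=True)
    target = fraction * sum(values)
    running = 0.0
    for v in values:
        running += v
        if running >= target:
            return v
    return values[-1]

def gaussian_grid(half_width=5.0, step=0.05):
    """Toy example: an uncorrelated unit 2D Gaussian likelihood on a square grid."""
    n = int(round(2 * half_width / step)) + 1
    axis = [-half_width + i * step for i in range(n)]
    return [[math.exp(-0.5 * (x * x + y * y)) for x in axis] for y in axis]

grid = gaussian_grid()
level = hdr_threshold(grid, 0.95)
peak = max(max(row) for row in grid)
print(f"95% contour sits at {level / peak:.3f} of the peak likelihood")
```

For this two-parameter Gaussian toy case the 95% contour lies analytically at 5% of the peak (for a one-dimensional Gaussian it would be at about 15%), so a fixed "percentage of the peak" depends on both the dimensionality and the shape of the likelihood. Accumulating the sorted grid values makes no such assumption.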