I’ve been doing some statistical measurements lately (more to follow). It occurs to me that while most people measure the mean of a set of measurements, the median is more useful.
If the distribution is Gaussian, the mean and median are equal.
(Mean is defined as where
Many times in engineering and process control, we keep track of the mean and standard deviation. One of the reasons is that if the thing we’re trying to control is Gaussian, the mean/median and standard deviation give us good design criteria to minimize failure: if we allow our system to tolerate
However, we can generalize this: if we wanted to be more lax, we could only design (or require) the system to tolerate
However, what if the distribution is bimodal? Take for example, two modes of operation (each more or less Gaussian):
Due to the asymmetric distribution, the mean and median are now not the same. In this case, we could posit that some secondary mode (or external factor) causes that second hump. Let’s call the main hump the primary mode and the smaller hump the secondary mode. If things are behaving “normally” we get the first hump, but some failure or aberration causes the second hump.
However, what if the system was more sensitive to this failure (secondary mode). Then, we’d see something like:
Notice what happened? The median stayed exactly the same. However the mean mislabeled “average”) moved proportionally to that secondary hump. Incidentally, the standard deviation (
The question you’re probably asking is “what’s so bad about that”? Well, if you’re computing six-sigma-like design criteria, you’re taking
The nice thing about picking the median as the average is that it doesn’t depend on the magnitude of the secondary mode—only on the probability of the secondary mode. The magnitude of failure impacts the standard deviation. I like to view these (median and standard deviation) sas two independent metrics that tell different stories.
Another thing to note is that one could view the 2nd illustration above as an input to a nonlinear amplifier (for example) and the 3rd illustration as the output. That’s another nice thing about the median: it commutes with a monotonic nonlinearity. That is, if