{"id":807,"date":"2009-12-22T00:35:59","date_gmt":"2009-12-22T05:35:59","guid":{"rendered":"https:\/\/www.circuitdesign.info\/blog\/2009\/12\/median-vs-mean\/"},"modified":"2022-12-14T00:14:52","modified_gmt":"2022-12-14T06:14:52","slug":"median-vs-mean","status":"publish","type":"post","link":"https:\/\/www.circuitdesign.info\/blog\/2009\/12\/median-vs-mean\/","title":{"rendered":"Median vs Mean"},"content":{"rendered":"<p>I\u2019ve been doing some statistical measurements lately (more to follow). It occurs to me that while most people measure the mean of a set of measurements, the median is more useful.<\/p>\n<p><!--more-->If the distribution is Gaussian, the mean and median are equal.<\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/www.circuitdesign.info\/blog\/wp-content\/uploads\/2009\/12\/scan0151a.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-801\" src=\"https:\/\/www.circuitdesign.info\/blog\/wp-content\/uploads\/2009\/12\/scan0151a-300x154.jpg\" alt=\"\" width=\"300\" height=\"154\" srcset=\"https:\/\/www.circuitdesign.info\/blog\/wp-content\/uploads\/2009\/12\/scan0151a-300x154.jpg 300w, https:\/\/www.circuitdesign.info\/blog\/wp-content\/uploads\/2009\/12\/scan0151a.jpg 640w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>(The mean is defined as $\\mu_X = \\int X p(X) dX$, where $p(X)$ is the probability density function (PDF) of $X$; that is, it\u2019s an average of $X$, weighted by the probability density of $X$. The median is defined by $P ( X &lt; \\mu_{1\/2} ) = 1\/2$; that is, it\u2019s the point that $X$ is equally likely to fall below or above (50% probability each).)<\/p>\n<p>Many times in engineering and process control, we keep track of the mean and standard deviation. 
One of the reasons is that if the thing we\u2019re trying to control is Gaussian, the mean\/median and standard deviation give us good design criteria to minimize failure: if we allow our system to tolerate $\\pm 3$ standard deviations ($6 \\sigma$) around the mean\/median, then it has a 99.7% chance of success (0.3% chance of failure).<\/p>\n<p>We can generalize this: if we wanted to be more lax, we could design (or require) the system to tolerate only $\\pm 2$ standard deviations (about 4.6% failure). In some cases, systems are designed to tolerate $\\pm 4$ standard deviations (0.006% failure). So, one can design the system to tolerate $\\mu_{1\/2} \\pm k \\sigma$, where $k$ is some factor (2, 3, or 4, for example) that determines the probability of failure.<\/p>\n<p>However, what if the distribution is bimodal? Take, for example, two modes of operation (each more or less Gaussian):<\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/www.circuitdesign.info\/blog\/wp-content\/uploads\/2009\/12\/scan0151b.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-803\" src=\"https:\/\/www.circuitdesign.info\/blog\/wp-content\/uploads\/2009\/12\/scan0151b-300x162.jpg\" alt=\"\" width=\"300\" height=\"162\" srcset=\"https:\/\/www.circuitdesign.info\/blog\/wp-content\/uploads\/2009\/12\/scan0151b-300x162.jpg 300w, https:\/\/www.circuitdesign.info\/blog\/wp-content\/uploads\/2009\/12\/scan0151b.jpg 640w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><!--more--><\/p>\n<p>Because the distribution is asymmetric, the mean and median are no longer the same. In this case, we could posit that some secondary mode (or external factor) causes that second hump. Let\u2019s call the main hump the primary mode and the smaller hump the secondary mode. 
If things are behaving \u201cnormally,\u201d we get the first hump, but some failure or aberration causes the second hump.<\/p>\n<p>However, what if the system were more sensitive to this failure (secondary mode)? Then, we\u2019d see something like:<a href=\"https:\/\/www.circuitdesign.info\/blog\/wp-content\/uploads\/2009\/12\/scan0151c.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-805 aligncenter\" src=\"https:\/\/www.circuitdesign.info\/blog\/wp-content\/uploads\/2009\/12\/scan0151c-300x153.jpg\" alt=\"\" width=\"300\" height=\"153\" srcset=\"https:\/\/www.circuitdesign.info\/blog\/wp-content\/uploads\/2009\/12\/scan0151c-300x153.jpg 300w, https:\/\/www.circuitdesign.info\/blog\/wp-content\/uploads\/2009\/12\/scan0151c.jpg 640w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a>Notice what happened? The median stayed exactly the same. However, the mean (mislabeled \u201caverage\u201d) moved proportionally to that secondary hump. Incidentally, the standard deviation ($\\sigma$) also moved proportionally to the distance between the two humps, but let\u2019s focus on the fact that the mean just changed.<\/p>\n<p>The question you\u2019re probably asking is \u201cwhat\u2019s so bad about that\u201d? Well, if you\u2019re computing six-sigma-like design criteria, you\u2019re taking $\\mu \\pm k \\sigma$. Recall, however, that we could pick any factor $k$ depending on the probability of failure we want (I should say want to avoid). So, when both the average and the standard deviation change, how can we be sure we\u2019re getting the right value for $\\mu + k \\sigma$?<\/p>\n<p>The nice thing about picking the median as the average is that it doesn\u2019t depend on the magnitude of the secondary mode, only on the probability of the secondary mode. The magnitude of failure impacts the standard deviation. 
I like to view these (median and standard deviation) as two independent metrics that tell different stories.<\/p>\n<p>Another thing to note is that one could view the 2nd illustration above as an input to a nonlinear amplifier (for example) and the 3rd illustration as the output. That\u2019s another nice thing about the median: it commutes with a monotonic nonlinearity. That is, if $f$ is monotonic, and $Y = f(X)$, then $\\mu_{1\/2,Y} = f(\\mu_{1\/2,X})$. So, we don\u2019t have to worry so much that we\u2019re measuring the correct independent variable. Our median will give us the same information (albeit in a different, nonlinear domain).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I\u2019ve been doing some statistical measurements lately (more to follow). It occurs to me that while most people measure the mean of a set of measurements, the median is more useful.<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[3,107,5],"tags":[153,152],"class_list":["post-807","post","type-post","status-publish","format-standard","hentry","category-analog-pro","category-digital-professional","category-software","tag-five-nines","tag-six-sigma"],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/poCEy-d1","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.circuitdesign.info\/blog\/wp-json\/wp\/v2\/posts\/807","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.circuitdesign.info\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.circuitdesign.info\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.circuitdesign.info\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.circuitdesign.info\/blog\/wp-json\/wp\/v2\/comments?post=807"}],"version-h
istory":[{"count":32,"href":"https:\/\/www.circuitdesign.info\/blog\/wp-json\/wp\/v2\/posts\/807\/revisions"}],"predecessor-version":[{"id":1221,"href":"https:\/\/www.circuitdesign.info\/blog\/wp-json\/wp\/v2\/posts\/807\/revisions\/1221"}],"wp:attachment":[{"href":"https:\/\/www.circuitdesign.info\/blog\/wp-json\/wp\/v2\/media?parent=807"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.circuitdesign.info\/blog\/wp-json\/wp\/v2\/categories?post=807"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.circuitdesign.info\/blog\/wp-json\/wp\/v2\/tags?post=807"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}