Mean and median. One of the most basic tasks in statistics is to summarize a set of observations $\{x_1, x_2, \ldots, x_n\} \subseteq \mathbb{R}$ by a single number. Two popular choices for this summary statistic are:
• The median, which we’ll call $\mu_1$
• The mean, which we’ll call $\mu_2$
(a) Show that the median $\mu_1$ is the value of $\mu$ that minimizes the function
$$\sum_i |x_i - \mu|.$$
You can assume for simplicity that $n$ is odd. (Hint: Show that for any $\mu \neq \mu_1$, the function decreases if you move $\mu$ either slightly to the left or slightly to the right.)
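As a numerical sanity check (not a proof), the claim in part (a) can be tried on a small data set. The sketch below assumes illustrative observations and a hypothetical helper `abs_cost`; it evaluates the sum of absolute deviations on a grid of candidate values and compares the best candidate with the median.

```python
from statistics import median

def abs_cost(xs, mu):
    """Sum of absolute deviations of the observations xs from mu."""
    return sum(abs(x - mu) for x in xs)

# Illustrative data (an assumption, not from the exercise); n = 5 is odd.
xs = [2, 9, 4, 7, 30]
mu1 = median(xs)  # 7

# Candidate values 0.0, 0.1, ..., 40.0 covering the range of the data.
candidates = [k / 10 for k in range(0, 401)]
best = min(candidates, key=lambda mu: abs_cost(xs, mu))

print(mu1, best, abs_cost(xs, mu1))
```

On this data the grid minimizer coincides with the median, and nudging the candidate to either side of it raises the cost, which is the behavior the hint asks you to establish in general.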
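(b) Show that the mean $\mu_2$ is the value of $\mu$ that minimizes the function
$$\sum_i (x_i - \mu)^2.$$
One way to do this is by calculus. Another method is to prove that for any $\mu \in \mathbb{R}$,
$$\sum_i (x_i - \mu)^2 = \sum_i (x_i - \mu_2)^2 + n(\mu - \mu_2)^2.$$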
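The non-calculus route in part (b) rests on splitting the squared-error cost at an arbitrary point into the cost at the mean plus $n$ times the squared distance to the mean. A minimal sketch that checks this decomposition numerically, assuming illustrative data and a hypothetical helper `sq_cost`:

```python
import math

def sq_cost(xs, mu):
    """Sum of squared deviations of the observations xs from mu."""
    return sum((x - mu) ** 2 for x in xs)

# Illustrative data (an assumption, not from the exercise).
xs = [2.0, 9.0, 4.0, 7.0, 30.0]
n = len(xs)
mu2 = sum(xs) / n  # the mean

# Check: sum (x_i - mu)^2 == sum (x_i - mu2)^2 + n * (mu - mu2)^2
for mu in [-3.0, 0.0, 7.0, mu2, 25.5]:
    lhs = sq_cost(xs, mu)
    rhs = sq_cost(xs, mu2) + n * (mu - mu2) ** 2
    assert math.isclose(lhs, rhs)
```

Since the second term on the right is nonnegative and vanishes only at the mean, the decomposition immediately gives the minimizer; the code only confirms the equality on a few sample points.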
Notice how the function for $\mu_2$ penalizes points that are far from $\mu$ much more heavily than the function for $\mu_1$. Thus $\mu_2$ tries much harder to be close to all the observations. This might sound like a good thing at some level, but it is statistically undesirable because just a few outliers can severely throw off the estimate of $\mu_2$. It is therefore sometimes said that $\mu_1$ is a more robust estimator than $\mu_2$. Worse than either of them, however, is $\mu_\infty$, the value of $\mu$ that minimizes the function
$$\max_i |x_i - \mu|.$$
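The robustness remark about the mean and the median can be illustrated with a small experiment. The sketch below uses made-up data (an assumption for illustration): it replaces one observation with an outlier and compares how far each summary moves.

```python
from statistics import mean, median

# Illustrative data: "dirty" is "clean" with one observation replaced by an outlier.
clean = [10, 11, 12, 13, 14]
dirty = [10, 11, 12, 13, 1000]

mean_shift = abs(mean(dirty) - mean(clean))        # how far the mean moved
median_shift = abs(median(dirty) - median(clean))  # how far the median moved

print(mean_shift, median_shift)
```

Here a single corrupted value drags the mean far away from the bulk of the data, while the median does not move at all.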
(c) Show that $\mu_\infty$ can be computed in $O(n)$ time (assuming the numbers $x_i$ are small enough that basic arithmetic operations on them take unit time).
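One possible approach to part (c), offered as a sketch rather than the intended solution: the largest distance from a point to the observations is always attained at the smallest or the largest observation, so the minimizer is the midpoint of those two, which a single pass can find. The names `mu_inf` and `max_cost` are hypothetical, and the data is illustrative.

```python
def max_cost(xs, mu):
    """Largest absolute deviation of any observation in xs from mu."""
    return max(abs(x - mu) for x in xs)

def mu_inf(xs):
    """Midpoint of the smallest and largest observation, found in one O(n) pass."""
    it = iter(xs)
    lo = hi = next(it)
    for x in it:
        if x < lo:
            lo = x
        elif x > hi:
            hi = x
    return (lo + hi) / 2

# Illustrative data (an assumption, not from the exercise).
xs = [2, 9, 4, 7, 30]
center = mu_inf(xs)

# Brute-force comparison over candidates 0.0, 0.1, ..., 40.0.
grid = [k / 10 for k in range(0, 401)]
brute = min(grid, key=lambda mu: max_cost(xs, mu))

print(center, brute, max_cost(xs, center))
```

The midpoint depends only on the two extreme observations, which also makes concrete why this estimator is so sensitive to outliers.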