Chapter 3: Problem 18

The accompanying data are a subset of data read from a graph in the paper "Ladies First? A Field Study of Discrimination in Coffee Shops" (Applied Economics [April, 2008]). The data are the waiting times (in seconds) between ordering and receiving coffee for 19 female customers at a Boston coffee shop.

$$
\begin{array}{rrrrrrrr}
0 & 80 & 80 & 100 & 100 & 100 & 120 & 120 \\
120 & 140 & 140 & 150 & 160 & 180 & 200 & 200 \\
220 & 240 & 380 & & & & &
\end{array}
$$

a. Calculate the mean and standard deviation for this data set.
b. Delete the observation of 380 and recalculate the mean and standard deviation. How do these values compare to the values calculated in Part (a)? What does this suggest about using the mean and standard deviation as measures of center and spread for a data set with outliers?

Short Answer


With all 19 observations, the mean waiting time is about 148.95 seconds and the standard deviation is about 79.78 seconds. After the observation of 380 is deleted, the mean drops to about 136.11 seconds and the standard deviation to about 58.52 seconds. A single extreme value raised the mean by roughly 13 seconds and the standard deviation by roughly 21 seconds, pulling both away from the center and spread of the remaining data. This suggests that the mean and standard deviation should be used with caution as measures of center and spread for data sets that contain outliers.

Step by step solution

Compute the Mean and Standard Deviation for All 19 Observations

To calculate the mean, add up all the waiting times and divide by the number of observations, n = 19. To calculate the standard deviation, subtract the mean from each waiting time, square each difference, add the squared differences, divide the sum by n - 1 = 18, and take the square root of the result. Dividing by n - 1 rather than n gives the sample standard deviation s, the usual choice when the data are a sample.
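As a worked check of this step, the arithmetic below uses the identity $\sum (x - \bar{x})^2 = \sum x^2 - (\sum x)^2 / n$, which is algebraically equivalent to summing the squared deviations directly. The totals $\sum x = 2830$ and $\sum x^2 = 536{,}100$ come from the 19 listed values; results are rounded to two decimals.

$$
\begin{aligned}
\bar{x} &= \frac{\sum x}{n} = \frac{2830}{19} \approx 148.95 \text{ seconds} \\
\sum (x - \bar{x})^2 &= \sum x^2 - \frac{(\sum x)^2}{n} = 536{,}100 - \frac{2830^2}{19} \approx 114{,}578.95 \\
s &= \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{114{,}578.95}{18}} \approx \sqrt{6365.50} \approx 79.78 \text{ seconds}
\end{aligned}
$$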

Compute Mean and Standard Deviation without the outlier

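The mean and standard deviation without the outlier (380) are calculated with the same procedure as in Step 1, but now there are n = 18 observations: the sum of the waiting times is divided by 18 for the mean, and the sum of squared deviations is divided by n - 1 = 17 for the standard deviation.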
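A worked version of this step, using the identity $\sum (x - \bar{x})^2 = \sum x^2 - (\sum x)^2 / n$ with $n = 18$. Deleting 380 reduces the totals from $\sum x = 2830$ and $\sum x^2 = 536{,}100$ to $\sum x = 2830 - 380 = 2450$ and $\sum x^2 = 536{,}100 - 380^2 = 391{,}700$; results are rounded to two decimals.

$$
\begin{aligned}
\bar{x} &= \frac{2450}{18} \approx 136.11 \text{ seconds} \\
\sum (x - \bar{x})^2 &= 391{,}700 - \frac{2450^2}{18} \approx 58{,}227.78 \\
s &= \sqrt{\frac{58{,}227.78}{17}} \approx \sqrt{3425.16} \approx 58.52 \text{ seconds}
\end{aligned}
$$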

Compare the results of Step 1 and Step 2

Removing the single observation of 380 lowers the mean from about 148.95 to about 136.11 seconds (a drop of about 12.84 seconds) and the standard deviation from about 79.78 to about 58.52 seconds (a drop of about 21.26 seconds, or roughly 27%). One observation out of 19 shifted both measures noticeably, which shows how sensitive the mean and standard deviation are to outliers.
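A short Python sketch can reproduce both calculations and the comparison. It assumes Python 3's standard `statistics` module, whose `stdev` divides by n - 1 (the sample standard deviation); the names `waits`, `without_outlier`, and `summarize` are illustrative only.

```python
from statistics import mean, stdev

# Waiting times (seconds) for the 19 female customers, as listed in the problem.
waits = [0, 80, 80, 100, 100, 100, 120, 120, 120, 140,
         140, 150, 160, 180, 200, 200, 220, 240, 380]

# Part (b): the same data with the observation of 380 deleted (it appears once).
without_outlier = [w for w in waits if w != 380]

def summarize(data):
    """Return (mean, sample standard deviation), each rounded to two decimals."""
    return round(mean(data), 2), round(stdev(data), 2)

with_mean, with_sd = summarize(waits)
wo_mean, wo_sd = summarize(without_outlier)

print(f"n={len(waits)}: mean={with_mean}, s={with_sd}")
print(f"n={len(without_outlier)}: mean={wo_mean}, s={wo_sd}")
print(f"change: mean {with_mean - wo_mean:.2f}, s {with_sd - wo_sd:.2f}")
```

Run as-is, it should report a mean of 148.95 and s of 79.78 for all 19 values, and 136.11 and 58.52 once 380 is removed.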

Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Mean Calculation

Calculating the mean, or average, is a fundamental procedure in statistical data analysis. It gives us a quick snapshot of where the middle ground lies within a set of numbers. To calculate the mean of a data set, you simply add up all the individual numerical values and then divide by the number of values.

Using the waiting times at the coffee shop as an example, the mean waiting time is found by adding the waiting times of the 19 female customers and dividing by 19. Because every data point contributes equally to that sum, a single unusually high or low value (like the 380-second outlier in the exercise) can pull the mean noticeably toward it.
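To see which observation pulls hardest on the mean, the Python sketch below (standard library only; the names are illustrative) drops each distinct waiting time once and records how far the mean moves. Removing an observation $x_k$ changes the mean by $(\bar{x} - x_k)/(n - 1)$, so the value farthest from the mean has the largest effect.

```python
from statistics import mean

waits = [0, 80, 80, 100, 100, 100, 120, 120, 120, 140,
         140, 150, 160, 180, 200, 200, 220, 240, 380]
full_mean = mean(waits)

# For each distinct value, drop one copy of it and record how far the mean moves.
shifts = {}
for value in sorted(set(waits)):
    reduced = list(waits)
    reduced.remove(value)  # removes only the first matching copy
    shifts[value] = mean(reduced) - full_mean

# The value whose removal moves the mean the most, in absolute terms.
most_influential = max(shifts, key=lambda v: abs(shifts[v]))
print(f"full mean = {full_mean:.2f}")
print(f"dropping {most_influential} shifts the mean by {shifts[most_influential]:.2f}")
```

For these data the largest shift comes from dropping 380 (about -12.84 seconds), compared with about +8.27 seconds for dropping the 0 at the other end.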

Standard Deviation Calculation

The standard deviation is a statistic that measures the dispersion of a dataset relative to its mean. It is calculated as the square root of the variance and gives an indication of the amount of variation or spread of the set of values. A low standard deviation indicates that the data points tend to be close to the mean, whereas a high standard deviation indicates that the data points are spread out over a large range of values.

For example, calculating the standard deviation of the coffee shop data involves subtracting the mean from each waiting time, squaring each difference, adding the squared differences, dividing that sum by n - 1 (here 18), and finally taking the square root. The result describes how consistent the waiting times are. For these data, removing the outlier lowers the standard deviation, indicating that the remaining data points are more tightly clustered around their mean.
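The steps can be written out directly. This Python sketch is illustrative (the function name `sample_sd` is hypothetical); it divides the sum of squared deviations by n - 1 to give the sample standard deviation and compares the result with the standard library's `statistics.stdev`.

```python
import math
from statistics import stdev

waits = [0, 80, 80, 100, 100, 100, 120, 120, 120, 140,
         140, 150, 160, 180, 200, 200, 220, 240, 380]

def sample_sd(data):
    """Sample standard deviation computed step by step."""
    n = len(data)
    x_bar = sum(data) / n
    deviations = [x - x_bar for x in data]   # subtract the mean from each value
    squared = [d * d for d in deviations]    # square each deviation
    variance = sum(squared) / (n - 1)        # add them up and divide by n - 1
    return math.sqrt(variance)               # take the square root

s = sample_sd(waits)
print(f"s = {s:.2f}, matches statistics.stdev: {math.isclose(s, stdev(waits))}")
```

Using `statistics.pstdev`, which divides by n instead of n - 1, would give about 77.66 seconds for the same 19 values.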

Data Set Outliers

In the context of statistical analysis, an outlier is a data point that is markedly different from most of the other data points in a set. Outliers can result from measurement errors or may indicate a novel finding that is worth further investigation.

In the exercise, the waiting time of 380 seconds is an outlier because it is much higher than the rest of the data points; the next largest value is 240 seconds. This single data point pulls the mean upward and inflates the standard deviation, so both measures suggest a longer and more variable wait than is typical of the other 18 customers. Removing an outlier, or summarizing with measures that are less sensitive to it, can give a clearer picture of the central tendency and variability of the rest of the data set.
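One common rule of thumb, not stated in the source, flags a value as an outlier if it lies more than 1.5 times the interquartile range (IQR) below the lower quartile or above the upper quartile. The Python sketch below applies that rule using `statistics.quantiles`; quartile conventions differ between textbooks and software, though for these 19 values the module's default method gives Q1 = 100 and Q3 = 200, the same as taking the medians of the lower and upper halves.

```python
from statistics import quantiles

waits = [0, 80, 80, 100, 100, 100, 120, 120, 120, 140,
         140, 150, 160, 180, 200, 200, 220, 240, 380]

# Quartiles via the default "exclusive" method of statistics.quantiles.
q1, _, q3 = quantiles(waits, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any observation outside the fences is flagged.
outliers = [w for w in waits if w < lower_fence or w > upper_fence]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, fences=({lower_fence}, {upper_fence})")
print("flagged:", outliers)
```

The upper fence works out to 350 seconds, so only 380 is flagged; the 0-second wait stays inside the lower fence of -50.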

Measures of Center and Spread

Summary statistics fall into two main groups: measures of center, which describe where the middle of a data set lies, and measures of spread, which describe how variable the data are. Measures of center, such as the mean and the median, give a sense of a 'typical' value you might expect. The mean is affected by every single data point, while the median, a measure of center that is less sensitive to outliers, is the middle value (or the average of the two middle values) when all the points are arranged in order.
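A quick way to see the difference in sensitivity is to exaggerate the outlier. In the Python sketch below, the replacement value 1000 is hypothetical and chosen only for illustration; it is not part of the data set.

```python
from statistics import mean, median

waits = [0, 80, 80, 100, 100, 100, 120, 120, 120, 140,
         140, 150, 160, 180, 200, 200, 220, 240, 380]

# Hypothetical: replace 380 with an even larger value to exaggerate the outlier.
exaggerated = [1000 if w == 380 else w for w in waits]

for label, data in [("original", waits), ("380 -> 1000", exaggerated)]:
    print(f"{label:>12}: mean={mean(data):.2f}, median={median(data)}")
```

The median stays at 140 seconds because it depends only on the order of the values, while the mean climbs from about 148.95 to about 181.58 seconds.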

On the other hand, measures of spread such as the standard deviation and the range indicate how spread out the data are. A small range or a low standard deviation implies that the data points are clustered closely around the center. Outliers can strongly affect both measures: the range depends only on the smallest and largest values, and the standard deviation squares each deviation from the mean, so a far-away point contributes heavily. In this exercise, removing the outlier lowers both the mean and the standard deviation, giving values that better represent the other 18 waiting times.
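The range depends only on the two extreme values, so an outlier at either end changes it directly. This Python sketch (standard library only; the names `spread` and `without_outlier` are illustrative) compares both spread measures with and without the 380.

```python
from statistics import stdev

waits = [0, 80, 80, 100, 100, 100, 120, 120, 120, 140,
         140, 150, 160, 180, 200, 200, 220, 240, 380]
without_outlier = [w for w in waits if w != 380]

def spread(data):
    """Return (range, sample standard deviation) for a list of values."""
    return max(data) - min(data), stdev(data)

for label, data in [("with 380", waits), ("without 380", without_outlier)]:
    r, s = spread(data)
    print(f"{label:>12}: range={r}, s={s:.2f}")
```

The range shrinks from 380 to 240 seconds, and s from about 79.78 to about 58.52 seconds.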
