
The accompanying data are a subset of data read from a graph in the paper "Ladies First? A Field Study of Discrimination in Coffee Shops" (Applied Economics [April, 2008]). The data are the waiting times (in seconds) between ordering and receiving coffee for 19 female customers at a Boston coffee shop.

$$ \begin{array}{cccccccccc} 0 & 80 & 80 & 100 & 100 & 100 & 120 & 120 & 120 & 140 \\ 140 & 150 & 160 & 180 & 200 & 200 & 220 & 240 & 380 & \end{array} $$

a. Calculate the mean and standard deviation for this data set.

b. Delete the observation of 380 and recalculate the mean and standard deviation. How do these values compare to the values calculated in Part (a)? What does this suggest about using the mean and standard deviation as measures of center and spread for a data set with outliers?

Short Answer

Calculating the mean and standard deviation with and without the outlier shows how sensitive these measures are to extreme values. With all 19 observations, the mean is about 148.9 seconds and the (sample) standard deviation about 79.8 seconds; after deleting 380, they fall to about 136.1 seconds and 58.5 seconds. The single outlier pulls the mean upward and inflates the standard deviation, so the mean and standard deviation should be used with caution as measures of center and spread for data sets that contain outliers.

Step by step solution

01

Compute the Mean and Standard Deviation initially

To calculate the mean, add up all the waiting times and divide by the number of observations, \(n = 19\): \(\bar{x} = 2830/19 \approx 148.95\) seconds. To calculate the sample standard deviation, subtract the mean from each waiting time, square each difference, add these squared deviations, divide the total by \(n - 1 = 18\), and take the square root of the result. This gives \(s \approx 79.78\) seconds.
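The procedure in Step 1 can be sketched in Python as follows. This is a minimal illustration, assuming the sample standard deviation (divisor \(n - 1\)) is wanted; the variable names are our own.

```python
import math

# Waiting times (seconds) for the 19 female customers, as listed in the exercise
times = [0, 80, 80, 100, 100, 100, 120, 120, 120, 140,
         140, 150, 160, 180, 200, 200, 220, 240, 380]

n = len(times)            # 19 observations
mean = sum(times) / n     # x-bar = 2830 / 19

# Sum of squared deviations from the mean
ss = sum((x - mean) ** 2 for x in times)

# Sample variance divides by n - 1; the standard deviation is its square root
variance = ss / (n - 1)
sd = math.sqrt(variance)

print(f"n = {n}, mean = {mean:.2f} s, sd = {sd:.2f} s")
```

Running it prints `n = 19, mean = 148.95 s, sd = 79.78 s`.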
02

Compute Mean and Standard Deviation without the outlier

To exclude the outlier (380), repeat the procedure from Step 1 on the remaining 18 observations, so the mean divides by 18 and the variance by \(n - 1 = 17\). This gives \(\bar{x} = 2450/18 \approx 136.11\) seconds and \(s \approx 58.52\) seconds.
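A minimal Python sketch of Step 2, using the standard-library `statistics` module; `statistics.stdev` computes the sample standard deviation (divisor \(n - 1\)). The name `trimmed` is illustrative.

```python
import statistics

# Waiting times (seconds) from the exercise
times = [0, 80, 80, 100, 100, 100, 120, 120, 120, 140,
         140, 150, 160, 180, 200, 200, 220, 240, 380]

# Drop the observation of 380 seconds (it occurs only once), leaving 18 values
trimmed = [x for x in times if x != 380]

mean_trimmed = statistics.mean(trimmed)  # 2450 / 18
sd_trimmed = statistics.stdev(trimmed)   # sample SD, divides by n - 1 = 17

print(f"n = {len(trimmed)}, mean = {mean_trimmed:.2f} s, sd = {sd_trimmed:.2f} s")
```

Running it prints `n = 18, mean = 136.11 s, sd = 58.52 s`.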
03

Compare the results of Step 1 and Step 2

Comparing the two sets of results shows the effect of the outlier. Removing the single value of 380 lowers the mean from about 148.95 to 136.11 seconds and the standard deviation from about 79.78 to 58.52 seconds. Because one observation out of 19 changes both measures this much, the mean and standard deviation should be used with caution as measures of center and spread when a data set contains outliers.
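One way to make the comparison in Step 3 concrete is to express each change as a percentage of the original value. The sketch below is an illustration of that idea, not part of the textbook solution; the variable names are our own.

```python
import statistics

# Waiting times (seconds) from the exercise
times = [0, 80, 80, 100, 100, 100, 120, 120, 120, 140,
         140, 150, 160, 180, 200, 200, 220, 240, 380]
trimmed = [x for x in times if x != 380]  # 380 occurs only once

mean_all, sd_all = statistics.mean(times), statistics.stdev(times)
mean_cut, sd_cut = statistics.mean(trimmed), statistics.stdev(trimmed)

# Percentage drop in each measure when the outlier is removed
mean_drop = 100 * (mean_all - mean_cut) / mean_all
sd_drop = 100 * (sd_all - sd_cut) / sd_all

print(f"mean: {mean_all:.2f} -> {mean_cut:.2f} ({mean_drop:.1f}% lower)")
print(f"sd:   {sd_all:.2f} -> {sd_cut:.2f} ({sd_drop:.1f}% lower)")
```

The mean falls by about 8.6%, while the standard deviation falls by about 26.6%, so for these data the standard deviation is the more sensitive of the two to the single outlier.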

Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Mean Calculation
Calculating the mean, or average, is a fundamental procedure in statistical data analysis. It gives us a quick snapshot of where the middle ground lies within a set of numbers. To calculate the mean of a data set, you simply add up all the individual numerical values and then divide by the number of values.

Using the waiting times at the coffee shop as an example, the mean waiting time is found by adding the waiting times of the 19 female customers and dividing by 19. Because the mean gives every data point equal weight, an unusually high or low value (like the 380-second outlier in the exercise) can pull it noticeably toward that value.
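For the data in this exercise, the arithmetic works out as follows; the totals 2830 and 2450 come from adding the listed waiting times with and without the 380-second observation.

$$ \bar{x}_{\text{all}} = \frac{2830}{19} \approx 148.95 \text{ s}, \qquad \bar{x}_{\text{without 380}} = \frac{2450}{18} \approx 136.11 \text{ s} $$

Removing that one value lowers the mean by roughly 12.8 seconds.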
Standard Deviation Calculation
The standard deviation is a statistic that measures the dispersion of a dataset relative to its mean. It is calculated as the square root of the variance and gives an indication of the amount of variation or spread of the set of values. A low standard deviation indicates that the data points tend to be close to the mean, whereas a high standard deviation indicates that the data points are spread out over a large range of values.

For example, calculating the standard deviation of the coffee shop data involves subtracting the mean from each waiting time, squaring each difference, adding the squared differences and dividing by \(n - 1\) (one less than the number of observations, for a sample standard deviation), and finally taking the square root. This indicates how consistent the waiting times are. After removing the outlier, the standard deviation drops, signifying that the remaining data points are more tightly clustered around their mean.
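Written as a formula, and assuming the sample standard deviation (divisor \(n - 1\)) used in introductory texts, the calculation for this data set is sketched below; the sums of squared deviations are rounded to two decimals.

$$ \begin{aligned} s &= \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \\ s_{\text{all}} &= \sqrt{\frac{114{,}578.95}{18}} \approx \sqrt{6365.50} \approx 79.78 \text{ s} \\ s_{\text{without 380}} &= \sqrt{\frac{58{,}227.78}{17}} \approx \sqrt{3425.16} \approx 58.52 \text{ s} \end{aligned} $$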
Data Set Outliers
In the context of statistical analysis, an outlier is a data point that is markedly different from most of the other data points in a set. Outliers can result from measurement errors or may indicate a novel finding that is worth further investigation.

In the exercise, the waiting time of 380 seconds is an outlier because it is much higher than the rest of the data points. This single value pulls the mean upward and inflates the standard deviation, so both measures suggest longer and more variable waits than are typical of the other 18 customers. Removing outliers can sometimes give a more accurate picture of the data set's central tendency and variability.
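"Much higher than the rest" can be made precise with a rule of thumb. The sketch below assumes the common boxplot rule that flags values more than 1.5 × IQR beyond the quartiles; that rule is our assumption, not something stated in the exercise. For \(n = 19\), Python's default `statistics.quantiles` method gives the same quartiles as taking the medians of the lower and upper nine observations.

```python
import statistics

# Waiting times (seconds) from the exercise
times = [0, 80, 80, 100, 100, 100, 120, 120, 120, 140,
         140, 150, 160, 180, 200, 200, 220, 240, 380]

# Quartiles via the default "exclusive" method
q1, q2, q3 = statistics.quantiles(times, n=4)
iqr = q3 - q1

# Assumed rule: flag values more than 1.5 * IQR below Q1 or above Q3
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in times if x < lower_fence or x > upper_fence]

print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {iqr}")
print(f"fences: {lower_fence} to {upper_fence}; outliers: {outliers}")
```

Here Q1 = 100, Q3 = 200, and the upper fence is 350 seconds, so 380 is the only value flagged.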
Measures of Center and Spread
Two main kinds of summary statistics are measures of center, which describe where a data set is centered, and measures of spread, which describe its diversity or variability. Measures of center, such as the mean and median, give a sense of a 'typical' value you might expect. The mean is affected by every single data point, while the median, a measure of center that is less sensitive to outliers, is the middle value when all the points are arranged in order.
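A quick way to see the median's resistance is a what-if experiment: make the largest value far more extreme and watch which measure moves. In this sketch the value 3800 is purely hypothetical, invented for illustration and not part of the data.

```python
import statistics

# Waiting times (seconds) from the exercise; the last value is the 380 s outlier
times = [0, 80, 80, 100, 100, 100, 120, 120, 120, 140,
         140, 150, 160, 180, 200, 200, 220, 240, 380]

# Hypothetical what-if: replace 380 with a far more extreme 3800
extreme = times[:-1] + [3800]

mean_orig, median_orig = statistics.mean(times), statistics.median(times)
mean_ext, median_ext = statistics.mean(extreme), statistics.median(extreme)

print(f"largest = 380:  mean = {mean_orig:.2f}, median = {median_orig}")
print(f"largest = 3800: mean = {mean_ext:.2f}, median = {median_ext}")
```

The mean jumps by 180 seconds (from about 148.95 to 328.95), while the median stays at 140 because it depends only on the middle of the ordered data.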

Measures of spread, such as the standard deviation and the range, indicate how spread out the data are. A small range or a low standard deviation implies that the data points are clustered closely around the center. Outliers can strongly affect the mean and the standard deviation, as well as the range, which depends only on the two most extreme values. As the exercise shows, removing the outlier decreases both the mean and the standard deviation, giving a more representative picture of the center and variability of the remaining data.
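The range is the simplest measure of spread to compute, and because it uses only the maximum and minimum, a single outlier at either end changes it directly. A minimal sketch for this data set (names are illustrative):

```python
# Waiting times (seconds) from the exercise
times = [0, 80, 80, 100, 100, 100, 120, 120, 120, 140,
         140, 150, 160, 180, 200, 200, 220, 240, 380]
trimmed = [x for x in times if x != 380]  # 380 occurs only once

# Range = maximum minus minimum
range_all = max(times) - min(times)
range_trimmed = max(trimmed) - min(trimmed)

print(f"range with 380: {range_all} s; without 380: {range_trimmed} s")
```

The range drops from 380 to 240 seconds, a reduction of more than a third from removing one observation.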
