Chapter 2: Problem 51

Altman and Bland report the survival times for patients with active hepatitis, half treated with prednisone and half receiving no treatment.$^{10}$ The survival times (in months) (Exercise 1.73 and EX0173) are adapted from their data for those treated with prednisone.

$$ \begin{array}{rr} 8 & 127 \\ 11 & 133 \\ 52 & 139 \\ 57 & 142 \\ 65 & 144 \\ 87 & 147 \\ 93 & 148 \\ 97 & 157 \\ 109 & 162 \\ 120 & 165 \end{array} $$

a. Can you tell by looking at the data whether it is roughly symmetric? Or is it skewed?
b. Calculate the mean and the median. Use these measures to decide whether or not the data are symmetric or skewed.
c. Draw a box plot to describe the data. Explain why the box plot confirms your conclusions in part b.

Short Answer

The survival time data are skewed to the left. The mean (108.15) is less than the median (123.5), and the box plot shows a longer left whisker and a median that sits closer to Q3 than to Q1, both of which indicate left skewness.

Step by step solution

(Exercise part a: Symmetry or skewness estimation by looking at the data)

To analyze the data just by looking at it, we observe the spacing between the sorted values. If the spacing is roughly uniform on both sides of the center, the data are likely symmetric. If the values are tightly packed on one side and spread out on the other, the data are skewed toward the spread-out side. Here, the ten largest values all fall between 127 and 165, while the ten smallest stretch from 8 to 120, which hints at a tail on the low side. A visual impression like this is not conclusive, so we calculate the mean and median and draw a box plot to confirm it.
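The spacing argument above can be made concrete with a short script. This is a minimal sketch, not part of the original solution: the helper `average_gap` is a hypothetical name introduced here, and splitting the sorted data into two halves of ten is just one simple way to compare how tightly the low and high values are packed.

```python
# Survival times (months) for the prednisone group, as listed in the exercise.
survival_months = [8, 11, 52, 57, 65, 87, 93, 97, 109, 120,
                   127, 133, 139, 142, 144, 147, 148, 157, 162, 165]

ordered = sorted(survival_months)
half = len(ordered) // 2
lower_half, upper_half = ordered[:half], ordered[half:]


def average_gap(values):
    """Mean distance between neighbouring values in a sorted list (hypothetical helper)."""
    gaps = [b - a for a, b in zip(values, values[1:])]
    return sum(gaps) / len(gaps)


lower_gap = average_gap(lower_half)
upper_gap = average_gap(upper_half)

print(f"average gap, lower half: {lower_gap:.2f} months")
print(f"average gap, upper half: {upper_gap:.2f} months")
# Wider gaps on one side suggest that the tail lies on that side.
print("wider spacing on the", "low" if lower_gap > upper_gap else "high", "side")
```

A larger average gap in one half means the values there are more spread out, which is the informal signature of a tail on that side.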

(Exercise part b: Calculating the mean and median)

First, we find the mean and median of the survival times. The mean is the sum of the survival times divided by the number of data points. The median is the middle value of the data once they are sorted in ascending order.

Mean = $\frac{8+11+52+57+65+87+93+97+109+120+127+133+139+142+144+147+148+157+162+165}{20} = \frac{2163}{20} = 108.15$

The data are already sorted: $\{8, 11, 52, 57, 65, 87, 93, 97, 109, 120, 127, 133, 139, 142, 144, 147, 148, 157, 162, 165\}$

Since there are 20 data points, the median is the average of the 10th and 11th data points:

Median = $\frac{120+127}{2} = 123.5$

Since the mean (108.15) is less than the median (123.5), the mean has been pulled toward the small values, and we conclude that the data are skewed to the left.
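The arithmetic in this step can be checked with a few lines of Python. This is a minimal sketch using the standard-library `statistics` module; the variable names are illustrative and not from the original solution.

```python
from statistics import mean, median

# Survival times (months) for the prednisone group, as listed in the exercise.
survival_months = [8, 11, 52, 57, 65, 87, 93, 97, 109, 120,
                   127, 133, 139, 142, 144, 147, 148, 157, 162, 165]

data_mean = mean(survival_months)      # sum of the values divided by n = 20
data_median = median(survival_months)  # average of the 10th and 11th sorted values

print(f"sum    = {sum(survival_months)}")
print(f"mean   = {data_mean}")
print(f"median = {data_median}")

# Mean below median suggests a left tail; mean above median suggests a right tail.
if data_mean < data_median:
    direction = "left"
elif data_mean > data_median:
    direction = "right"
else:
    direction = "none"
print(f"suggested skew: {direction}")
```

Comparing the mean with the median is only a rule of thumb, so it is worth confirming the direction with a plot, as part c does.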

(Exercise part c: Drawing a box plot and explaining conclusions)

To draw a box plot, we need to find the 1st quartile (Q1), the median (Q2), and the 3rd quartile (Q3). There are 20 data points, so Q1 is the median of the first 10 data points:

Q1 = Median of $\{8, 11, 52, 57, 65, 87, 93, 97, 109, 120\} = \frac{65+87}{2} = 76$

Q3 is the median of the last 10 data points:

Q3 = Median of $\{127, 133, 139, 142, 144, 147, 148, 157, 162, 165\} = \frac{142+144}{2} = 143$

The box plot will have the following components:
- A vertical line at Q1 (76)
- A vertical line at Q2 (123.5)
- A vertical line at Q3 (143)
- A rectangle extending from Q1 to Q3
- Whiskers extending from the minimum value (8) to Q1 and from Q3 to the maximum value (165)

Looking at the box plot, the left whisker (76 - 8 = 68) is much longer than the right whisker (165 - 143 = 22), and the median line sits closer to Q3 than to Q1 (143 - 123.5 = 19.5 versus 123.5 - 76 = 47.5). Both features confirm that the data are skewed to the left, as we concluded in part b.

Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Data Symmetry

In descriptive statistics, data symmetry indicates whether a dataset is evenly distributed on either side of a central value. Symmetry in data means that the left and right sides of the distribution are mirror images of each other. When considering symmetry, any deviation towards one end signifies skewness.

To determine symmetry just by visual inspection, as was attempted in step 1 of the example problem, look for regular spacing and balanced distribution of values around the middle value. If the values gradually increase and decrease at a consistent rate from the center, we expect symmetry. But often, a visual inspection is insufficient; thus, calculating statistical measures such as the mean and median is crucial for confirming data symmetry or skewness.

Mean and Median

The mean and median are measures of central tendency in descriptive statistics. The mean is the average of all data points, calculated by summing them and dividing by the number of values. In contrast, the median is the middle value when the dataset is sorted from smallest to largest, or the average of the two middle values in an even-sized dataset.

In the example problem, the mean was found to be lower than the median, which hints at a skew in the dataset. As a rule of thumb, extreme values pull the mean toward the longer tail while the median stays near the bulk of the data: a mean below the median suggests left skewness, and a mean above the median suggests right skewness.

Box Plot

A box plot, or box-and-whisker plot, is a graphical representation of a dataset’s distribution and is very helpful in depicting the degree of skewness. It consists of a rectangle (box) and lines (whiskers) extending from either side. The box encloses the interquartile range (Q1 to Q3), with a line inside it that represents the median (Q2). The whiskers stretch out to the minimum and maximum values, excluding outliers.

In step 3 of the solution process, the box plot was drawn to give a visual understanding of the distribution. The relative lengths of the whiskers, the placement of the median within the box, and the position of the box within the total range are the key indicators. A longer left whisker, as in the example, indicates left skewness, which is often easier to discern in a box plot than in a raw data list.
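To make the whisker lengths visible without plotting software, the five-number summary from the solution (8, 76, 123.5, 143, 165) can be rendered as a one-line text box plot. This is only an illustrative sketch: `text_box_plot` is a hypothetical helper written for this example, and the character width of 60 is an arbitrary choice.

```python
def text_box_plot(minimum, q1, med, q3, maximum, width=60):
    """Render a one-line text box plot scaled to `width` characters (hypothetical helper)."""
    span = maximum - minimum

    def col(value):
        # Map a data value to a character column between 0 and width - 1.
        return round((value - minimum) / span * (width - 1))

    line = [" "] * width
    for i in range(col(minimum), col(q1)):
        line[i] = "-"                      # left whisker
    for i in range(col(q1), col(q3) + 1):
        line[i] = "="                      # box from Q1 to Q3
    for i in range(col(q3) + 1, col(maximum) + 1):
        line[i] = "-"                      # right whisker
    line[col(minimum)] = "|"               # minimum
    line[col(maximum)] = "|"               # maximum
    line[col(q1)] = "["                    # Q1
    line[col(q3)] = "]"                    # Q3
    line[col(med)] = "M"                   # median
    return "".join(line)


# Five-number summary taken from the worked solution.
plot = text_box_plot(8, 76, 123.5, 143, 165)
print(plot)
```

In the printed line, the run of dashes before `[` is the left whisker and the run after `]` is the right whisker, so the two lengths and the position of `M` inside the box can be compared at a glance.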

Data Skewness

Data skewness is a measure of asymmetry in a data distribution. If a dataset is skewed to the left, like in the provided example, a minority of low values extends the tail of the distribution to the left. Conversely, right skewness indicates a tail that stretches to the right with a minority of high values.

Skewness can significantly affect the mean, pulling it in the direction of the tail, while the median is more robust and less influenced by extreme values. The gap between the mean and the median, together with the asymmetric box plot, indicates that the dataset in the example problem is not symmetric but skewed to the left, toward the lower values.
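One way to put a number on the direction of skew is Pearson's second skewness coefficient, $3(\bar{x} - \text{median})/s$. This coefficient is not part of the exercise or its solution; it is offered here only as a hedged illustration, and using the sample standard deviation for $s$ is an assumption of this sketch.

```python
from statistics import mean, median, stdev

# Survival times (months) for the prednisone group, as listed in the exercise.
survival_months = [8, 11, 52, 57, 65, 87, 93, 97, 109, 120,
                   127, 133, 139, 142, 144, 147, 148, 157, 162, 165]

# Pearson's second skewness coefficient: 3 * (mean - median) / standard deviation.
# Assumption: s is the sample standard deviation (n - 1 in the denominator).
pearson_skew = 3 * (mean(survival_months) - median(survival_months)) / stdev(survival_months)

print(f"Pearson's second skewness coefficient: {pearson_skew:.3f}")
# A negative value points to a left tail, a positive value to a right tail.
print("direction:", "left" if pearson_skew < 0 else "right" if pearson_skew > 0 else "none")
```

The sign carries the same message as the mean and median comparison; the magnitude is only a rough guide for a sample of 20 values.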
