
Let \(x_{1}, x_{2}, \ldots, x_{n}\) be the values of a random sample. A bootstrap sample, \(\mathbf{x}^{*\prime}=\left(x_{1}^{*}, x_{2}^{*}, \ldots, x_{n}^{*}\right)\), is a random sample of \(x_{1}, x_{2}, \ldots, x_{n}\) drawn with replacement. (a) Show that \(x_{1}^{*}, x_{2}^{*}, \ldots, x_{n}^{*}\) are iid with common cdf \(\widehat{F}_{n}\), the empirical cdf of \(x_{1}, x_{2}, \ldots, x_{n}\). (b) Show that \(E\left(x_{i}^{*}\right)=\bar{x}\). (c) If \(n\) is odd, show that \(\operatorname{median}\{x_{i}^{*}\}=x_{((n+1)/2)}\). (d) Show that \(V\left(x_{i}^{*}\right)=n^{-1} \sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}\).

Short Answer

Expert verified
The bootstrap draws \(x_{1}^{*}, x_{2}^{*}, \ldots, x_{n}^{*}\) are independent and identically distributed (iid) random variables with common cdf \(\widehat{F}_{n}\), the empirical cdf of the data. The expectation of each \(x_{i}^{*}\) is \(\bar{x}\), the mean of the original sample. If \(n\) is odd, the median of the distribution of \(x_{i}^{*}\) is \(x_{((n+1)/2)}\), the median of the original sample. The variance of each \(x_{i}^{*}\) is \(n^{-1}\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\), the variance of the original sample computed with divisor \(n\).

Step by step solution

01

Defining the empirical CDF

The empirical CDF \(\widehat{F}_{n}\) is a step function that jumps up by \(1/n\) at each of the \(n\) data points: \(\widehat{F}_{n}(t)\) is the proportion of sample values less than or equal to \(t\). Because each bootstrap draw selects any one of the \(n\) original values with probability \(1/n\), we have \(P(x_{j}^{*}=x_{i})=\frac{1}{n}\) for every \(i\), and hence \(P(x_{j}^{*}\leq t)=\widehat{F}_{n}(t)\) for all \(t\). Thus \(x_{1}^{*}, x_{2}^{*}, \ldots, x_{n}^{*}\) are identically distributed with common cdf \(\widehat{F}_{n}\).
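As a numerical check (a sketch using hypothetical data, not part of the textbook solution), the equality between the per-draw probability \(P(x_{j}^{*}\leq t)\) and the ECDF can be verified directly in Python:

```python
import numpy as np

def ecdf(data, t):
    """Empirical CDF F_n(t): fraction of sample values <= t."""
    return np.mean(np.asarray(data) <= t)

x = [2.0, 5.0, 5.0, 9.0]   # hypothetical sample, n = 4
n = len(x)

# Each bootstrap draw is uniform over the n stored values, so
# P(x_j* <= t) = (# of x_j <= t) / n, which is exactly F_n(t).
for t in [1.0, 2.0, 5.0, 9.0, 10.0]:
    draw_prob = sum(1 for xj in x if xj <= t) / n
    assert draw_prob == ecdf(x, t)
```

Note that repeated values (here, 5.0) simply produce a jump of height \(2/n\) at that point.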
02

Independence of bootstrap samples

Since each \(x_{i}^{*}\) in the bootstrap sample is drawn independently from the original sample, \(x_{1}^{*}, x_{2}^{*}, \ldots, x_{n}^{*}\) are independent.
03

Expectation of a bootstrap sample

Each \(x_{i}^{*}\) takes the value \(x_{j}\) with probability \(1/n\) for every \(j\). Therefore \(E\left(x_{i}^{*}\right)=\sum_{j=1}^{n} x_{j}\cdot\frac{1}{n}=\bar{x}\), the mean of the original sample.
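The probability-weighted sum above can be evaluated explicitly for a small hypothetical sample (an illustrative sketch, not the textbook's code):

```python
import numpy as np

x = np.array([3.0, 7.0, 8.0, 12.0])   # hypothetical sample
n = len(x)

# x_i* equals each x_j with probability 1/n, so its expectation is the
# probability-weighted sum of the sample values -- the sample mean.
expectation = sum(xj * (1.0 / n) for xj in x)
assert np.isclose(expectation, x.mean())
```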
04

Median of a bootstrap sample

If \(n\) is odd, order the sample as \(x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}\). The distribution of \(x_{i}^{*}\) places mass \(1/n\) on each sample value, so \(P\left(x_{i}^{*} \leq x_{((n+1)/2)}\right) \geq \frac{(n+1)/2}{n} > \frac{1}{2}\) and likewise \(P\left(x_{i}^{*} \geq x_{((n+1)/2)}\right) > \frac{1}{2}\). Hence the median of the distribution of \(x_{i}^{*}\) is \(x_{((n+1)/2)}\), the median of the original sample.
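The two mass conditions that define a distributional median can be checked on a small hypothetical odd-sized sample (an illustrative sketch with made-up data):

```python
import numpy as np

x = np.array([4.0, 1.0, 9.0, 2.0, 7.0])   # hypothetical sample, n = 5 (odd)
n = len(x)
order = np.sort(x)
m = order[(n + 1) // 2 - 1]   # x_((n+1)/2): the middle order statistic

# m is a median of the distribution of x_i*: at least half the
# probability mass (counts of sample values) lies at or below m,
# and at least half lies at or above m.
assert np.mean(x <= m) >= 0.5
assert np.mean(x >= m) >= 0.5
assert m == np.median(x)
```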
05

Variance of a bootstrap sample

Using the distribution of \(x_{i}^{*}\), \(V\left(x_{i}^{*}\right)=E\left(x_{i}^{*2}\right)-\left[E\left(x_{i}^{*}\right)\right]^{2}=\frac{1}{n}\sum_{j=1}^{n} x_{j}^{2}-\bar{x}^{2}=n^{-1}\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}\). Note that this is the variance of the empirical distribution, with divisor \(n\), not the usual unbiased sample variance with divisor \(n-1\).
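The algebraic identity \(E(x_{i}^{*2})-\bar{x}^{2}=n^{-1}\sum_{j}(x_{j}-\bar{x})^{2}\) can be confirmed numerically (a sketch on hypothetical data):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])   # hypothetical sample

# V(x_i*) = E(x_i*^2) - [E(x_i*)]^2, each x_j weighted by 1/n.
second_moment = np.mean(x ** 2)
variance = second_moment - x.mean() ** 2

# Matches n^{-1} * sum (x_j - xbar)^2, the divisor-n variance.
assert np.isclose(variance, np.mean((x - x.mean()) ** 2))
```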


Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Bootstrap Sampling
Bootstrap sampling is a resampling technique used to estimate the distribution of a statistic by randomly drawing samples with replacement from the original dataset. When working with a dataset, like a set of observations \( x_{1}, x_{2}, \ldots, x_{n} \), the fundamental idea is to generate new datasets, called bootstrap samples. Each of these new samples, \( \mathbf{x}^{* \prime} \), contains the same number of observations as the original but allows for repeated observations due to the replacement factor.

When applied, bootstrap sampling grants us insight into the variability and stability of sample statistics like the mean or variance. This robust approach allows for drawing conclusions about the population from which the original sample was drawn, even when no explicit knowledge about the population is available. The method's strength lies in its simplicity and flexibility, making it invaluable for assessing the uncertainty or confidence in statistical estimates from limited or non-parametric data.
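A minimal sketch of the resampling procedure just described, using hypothetical data and NumPy's random generator (not code from the textbook), shows how bootstrap samples estimate the variability of a statistic such as the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed for reproducibility
x = np.array([10.0, 15.0, 21.0, 20.0, 25.0, 30.0])   # hypothetical data
B = 2000                         # number of bootstrap resamples

# Each resample draws n values from x WITH replacement and records
# its mean; the spread of these means reflects sampling variability.
boot_means = np.array([
    rng.choice(x, size=len(x), replace=True).mean() for _ in range(B)
])

# Bootstrap estimate of the standard error of the sample mean.
se_boot = boot_means.std(ddof=1)
```

Because sampling is with replacement, a single resample may repeat some observations and omit others, which is exactly what lets the procedure mimic drawing fresh samples from the population.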
Empirical Cumulative Distribution Function (ECDF)
The empirical cumulative distribution function (ECDF) represents the proportion of observations less than or equal to a given value. For each data point \( x_{i} \) in a dataset \( \{x_{1}, x_{2}, \ldots, x_{n}\} \), the ECDF \( \widehat{F}_{n} \) increases by \( 1/n \) at that specific value. Essentially, the ECDF is a step function that graphically showcases the distribution of the data.

To bring this into perspective, imagine lining up all data points on the number line; at each point, take a step upward. The height of the step at any given position is the fraction of data points that are at or below that level, a snapshot of the data's relative standing. Unlike theoretical distribution functions, the ECDF is based strictly on the available data, hence its empirical nature, and provides a non-parametric model to understand the inherent distribution of the data.
Independent and Identically Distributed (iid)
When discussing random variables or samples, 'independent and identically distributed' (iid) is a critical concept in statistics and probability. Independence implies that the occurrence of one event does not influence that of another. Identically distributed means that each random variable has the same probability distribution.

In the context of bootstrap sampling, each bootstrap element \( x_{i}^{*} \) is drawn from the original sample independently, meaning that the selection of one does not affect the selection of another. Moreover, they are identically distributed as each element comes from the same original sample and thus follows the empirical cumulative distribution function (ECDF), \( \widehat{F}_{n} \), of the original data. This iid property is foundational in bootstrapping and many other statistical methods as it ensures consistent behavior across samples, which is imperative for valid inference.
Sample Variance
Sample variance is a measure that tells us how widely dispersed the values in a sample are. It's calculated by taking the squared differences between each observation and the sample mean, adding them all up, and then dividing by the number of observations minus one. In a mathematical form, for a sample \( \{x_{1}, x_{2}, \ldots, x_{n}\} \) with a mean of \( \bar{x} \), the sample variance \( s^2 \) is \( s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_{i}-\bar{x})^2\).

In bootstrap sampling, each resampled dataset \( \mathbf{x}^{* \prime} \) provides a sample variance that can be used to estimate the variance of the sampling distribution of a statistic. This resulting bootstrap variance captures the variability among the resampled datasets, lending a way to understand uncertainty and construct confidence intervals around statistical estimates. It's a cornerstone for inferential statistics, providing a glimpse into the sample's diversity, and by extension, the underlying population's diversity.
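The distinction between the divisor-\(n-1\) sample variance defined here and the divisor-\(n\) variance appearing in part (d) of the exercise is worth making concrete (a sketch with hypothetical numbers; `ddof` is NumPy's degrees-of-freedom parameter):

```python
import numpy as np

x = np.array([1.0, 3.0, 5.0, 7.0])   # hypothetical sample
n = len(x)

# Unbiased sample variance s^2 divides by n - 1 (ddof=1) ...
s2 = x.var(ddof=1)
# ... while the variance of the empirical distribution, as in
# part (d), divides by n (ddof=0, NumPy's default).
v_empirical = x.var(ddof=0)

# The two differ only by the factor n / (n - 1).
assert np.isclose(s2, v_empirical * n / (n - 1))
```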


Most popular questions from this chapter

This exercise obtains a useful identity for the cdf of a Poisson distribution. (a) Use Exercise \(3.3.5\) to show that this identity is true: $$ \frac{\lambda^{n}}{\Gamma(n)} \int_{1}^{\infty} x^{n-1} e^{-x \lambda} d x=\sum_{j=0}^{n-1} e^{-\lambda} \frac{\lambda^{j}}{j !} $$ for \(\lambda>0\) and \(n\) a positive integer. Hint: Just consider a Poisson process on the unit interval with mean \(\lambda\). Let \(W_{n}\) be the waiting time until the \(n\)th event. Then the left side is \(P\left(W_{n}>1\right)\). Why? (b) Obtain the identity used in Example \(4.3.3\), by making the transformation \(z=\lambda x\) in the above integral.

Consider the following permutation test for the two-sample problem with hypotheses \((4.9.7)\). Let \(\mathbf{x}^{\prime}=\left(x_{1}, x_{2}, \ldots, x_{n_{1}}\right)\) and \(\mathbf{y}^{\prime}=\left(y_{1}, y_{2}, \ldots, y_{n_{2}}\right)\) be the realizations of the two random samples. The test statistic is the difference in sample means \(\bar{y}-\bar{x}\). The estimated \(p\)-value of the test is calculated as follows: 1. Combine the data into one sample \(\mathbf{z}^{\prime}=\left(\mathbf{x}^{\prime}, \mathbf{y}^{\prime}\right)\). 2. Obtain all possible samples of size \(n_{1}\) drawn without replacement from \(\mathbf{z}\). Each such sample automatically gives another sample of size \(n_{2}\), i.e., all elements of \(\mathbf{z}\) not in the sample of size \(n_{1}\). There are \(M=\binom{n_{1}+n_{2}}{n_{1}}\) such samples. 3. For each such sample \(j\): (a) Label the sample of size \(n_{1}\) by \(\mathbf{x}^{*}\) and label the sample of size \(n_{2}\) by \(\mathbf{y}^{*}\). (b) Calculate \(v_{j}^{*}=\bar{y}^{*}-\bar{x}^{*}\). 4. The estimated \(p\)-value is \(\hat{p}^{*}=\#\{v_{j}^{*} \geq \bar{y}-\bar{x}\}/M\). (a) Suppose we have two samples each of size 3 which result in the realizations: \(\mathbf{x}^{\prime}=(10,15,21)\) and \(\mathbf{y}^{\prime}=(20,25,30)\). Determine the test statistic and carry out the permutation test described above, along with the \(p\)-value. (b) If we ignore distinct samples, then we can approximate the permutation test by using the bootstrap algorithm with resampling performed at random and without replacement. Modify the bootstrap program boottesttwo.s to do this and obtain this approximate permutation test based on 3000 resamples for the data of Example \(4.9.2\). (c) In general, what is the probability of having distinct samples in the approximate permutation test described in the last part? Assume that the original data are distinct values.

Let \(X_{1}, \ldots, X_{n}\) be a random sample from a \(N(0,1)\) distribution. Then the probability that the random interval \(\bar{X} \pm t_{\alpha/2, n-1}(s/\sqrt{n})\) traps \(\mu=0\) is \((1-\alpha)\). To verify this empirically, in this exercise, we simulate \(m\) such intervals and calculate the proportion that trap 0, which should be "close" to \((1-\alpha)\). (a) Set \(n=10\) and \(m=50\). Run the R code mat = matrix(rnorm(m*n), ncol = n), which generates \(m\) samples of size \(n\) from the \(N(0,1)\) distribution. Each row of the matrix mat contains a sample. For this matrix of samples, the function below computes the \((1-\alpha)100\%\) confidence intervals, returning them in an \(m \times 2\) matrix. Run this function on your generated matrix mat. What is the proportion of successful confidence intervals? (b) Run the following code which plots the intervals. Label the successful intervals. Comment on the variability of the lengths of the confidence intervals.

In Exercise \(4.2.27\), in finding a confidence interval for the ratio of the variances of two normal distributions, we used a statistic \(S_{1}^{2}/S_{2}^{2}\), which has an \(F\) distribution when those two variances are equal. If we denote that statistic by \(F\), we can test \(H_{0}: \sigma_{1}^{2}=\sigma_{2}^{2}\) against \(H_{1}: \sigma_{1}^{2}>\sigma_{2}^{2}\) using the critical region \(F \geq c\). If \(n=13\), \(m=11\), and \(\alpha=0.05\), find \(c\).

Let \(p\) equal the proportion of drivers who use a seat belt in a country that does not have a mandatory seat belt law. It was claimed that \(p=0.14\). An advertising campaign was conducted to increase this proportion. Two months after the campaign, \(y=104\) out of a random sample of \(n=590\) drivers were wearing their seat belts. Was the campaign successful? (a) Define the null and alternative hypotheses. (b) Define a critical region with an \(\alpha=0.01\) significance level. (c) Determine the approximate \(p\) -value and state your conclusion.
