
Use the data in Table 10.6 on page 640. We are interested in the bias of the sample median as an estimator of the median of the distribution.

a. Use the non-parametric bootstrap to estimate this bias.

b. How many bootstrap samples does it appear that you need in order to estimate the bias to within .05 with a probability of 0.99?

Short Answer


a. The estimate of the bias is approximately -1.615.

b. Around \(n = 39096\) bootstrap samples.

Step by step solution

01

(a) Using the non-parametric bootstrap to estimate the bias

Since no assumptions are made about the distribution F, the non-parametric bootstrap is used. The quantity to be estimated is the bias of the sample median.

To initialize the process, compute the sample median of the original sample. Then, for each bootstrap sample drawn with replacement from the data, calculate the difference between that sample's median and the median of the original sample. The bootstrap bias estimate is the average of these differences. The R code below implements this in RStudio; note that the variable of interest is sulfur dioxide.

The sample median of the original sample is 26. The approximation of the bias, i.e., the estimated bias, is -1.615. The number of bootstrap samples used was \(v = 25000\). Note that the bias estimate will change slightly every time the code is run.
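As a minimal sketch of the same procedure outside RStudio, the steps above can be written in Python. The data here are placeholder values, not the actual sulfur dioxide column of Table 10.6, and the function name is illustrative:

```python
import random
import statistics

def bootstrap_bias_of_median(sample, num_resamples=2000, seed=0):
    # Nonparametric bootstrap: resample with replacement, take each
    # resample's median, and average (resample median - original median).
    rng = random.Random(seed)
    original_median = statistics.median(sample)
    total = 0.0
    for _ in range(num_resamples):
        resample = rng.choices(sample, k=len(sample))
        total += statistics.median(resample) - original_median
    return total / num_resamples

# Placeholder data only -- NOT the sulfur dioxide values of Table 10.6.
toy_sample = [13, 18, 21, 24, 26, 26, 30, 35, 41, 53]
bias_estimate = bootstrap_bias_of_median(toy_sample)
```

With the actual Table 10.6 column in place of `toy_sample`, this mirrors the R loop in the solution.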

02

(b) Determining how many bootstrap samples are needed

For the bias estimate to fall within 0.05 of the actual bias, one should examine the following probability, where \(\theta \) denotes the error of the simulation estimate:

\(P( - 0.05 < \theta < 0.05) = 0.99\)

The exercise requires this probability to be at least 0.99. Evaluating it exactly would require the estimator's expected value and standard deviation. Instead, since the number of samples is large, use the simulated sample standard deviation as an estimate (specifically, the sample standard deviation of the n generated sample medians) and set the mean equal to zero. Then,

\(\begin{aligned}P( - 0.05 < \theta < 0.05) &= P\left( {\frac{{ - 0.05 - 0}}{{\sqrt {4.25/n} }} < \frac{{\theta - 0}}{{\sqrt {4.25/n} }} < \frac{{0.05 - 0}}{{\sqrt {4.25/n} }}} \right)\\ &= P\left( {\frac{{ - 0.05}}{{\sqrt {4.25/n} }} < Z < \frac{{0.05}}{{\sqrt {4.25/n} }}} \right)\\ &= 0.99,\end{aligned}\)

or, equivalently, because \(P(Z < 2.576) = 0.995\) and the standard normal distribution is symmetric, it must hold that

\(\frac{{0.05}}{{\sqrt {4.25/n} }} = 2.576\)

which leads to

\(n = {\left( {\frac{{4.25}}{{0.04}} \times 2.576} \right)^2} = 39096\)

Instead of the initial 25000 samples, use \(n \ge 39096\) to obtain the desired result. By rerunning the code with the changed value of n, the standard error of the simulation is close to \(0.021 < 0.05\), and the estimate of the bias is close to -1.679.

#The library for built-in bootstrap functions
library(boot)

#read data (the file path is truncated in the source)
data = read.table(".../Users/.../Exercise5.txt")

#a column instead of a list
SulfurDioxide = matrix(unlist(data[1]))

#the sample median of the original sample
OriginalSampleMedian = median(SulfurDioxide)

#the number of bootstrap samples
n = 39096
XstariMedian = numeric(n)
SampleBias = numeric(n)

#generate the bootstrap samples X*_i and the biases
for (i in 1:n) {
  #generate the bootstrap sample with replacement
  Xstari = sample(SulfurDioxide, length(SulfurDioxide), replace = T)
  #median of the bootstrap sample
  XstariMedian[i] = median(Xstari)
  #bias
  SampleBias[i] = XstariMedian[i] - OriginalSampleMedian
}

#average the differences: the bootstrap bias estimate
BootstrapEstimateMedian = mean(SampleBias)

#the second part: the standard error of the simulation
BootstrapEstimateSDMedian = sd(SampleBias)/sqrt(n)

#the estimate of the standard deviation of the n sample medians
BootstrapEstimateSD = sd(XstariMedian)
BootstrapEstimateMean = mean(XstariMedian)

#the number of bootstrap samples needed for part (b)
n = (qnorm(0.995, 0, 1) * BootstrapEstimateSD / 0.05)^2
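The last step of the script inverts the normal tail bound to get a required sample count: \(n = (z\hat \sigma /\varepsilon )^2\), with \(z\) the appropriate normal quantile. This rule can be sketched generically; the following Python helper and its example numbers are illustrative, not taken from the solution:

```python
import math
from statistics import NormalDist

def required_bootstrap_samples(sd_of_medians, tolerance, prob=0.99):
    # Smallest n with P(|error of the simulation average| < tolerance) >= prob,
    # assuming the simulation average is approximately normal with
    # standard deviation sd_of_medians / sqrt(n).
    z = NormalDist().inv_cdf((1 + prob) / 2)  # e.g. 2.576 for prob = 0.99
    return math.ceil((z * sd_of_medians / tolerance) ** 2)

# Hypothetical example: simulated standard deviation 2.0, tolerance 0.1.
n_needed = required_bootstrap_samples(2.0, 0.1)  # -> 2654
```

Loosening the confidence level shrinks the requirement, since the quantile \(z\) enters squared.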


Most popular questions from this chapter

Suppose that we wish to approximate the integral \(\int g (x)\,dx\). Suppose that we have a p.d.f. \(f\) that we shall use as an importance function. Suppose that \(g(x)/f(x)\) is bounded. Prove that the importance sampling estimator has finite variance.

Let \({x_1}, \ldots ,{x_n}\) be the observed values of a random sample \(X = \left( {{X_1}, \ldots ,{X_n}} \right)\). Let \({F_n}\) be the sample c.d.f. Let \({J_1}, \ldots ,{J_n}\) be a random sample with replacement from the numbers \(\left\{ {1, \ldots ,n} \right\}\). Define \(X_i^* = {x_{{J_i}}}\) for \(i = 1, \ldots ,n\). Show that \({X^*} = \left( {X_1^*, \ldots ,X_n^*} \right)\) is an i.i.d. sample from the distribution \({F_n}\).

In Sec. 10.2, we discussed \({\chi ^2}\) goodness-of-fit tests for composite hypotheses. These tests required computing M.L.E.'s based on the numbers of observations that fell into the different intervals used for the test. Suppose instead that we use the M.L.E.'s based on the original observations. In this case, we claimed that the asymptotic distribution of the \({\chi ^2}\) test statistic was somewhere between two different \({\chi ^2}\) distributions. We can use simulation to better approximate the distribution of the test statistic. In this exercise, assume that we are trying to test the same hypotheses as in Example 10.2.5, although the methods will apply in all such cases.

a. Simulate \(v = 1000\) samples of size \(n = 23\) from each of 10 different normal distributions. Let the normal distributions have means of \(3.8\), \(3.9\), \(4.0\), \(4.1\), and \(4.2\). Let the distributions have variances of 0.25 and 0.8. Use all 10 combinations of mean and variance. For each simulated sample, compute the \({\chi ^2}\) statistic \(Q\) using the usual M.L.E.'s of \(\mu \) and \({\sigma ^2}\). For each of the 10 normal distributions, estimate the 0.9, 0.95, and 0.99 quantiles of the distribution of \(Q\).

b. Do the quantiles change much as the distribution of the data changes?

c. Consider the test that rejects the null hypothesis if \(Q \ge 5.2.\) Use simulation to estimate the power function of this test at the following alternative: For each \(i,\left( {{X_i} - 3.912} \right)/0.5\) has the t distribution with five degrees of freedom.

In Example 12.5.6, we modeled the parameters \({\tau _1}, \ldots ,{\tau _p}\) as i.i.d. having the gamma distribution with parameters \({\alpha _0}\) and \({\beta _0}\). We could have added a level to the hierarchical model that would allow the \({\tau _i}\)'s to come from a distribution with an unknown parameter. For example, suppose that we model the \({\tau _i}\)'s as conditionally independent, having the gamma distribution with parameters \({\alpha _0}\) and \(\beta \) given \(\beta \). Let \(\beta \) be independent of \(\psi \) and \({\mu _1}, \ldots ,{\mu _p}\), with \(\beta \) having the prior distribution as specified in Example 12.5.6.

a. Write the product of the likelihood and the prior as a function of the parameters \({\mu _1}, \ldots ,{\mu _p}\), \({\tau _1}, \ldots ,{\tau _p}\), \(\psi \), and \(\beta \).

b. Find the conditional distributions of each parameter given all of the others. Hint: For all the parameters besides \(\beta \), the distributions should be almost identical to those given in Example 12.5.6. Wherever \({\beta _0}\) appears, of course, something will have to change.

c. Use a prior distribution in which \({\psi _0} = 170\). Fit the model to the hot dog calorie data from Example 11.6.2. Compute the posterior means of the four \({\mu _i}\)'s and \(1/{\tau _i}\)'s.

In Example 12.5.6, we used a hierarchical model. In that model, the parameters \({\mu _1}, \ldots ,{\mu _p}\) were independent random variables with \({\mu _i}\) having the normal distribution with mean \(\psi \) and precision \({\lambda _0}{\tau _i}\), conditional on \(\psi \) and \({\tau _1}, \ldots ,{\tau _p}\). To make the model more general, we could also replace \({\lambda _0}\) with an unknown parameter \(\lambda \). That is, let the \({\mu _i}\)'s be independent, with \({\mu _i}\) having the normal distribution with mean \(\psi \) and precision \(\lambda {\tau _i}\), conditional on \(\psi \), \(\lambda \), and \({\tau _1}, \ldots ,{\tau _p}\). Let \(\lambda \) have the gamma distribution with parameters \({\gamma _0}\) and \({\delta _0}\), and let \(\lambda \) be independent of \(\psi \) and \({\tau _1}, \ldots ,{\tau _p}\). The remaining parameters have the prior distributions stated in Example 12.5.6.

a. Write the product of the likelihood and the prior as a function of the parameters \({\mu _1}, \ldots ,{\mu _p}\), \({\tau _1}, \ldots ,{\tau _p}\), \(\psi \), and \(\lambda \).

b. Find the conditional distributions of each parameter given all of the others. Hint: For all the parameters besides \(\lambda \), the distributions should be almost identical to those given in Example 12.5.6. Wherever \({\lambda _0}\) appears, of course, something will have to change.

c. Use a prior distribution in which \({\alpha _0} = 1\), \({\beta _0} = 0.1\), \({u_0} = 0.001\), \({\gamma _0} = {\delta _0} = 1\), and \({\psi _0} = 170\). Fit the model to the hot dog calorie data from Example 11.6.2. Compute the posterior means of the four \({\mu _i}\)'s and \(1/{\tau _i}\)'s.
