Problem 1
Suppose that a \(2^{2}\) factorial experiment is to be performed using eight units in four blocks of two units each. Show that the intercept, three block effects, and the main effects and interaction between the treatments can be estimated if the treatments are allocated to blocks as follows: \((1, a),(b, a b),(1, a b),(a, b)\). Can they all still be estimated if an observation from the last block is lost?
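A numerical check can accompany the algebraic argument: all seven parameters are estimable exactly when the design matrix has rank 7. The sketch below is not part of the problem and is not a proof. It assumes a particular parametrization (intercept, indicators for blocks 2 to 4 with block 1 as baseline, and \(\pm 1\) coding for \(A\), \(B\) and \(AB\)), and the helper names `design_rows` and `rank` are illustrative.

```python
from fractions import Fraction

# Treatment codes for the 2^2 factorial: (A, B) levels coded as -1/+1.
LEVELS = {"1": (-1, -1), "a": (1, -1), "b": (-1, 1), "ab": (1, 1)}
# Allocation of treatments to the four blocks, as given in the problem.
BLOCKS = [("1", "a"), ("b", "ab"), ("1", "ab"), ("a", "b")]


def design_rows(blocks):
    """One row per unit: intercept, indicators for blocks 2-4, A, B, AB."""
    rows = []
    for j, block in enumerate(blocks):
        for trt in block:
            a, b = LEVELS[trt]
            indicators = [1 if j == k else 0 for k in (1, 2, 3)]
            rows.append([1] + indicators + [a, b, a * b])
    return rows


def rank(rows):
    """Matrix rank by exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


full = design_rows(BLOCKS)
rank_full = rank(full)
# Drop the final row, i.e. one of the two units in the last block.
rank_minus_one = rank(full[:-1])
print(rank_full, rank_minus_one)
```

A rank equal to 7, the number of parameters, means every parameter is estimable under this coding. With only seven observations, a rank of 7 would leave no degrees of freedom for estimating the error variance.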
Problem 1
(a) Show that under the two-sample model, the difference of the sample averages, \(\bar{y}_{2}-\bar{y}_{1}\), has variance \(\left(n_{1}+n_{2}\right) \sigma^{2} /\left(n_{1} n_{2}\right)\). Show that subject to \(n_{1}+n_{2}=n\), this is minimized when \(n_{1}\) and \(n_{2}\) are as nearly equal as possible.
(b) Suppose that \(n\) units are split into \(k\) blocks of size \(m+1\), and that one unit in each block is chosen at random to be treated, while the remaining \(m\) are controls. Suppose that the responses in the \(j\)th block are \(y_{j 1}\) and \(y_{j 2}, \ldots, y_{j(m+1)}\), and let \(d_{j}\) represent the difference between the response of the treated individual and the average of the controls. Show that the average of these differences has variance \((m+1) \sigma^{2} /(k m)\), and show that for fixed \(n\) this is minimized when \(m=1\).
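The two variance formulas can be explored numerically before attempting the proofs. The sketch below evaluates both in units of \(\sigma^{2}\) with exact rational arithmetic. The totals \(n=11\) and \(n=12\) and the function names are illustrative assumptions, not part of the problem.

```python
from fractions import Fraction


def var_two_sample(n1, n2):
    """Var(ybar_2 - ybar_1) in units of sigma^2, i.e. 1/n1 + 1/n2."""
    return Fraction(1, n1) + Fraction(1, n2)


def var_blocked(k, m):
    """Variance of the average of the d_j in units of sigma^2: (m+1)/(k m)."""
    return Fraction(m + 1, k * m)


# Part (a): search over all splits of an illustrative total n = 11.
n = 11
best_n1 = min(range(1, n), key=lambda n1: var_two_sample(n1, n - n1))

# Part (b): an illustrative total of 12 units, which admits several
# block sizes m + 1 that divide it exactly (so k = n / (m + 1)).
n_units = 12
designs = {
    m: var_blocked(n_units // (m + 1), m)
    for m in range(1, n_units)
    if n_units % (m + 1) == 0
}
best_m = min(designs, key=designs.get)

print(best_n1, n - best_n1, var_two_sample(best_n1, n - best_n1))
print(best_m, designs)
```

Substituting \(k=n /(m+1)\) gives \((m+1)^{2} \sigma^{2} /(n m)\), which is the quantity the dictionary `designs` tabulates for each admissible \(m\).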
Problem 2
Suppose a paired comparison experiment is performed, in which the \(j\)th pair satisfies the normal linear model
$$ y_{0 j}=\mu_{j}-\delta+\varepsilon_{0 j}, \quad y_{1 j}=\mu_{j}+\delta+\varepsilon_{1 j}, \quad j=1, \ldots, m, $$
but that data analysis is performed using the two-sample model. Show that the variance estimator can be written as
$$ S^{2}=\frac{1}{2(m-1)} \sum_{j, t}\left(\mu_{j}-\bar{\mu}+\varepsilon_{t j}-\bar{\varepsilon}_{t \cdot}\right)^{2}, $$
where \(\bar{\mu}\) and \(\bar{\varepsilon}_{t \cdot}\) denote averages over \(j\). Deduce that this has expected value \(\sigma^{2}+(m-1)^{-1} \sum_{j}\left(\mu_{j}-\bar{\mu}\right)^{2}\) conditional on the \(\mu_{j}\), and hence show that if the \(\mu_{j}\) are normally distributed with variance \(\tau^{2}\), then \(\mathrm{E}\left(S^{2}\right)=\sigma^{2}+\tau^{2}\).
Show that if the two-sample model is used in this situation, the length of a \(95 \%\) confidence interval for \(2 \delta\) is roughly \(2\left(\sigma^{2}+\tau^{2}\right)^{1 / 2} t_{2(m-1)}(0.025)\), whereas under the paired comparisons model the length is about \(2 \sigma t_{m-1}(0.025)\). For what values of \(\tau^{2} / \sigma^{2}\) are the two-sample intervals shorter when (a) \(m=3\), (b) \(m=11\)? Discuss your results.
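Comparing the two approximate lengths reduces to the inequality \(\left(1+\tau^{2} / \sigma^{2}\right)^{1 / 2} t_{2(m-1)}(0.025)<t_{m-1}(0.025)\). The sketch below evaluates the resulting bound on \(\tau^{2} / \sigma^{2}\). It assumes the upper 2.5% points of the \(t\) distribution as quoted in standard tables to three decimals; these are hard-coded, and the function name `threshold` is illustrative.

```python
# Upper 2.5% points of Student's t, keyed by degrees of freedom.
# Assumption: values as quoted in standard tables, to three decimals.
T_0025 = {2: 4.303, 4: 2.776, 10: 2.228, 20: 2.086}


def threshold(m):
    """Largest tau^2 / sigma^2 for which the two-sample interval is shorter.

    Rearranges sqrt(1 + tau^2/sigma^2) * t_{2(m-1)} < t_{m-1}.
    """
    ratio = T_0025[m - 1] / T_0025[2 * (m - 1)]
    return ratio ** 2 - 1


thr_m3 = threshold(3)
thr_m11 = threshold(11)
print(round(thr_m3, 3), round(thr_m11, 3))
```

The bound depends on \(m\) only through the ratio of the two quantiles, so the effect of the lost degrees of freedom under pairing can be read off directly.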
Problem 2
Let \(y_{g r}, g=1, \ldots, G, r=1, \ldots, R\), be independent normal random variables with means \(\mu_{g r}\) and common variance \(\sigma^{2}\).
(a) Assume the one-way analysis of variance model, namely that \(\mu_{g r}=\mu_{g}\), so that the \(y_{g r}\) are replicate measurements with the same mean, and find the sufficient statistics for the \(\mu\)s and \(\sigma^{2}\). Show that these are equivalent to
$$ \bar{y}_{1 \cdot}, \ldots, \bar{y}_{G \cdot}, \quad S S=\sum_{g=1}^{G} \sum_{r=1}^{R}\left(y_{g r}-\bar{y}_{g \cdot}\right)^{2}, $$
where \(\bar{y}_{g \cdot}=R^{-1} \sum_{r=1}^{R} y_{g r}\); note that
$$ \sum_{r}\left(y_{g r}-\mu_{g}\right)^{2}=\sum_{r}\left(y_{g r}-\bar{y}_{g \cdot}\right)^{2}+R\left(\bar{y}_{g \cdot}-\mu_{g}\right)^{2}. $$
(b) Prove that \(S S\) is independent of the group means, and that it is proportional to a chi-squared random variable on \(G(R-1)\) degrees of freedom.
(c) Let \(\bar{y}_{\cdot \cdot}=G^{-1} \sum_{g} \bar{y}_{g \cdot}\) denote the overall mean. If \(\mu_{1}=\cdots=\mu_{G}\), show that the distribution of \(S S_{G}=R \sum_{g=1}^{G}\left(\bar{y}_{g \cdot}-\bar{y}_{\cdot \cdot}\right)^{2}\) is proportional to a chi-squared distribution on \(G-1\) degrees of freedom. Hence find the distribution of \(G(R-1) S S_{G} /\{(G-1) S S\}\) when the means are equal.
(d) Samples of the same material are sent to four laboratories for chemical analysis as part of a study to determine whether laboratories give the same results. The results for laboratories A-D are:
$$ \begin{array}{llllll}
\mathrm{A} & 58.7 & 61.4 & 60.9 & 59.1 & 58.2 \\
\mathrm{B} & 62.7 & 64.5 & 63.1 & 59.2 & 60.3 \\
\mathrm{C} & 55.9 & 56.1 & 57.3 & 55.2 & 58.1 \\
\mathrm{D} & 60.7 & 60.3 & 60.9 & 61.4 & 62.3
\end{array} $$
Test the hypothesis that the means are different and comment.
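For part (d), the quantities defined in parts (a) and (c) can be computed directly from the table. The sketch below is one way to do the arithmetic, using only the data given above; the variable names are illustrative. The resulting ratio is to be referred to the distribution found in part (c), and the code does not carry out that comparison.

```python
# Laboratory results from part (d): G = 4 groups, R = 5 replicates each.
labs = {
    "A": [58.7, 61.4, 60.9, 59.1, 58.2],
    "B": [62.7, 64.5, 63.1, 59.2, 60.3],
    "C": [55.9, 56.1, 57.3, 55.2, 58.1],
    "D": [60.7, 60.3, 60.9, 61.4, 62.3],
}

G = len(labs)
R = len(next(iter(labs.values())))

# Group means ybar_g. and overall mean ybar_.. (the mean of the group means).
group_means = {g: sum(ys) / R for g, ys in labs.items()}
grand_mean = sum(group_means.values()) / G

# Within-groups sum of squares SS, on G(R-1) degrees of freedom.
ss_within = sum(
    (y - group_means[g]) ** 2 for g, ys in labs.items() for y in ys
)
# Between-groups sum of squares SS_G, on G-1 degrees of freedom.
ss_between = R * sum((m - grand_mean) ** 2 for m in group_means.values())

# Ratio of mean squares: G(R-1) SS_G / {(G-1) SS}.
f_stat = (ss_between / (G - 1)) / (ss_within / (G * (R - 1)))

print(group_means)
print(round(ss_within, 4), round(ss_between, 4), round(f_stat, 2))
```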
Problem 5
To what extent can gender be regarded as a cause in studies (a) relating longevity and lifestyle and (b) of salary differentials in employment?
Problem 6
Let \(T=0\) with probability \(1-\alpha\) and \(T=1\) otherwise, and suppose that conditional on \(T=0\), \(R_{0}\) is normal with mean zero and \(R_{1}\) is normal with mean \(\delta\), while conditional on \(T=1\), the corresponding means are \(\eta\) and \(\eta+\delta\); in each case the variables have unit variances. Let \(Y=R_{0}(1-T)+R_{1} T\) denote the observed response variable. Show that \(\gamma=\mathrm{E}(Y \mid T=1)-\mathrm{E}(Y \mid T=0)=\eta+\delta\), and deduce that \(\delta=\mathrm{E}\left(R_{1}\right)-\mathrm{E}\left(R_{0}\right)\) cannot be estimated unless \(\left(R_{0}, R_{1}\right)\) and \(T\) are independent.
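A small simulation can illustrate the gap between \(\gamma\) and \(\delta\). The sketch below is only an illustration under stated assumptions: the values \(\alpha=0.3\), \(\eta=1\), \(\delta=0.5\), the sample size and the seed are arbitrary, and \(R_{0}\) and \(R_{1}\) are drawn independently given \(T\), which the problem does not specify. The function name `simulate` is hypothetical.

```python
import random


def simulate(alpha, eta, delta, n, seed=0):
    """Return (gamma_hat, effect_hat) from n simulated units.

    gamma_hat  : difference in mean observed response Y between T=1 and T=0.
    effect_hat : mean of R1 - R0 over all units; this uses both potential
                 responses, which would not be observable in a real study.
    """
    rng = random.Random(seed)
    y_sum = {0: 0.0, 1: 0.0}
    count = {0: 0, 1: 0}
    effect_sum = 0.0
    for _ in range(n):
        t = 1 if rng.random() < alpha else 0
        shift = eta if t == 1 else 0.0
        # Assumption: R0 and R1 are independent given T, with unit variances.
        r0 = rng.gauss(shift, 1.0)
        r1 = rng.gauss(shift + delta, 1.0)
        y = r1 if t == 1 else r0  # Y = R0 (1 - T) + R1 T
        y_sum[t] += y
        count[t] += 1
        effect_sum += r1 - r0
    gamma_hat = y_sum[1] / count[1] - y_sum[0] / count[0]
    return gamma_hat, effect_sum / n


ALPHA, ETA, DELTA = 0.3, 1.0, 0.5  # illustrative values only
gamma_hat, effect_hat = simulate(ALPHA, ETA, DELTA, n=200_000)
print(round(gamma_hat, 2), round(effect_hat, 2))
```

Setting `ETA = 0.0` makes the distribution of \(\left(R_{0}, R_{1}\right)\) the same for both values of \(T\), which is the independent case in which the observed comparison targets \(\delta\).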