
Problem 1

Dataframe alofi contains three-state data derived from daily rainfall over three years at Alofi in the Niue Island group in the Pacific Ocean. The states are 1 (no rain), 2 (up to \(5 \mathrm{~mm}\) rain) and 3 (over \(5 \mathrm{~mm}\)). Triplets of transition counts for all 1096 observations are given in the upper part of Table \(6.10\); its lower part gives transition counts for successive pairs for sub-sequences 1-274, 275-548, 549-822 and 823-1096.

(a) The maximized log likelihoods for first-, second-, and third-order Markov chains fitted to the entire dataset are \(-1038.06\), \(-1025.10\) and \(-1005.56\). Compute the log likelihood for the zeroth-order model, and compare the four fits using likelihood ratio statistics and using AIC (a numerical sketch of this comparison follows the problem statement). Give the maximum likelihood estimates for the best-fitting model. Does it simplify to a varying-order chain?

(b) Matrices of transition counts \(\left\\{n_{irs}\right\\}\) are available for \(m\) independent \(S\)-state chains with transition matrices \(P_{i}=\left(p_{irs}\right)\), \(i=1, \ldots, m\). Show that the maximum likelihood estimates are \(\widehat{p}_{irs}=n_{irs} / n_{ir \cdot}\), where \(\cdot\) denotes summation over the corresponding index. Show that the maximum likelihood estimates under the simpler model in which \(P_{1}=\cdots=P_{m}=\left(p_{rs}\right)\) are \(\widehat{p}_{rs}=n_{\cdot rs} / n_{\cdot r \cdot}\). Deduce that the likelihood ratio statistic for comparing these models is \(2 \sum_{i, r, s} n_{irs} \log \left(\widehat{p}_{irs} / \widehat{p}_{rs}\right)\) and give its degrees of freedom.

(c) Consider the lower part of Table 6.10. Explain how to use the statistic from (b) to test for equal transition probabilities in each section, and hence check stationarity of the data.
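The following is a minimal sketch of the model comparison in part (a), assuming Python with scipy. The quoted log likelihoods and the parameter counts for \(k\)-th order three-state chains come from the problem statement; the zeroth-order log likelihood is left as a placeholder to be filled in from the marginal counts in Table 6.10.

```python
from scipy.stats import chi2

# Maximized log likelihoods quoted in the problem; the zeroth-order value
# (None here) must be computed from the marginal state counts in Table 6.10,
# i.e. sum_s n_s * log(n_s / 1096).
loglik = {0: None, 1: -1038.06, 2: -1025.10, 3: -1005.56}

# A k-th order chain on S = 3 states has S**k conditioning patterns,
# each with S - 1 free probabilities.
S = 3
nparams = {k: (S - 1) * S**k for k in range(4)}   # 2, 6, 18, 54

for k, ll in loglik.items():
    if ll is not None:
        print(f"order {k}: AIC = {-2 * ll + 2 * nparams[k]:.2f}")

# Likelihood ratio tests of order k against the richer order k + 1:
for k in (1, 2):
    lr = 2 * (loglik[k + 1] - loglik[k])
    df = nparams[k + 1] - nparams[k]
    print(f"order {k} vs {k + 1}: LR = {lr:.2f} on {df} df, p = {chi2.sf(lr, df):.3f}")
```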

Problem 2

Classify the states of Markov chains with transition matrices $$ \left(\begin{array}{lll} 0 & 1 & 0 \\ 0 & 0 & 1 \\ \frac{1}{2} & \frac{1}{2} & 0 \end{array}\right),\left(\begin{array}{llll} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{array}\right), \quad\left(\begin{array}{cccccc} \frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 & 0 \\ \frac{1}{4} & \frac{3}{4} & 0 & 0 & 0 & 0 \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & 0 & 0 \\ \frac{1}{4} & 0 & \frac{1}{4} & \frac{1}{4} & 0 & \frac{1}{4} \\ 0 & 0 & 0 & 0 & \frac{1}{2} & \frac{1}{2} \\ 0 & 0 & 0 & 0 & \frac{1}{2} & \frac{1}{2} \end{array}\right). $$
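One way to check a classification by hand is to compute communicating classes and periods numerically. Below is a rough Python sketch (states indexed from 0); only the first matrix is shown being passed in, and the other two can be handled the same way.

```python
import numpy as np
from math import gcd

def communicating_classes(P):
    """Group states that can reach each other (transitive closure of P > 0)."""
    n = len(P)
    reach = (np.array(P) > 0) | np.eye(n, dtype=bool)
    for _ in range(n):                     # repeated squaring of the boolean relation
        reach = reach | (reach.astype(int) @ reach.astype(int) > 0)
    return [sorted({j for j in range(n) if reach[i, j] and reach[j, i]})
            for i in range(n)]             # class of each state (duplicates possible)

def period(P, state, nmax=60):
    """gcd of return times n <= nmax with P^n[state, state] > 0 (0 if no return seen)."""
    P = np.array(P, dtype=float)
    Pn, d = np.eye(len(P)), 0
    for n in range(1, nmax):
        Pn = Pn @ P
        if Pn[state, state] > 1e-12:
            d = gcd(d, n)
    return d

P1 = [[0, 1, 0], [0, 0, 1], [0.5, 0.5, 0]]
print(communicating_classes(P1))
print([period(P1, s) for s in range(3)])
```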

Problem 3

Find the eigendecomposition of $$ P=\left(\begin{array}{ccc} 0 & 1 & 0 \\ 0 & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & 0 & \frac{1}{2} \end{array}\right) $$ and show that \(p_{11}(n)=a+2^{-n}\\{b \cos (n \pi / 2)+c \sin (n \pi / 2)\\}\) for some constants \(a, b\) and \(c\). Write down \(p_{11}(n)\) for \(n=0,1\) and 2 and hence find \(a, b\) and \(c\).
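A quick numerical check of the claimed form, assuming numpy: the eigenvalues of \(P\) should be \(1\) and \(\pm i/2\), and matching \(p_{11}(n)\) at \(n=0,1,2\) gives a small linear system for \(a\), \(b\), \(c\).

```python
import numpy as np

P = np.array([[0, 1, 0],
              [0, 0.5, 0.5],
              [0.5, 0, 0.5]])

# Eigenvalues: 1 and (1/2) * exp(+/- i*pi/2), i.e. +/- i/2, which is where the
# factors 2**(-n) and cos(n*pi/2), sin(n*pi/2) in p_11(n) come from.
print(np.linalg.eigvals(P))

# p_11(n) is the (1,1) entry of P^n (states indexed from 1 in the problem).
p11 = [np.linalg.matrix_power(P, n)[0, 0] for n in range(6)]
print(p11)

# Matching a + 2**(-n) * (b*cos(n*pi/2) + c*sin(n*pi/2)) at n = 0, 1, 2:
A = np.array([[1.0, 1.0, 0.0],      # n = 0: a + b
              [1.0, 0.0, 0.5],      # n = 1: a + c/2
              [1.0, -0.25, 0.0]])   # n = 2: a - b/4
print(np.linalg.solve(A, p11[:3]))  # numerical values of a, b, c
```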

Problem 4

One way to estimate the evolutionary distance between species is to identify sections of their DNA which are similar and so must derive from a common ancestor species. If such sections differ at very few sites, the species are closely related and must have separated recently in the evolutionary past, but if the sections differ at more sites, the species are further apart. For example, data from the first introns of the human and owl monkey insulin genes are given in Table \(6.12\). The first row means that there are 20 sites with \(\mathrm{A}\) on both genes, 0 with \(\mathrm{A}\) on the human and \(\mathrm{C}\) on the monkey, and so on. If all the data lay on the diagonal, this section would be identical in both species. Note that even if sites on both genes have the same base, there could have been changes such as (ancestor) \(\mathrm{A} \rightarrow \mathrm{G} \rightarrow \mathrm{T}\) (human) and (ancestor) \(\mathrm{A} \rightarrow \mathrm{C} \rightarrow \mathrm{A} \rightarrow \mathrm{T}\) (monkey).

Here is a (greatly simplified) model for evolutionary distance. We suppose that at a time \(t_{0}\) in the past the two species we now see began to evolve away from a common ancestor species, which had a section of DNA of length \(n\) similar to those we now see. Each site on that section had one of the four bases \(\mathrm{A}\), \(\mathrm{C}\), \(\mathrm{G}\) or \(\mathrm{T}\), and for each species the base at each site has since changed according to a continuous-time Markov chain with infinitesimal generator $$ G=\left(\begin{array}{cccc} -3 \gamma & \gamma & \gamma & \gamma \\ \gamma & -3 \gamma & \gamma & \gamma \\ \gamma & \gamma & -3 \gamma & \gamma \\ \gamma & \gamma & \gamma & -3 \gamma \end{array}\right), $$ independent of other sites. That is, the rate at which one base changes into, or is substituted by, another is the same for every pair of bases.

(a) Check that \(G\) has eigendecomposition $$ \frac{1}{4}\left(\begin{array}{cccc} 1 & -1 & -1 & -1 \\ 1 & -1 & -1 & 3 \\ 1 & -1 & 3 & -1 \\ 1 & 3 & -1 & -1 \end{array}\right)\left(\begin{array}{cccc} 0 & 0 & 0 & 0 \\ 0 & -4 \gamma & 0 & 0 \\ 0 & 0 & -4 \gamma & 0 \\ 0 & 0 & 0 & -4 \gamma \end{array}\right)\left(\begin{array}{cccc} 1 & 1 & 1 & 1 \\ -1 & 0 & 0 & 1 \\ -1 & 0 & 1 & 0 \\ -1 & 1 & 0 & 0 \end{array}\right), $$ find its equilibrium distribution \(\pi\), and show that the chain is reversible.

(b) Show that \(\exp(tG)\) has diagonal elements \(\left(1+3 e^{-4 \gamma t}\right)/4\) and off-diagonal elements \(\left(1-e^{-4 \gamma t}\right)/4\). Use this and the reversibility of the chain to explain why the likelihood for \(\gamma\) based on data like those above is proportional to $$ \left(1+3 e^{-8 \gamma t_{0}}\right)^{n-R}\left(1-e^{-8 \gamma t_{0}}\right)^{R}, $$ where \(R\) is the number of sites at which the two sections disagree. Hence find an estimate and standard error for \(\gamma t_{0}\) for the data above (a numerical sketch of this estimate follows the problem statement).

(c) Show that for each site the probability of no substitution in either species over a period \(t\) is \(\exp(-6 \gamma t)\), deduce that substitutions occur as a Poisson process of rate \(6 \gamma\), and hence show that the estimated mean number of substitutions per site for the data above is \(0.120\). Discuss the fit of this model.
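For part (b), the likelihood above depends only on the number of sites \(n\) and the number of disagreements \(R\). A minimal sketch of the estimate and a delta-method standard error follows, assuming numpy; the values of \(n\) and \(R\) are placeholders, not the actual totals from Table 6.12.

```python
import numpy as np

# Placeholder values only: read n (aligned sites) and R (differing sites)
# off Table 6.12 before using this.
n, R = 200, 20

q = R / n                                # observed proportion of differing sites
# Maximizing (1 + 3p)^(n-R) * (1 - p)^R in p = exp(-8*gamma*t0) gives
# p_hat = 1 - 4R/(3n), hence:
phi_hat = -np.log(1 - 4 * q / 3) / 8     # estimate of gamma * t0

# Delta method: R ~ Binomial(n, q) with q = 3(1 - p)/4 and d(phi)/dq = 1/(6 - 8q).
se = np.sqrt(q * (1 - q) / n) / (6 - 8 * q)

print(phi_hat, se)
print("estimated substitutions per site:", 6 * phi_hat)   # Poisson rate 6*gamma over t0
```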

Problem 5

Consider a Poisson process of intensity \(\lambda\) in the plane. Find the distribution of the area of the largest disk centred on one point but containing no other points.
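A quick Monte Carlo sanity check (not a derivation), assuming numpy, a square observation window and an arbitrary intensity: the empirical mean of the largest empty disc's area can be compared with the candidate distribution's mean.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, side, reps = 2.0, 10.0, 5000     # intensity, window side length, replications
areas = []
for _ in range(reps):
    npts = rng.poisson(lam * side ** 2)
    if npts < 2:
        continue
    pts = rng.uniform(0, side, size=(npts, 2))
    # Take the point nearest the centre (to limit edge effects) and find its
    # nearest neighbour; the largest empty disc centred there has area pi*d^2.
    i = np.argmin(np.sum((pts - side / 2) ** 2, axis=1))
    d2 = np.sum((pts - pts[i]) ** 2, axis=1)
    d2[i] = np.inf
    areas.append(np.pi * d2.min())

print(np.mean(areas), 1 / lam)   # these agree if the area is exponential with rate lam
```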

Problem 6

Let \(Y^{\mathrm{T}}=\left(Y_{1}, Y_{2}, Y_{3}\right)\) be a multivariate normal random vector with covariance matrix $$ \Omega=\left(\begin{array}{ccc} 1 & m^{-1 / 2} & \frac{1}{2} \\ m^{-1 / 2} & \frac{2}{m} & m^{-1 / 2} \\ \frac{1}{2} & m^{-1 / 2} & 1 \end{array}\right). $$ Find \(\Omega^{-1}\) and hence write down the moral graph for \(Y\). If \(m \rightarrow \infty\), show that the distribution of \(Y\) becomes degenerate while that of \(\left(Y_{1}, Y_{3}\right)\) given \(Y_{2}\) remains unchanged. Is the graph an adequate summary of the joint limiting distribution? Is the Markov property stable in the limit?
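A short symbolic sketch, assuming sympy, that computes \(\Omega^{-1}\), the limit of the determinant, and the conditional covariance of \((Y_1, Y_3)\) given \(Y_2\); the zero pattern of \(\Omega^{-1}\) identifies the missing edges of the graph.

```python
import sympy as sp

m = sp.symbols('m', positive=True)
a = 1 / sp.sqrt(m)
Omega = sp.Matrix([[1, a, sp.Rational(1, 2)],
                   [a, 2 / m, a],
                   [sp.Rational(1, 2), a, 1]])

K = sp.simplify(Omega.inv())             # concentration matrix; zero entries = missing edges
print(K)

print(sp.limit(Omega.det(), m, sp.oo))   # 0: the joint distribution degenerates

# Conditional covariance of (Y1, Y3) given Y2: Sigma_11 - Sigma_12 Sigma_21 / Sigma_22.
cond = (Omega.extract([0, 2], [0, 2])
        - Omega.extract([0, 2], [1]) * Omega.extract([1], [0, 2]) / Omega[1, 1])
print(sp.simplify(cond))                 # free of m, so unchanged as m -> infinity
```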

Problem 7

Let \(Y_{1}, \ldots, Y_{n}\) represent the trajectory of a stationary two-state discrete-time Markov chain, in which $$ \operatorname{Pr}\left(Y_{j}=a \mid Y_{1}, \ldots, Y_{j-1}\right)=\operatorname{Pr}\left(Y_{j}=a \mid Y_{j-1}=b\right)=\theta_{b a}, \quad a, b=1,2. $$ Note that \(\theta_{11}=1-\theta_{12}\) and \(\theta_{22}=1-\theta_{21}\), where \(\theta_{12}\) and \(\theta_{21}\) are the transition probabilities from state 1 to state 2 and vice versa. Show that the likelihood can be written in the form \(\theta_{12}^{n_{12}}\left(1-\theta_{12}\right)^{n_{11}} \theta_{21}^{n_{21}}\left(1-\theta_{21}\right)^{n_{22}}\), where \(n_{a b}\) is the number of \(a \rightarrow b\) transitions in \(y_{1}, \ldots, y_{n}\). Find a minimal sufficient statistic for \(\left(\theta_{12}, \theta_{21}\right)\), the maximum likelihood estimates \(\widehat{\theta}_{12}\) and \(\widehat{\theta}_{21}\), and their asymptotic variances.
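A minimal sketch, assuming numpy, of the estimates and the binomial-type variance approximations computed from an observed trajectory; the simulated data and the function name `fit_two_state` are illustrative only.

```python
import numpy as np

def fit_two_state(y):
    """MLEs and asymptotic variances for a two-state chain observed as y (values 1, 2)."""
    counts = np.zeros((2, 2))
    for a, b in zip(y[:-1], y[1:]):              # count a -> b transitions
        counts[a - 1, b - 1] += 1
    n1, n2 = counts[0].sum(), counts[1].sum()    # transitions out of states 1 and 2
    theta12, theta21 = counts[0, 1] / n1, counts[1, 0] / n2
    # Each row of counts is binomial given its row total, so approximately
    # var(theta12_hat) = theta12(1 - theta12)/n1 and similarly for theta21_hat.
    return (theta12, theta12 * (1 - theta12) / n1), (theta21, theta21 * (1 - theta21) / n2)

# Illustrative data: a simulated chain with theta12 = 0.3 and theta21 = 0.6.
rng = np.random.default_rng(1)
y = [1]
for _ in range(999):
    p_to2 = 0.3 if y[-1] == 1 else 0.4           # Pr(next state is 2 | current state)
    y.append(2 if rng.random() < p_to2 else 1)
print(fit_two_state(y))
```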

Problem 8

Consider two binary random variables with local characteristics $$ \begin{aligned} &\operatorname{Pr}\left(Y_{1}=1 \mid Y_{2}=0\right)=\operatorname{Pr}\left(Y_{1}=0 \mid Y_{2}=1\right)=1 \\ &\operatorname{Pr}\left(Y_{2}=0 \mid Y_{1}=0\right)=\operatorname{Pr}\left(Y_{2}=1 \mid Y_{1}=1\right)=1 \end{aligned} $$ Show that these do not determine a joint density for \(\left(Y_{1}, Y_{2}\right) .\) Is the positivity condition satisfied?
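A brute-force numerical sketch (not a proof), assuming numpy: it searches a grid of joint probability mass functions on the four points (0,0), (0,1), (1,0), (1,1) and finds none compatible with the four stated conditionals.

```python
import itertools
import numpy as np

def satisfies(p, tol=1e-9):
    """p = (p00, p01, p10, p11); check each stated conditional wherever it is defined."""
    p00, p01, p10, p11 = p
    checks = [(p00 + p10, p10),   # Pr(Y1 = 1 | Y2 = 0) should equal 1
              (p01 + p11, p01),   # Pr(Y1 = 0 | Y2 = 1) should equal 1
              (p00 + p01, p00),   # Pr(Y2 = 0 | Y1 = 0) should equal 1
              (p10 + p11, p11)]   # Pr(Y2 = 1 | Y1 = 1) should equal 1
    return all(denom < tol or abs(num / denom - 1) < tol for denom, num in checks)

grid = np.linspace(0, 1, 21)      # crude search over joint pmfs
found = [p for p in itertools.product(grid, repeat=4)
         if abs(sum(p) - 1) < 1e-9 and satisfies(p)]
print(found)                      # empty: no joint distribution on the grid matches
```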

Problem 9

Let \(Y_{(1)}<\cdots<Y_{(n)}\) be the order statistics of a random sample \(Y_{1}, \ldots, Y_{n}\) from the exponential density \(\lambda e^{-\lambda y}\), \(y>0\), \(\lambda>0\). Show that for \(r=2, \ldots, n\), $$ \operatorname{Pr}\left(Y_{(r)}>y \mid Y_{(1)}, \ldots, Y_{(r-1)}\right)=\exp \left\\{-\lambda(n-r+1)\left(y-y_{(r-1)}\right)\right\\}, \quad y>y_{(r-1)}, $$ and deduce that the order statistics from a general continuous distribution form a Markov process.
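A small simulation check, assuming numpy: the displayed conditional survivor function is equivalent to the normalized spacings \((n-r+1)(Y_{(r)}-Y_{(r-1)})\) being exponential with rate \(\lambda\), independently of the earlier order statistics.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, n, reps = 1.5, 5, 20000
y = np.sort(rng.exponential(scale=1 / lam, size=(reps, n)), axis=1)

# Given the first r - 1 order statistics, (n - r + 1)(Y_(r) - Y_(r-1))
# should be exponential with rate lambda, hence mean 1/lambda.
for r in range(2, n + 1):
    spacing = (n - r + 1) * (y[:, r - 1] - y[:, r - 2])
    print(r, spacing.mean(), 1 / lam)    # the two numbers should roughly agree
```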

Problem 10

Let \(X_{t}\) be a stationary first-order Markov chain with state space \(\\{1, \ldots, S\\}, S>2\), and let \(I_{t}\) indicate the event \(X_{t}=1\). Is \(\left\\{I_{t}\right\\}\) a Markov chain?
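One way to probe the question numerically, assuming numpy: simulate a particular three-state chain (the matrix below is an arbitrary example) and compare the empirical \(\Pr(I_t = 1 \mid I_{t-1} = 0)\) across different values of \(I_{t-2}\).

```python
import numpy as np

P = np.array([[0.1, 0.8, 0.1],     # arbitrary example transition matrix, S = 3
              [0.7, 0.2, 0.1],
              [0.1, 0.1, 0.8]])
rng = np.random.default_rng(3)
x = [0]
for _ in range(50000):
    x.append(rng.choice(3, p=P[x[-1]]))
I = (np.array(x) == 0).astype(int)     # indicator of state 1 (index 0 here)

# If {I_t} were a Markov chain, these two conditional frequencies would agree.
for prev2 in (0, 1):
    mask = (I[1:-1] == 0) & (I[:-2] == prev2)
    print(prev2, I[2:][mask].mean())
```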
