
If \(R_{i}\) denotes the random amount that is earned in period \(i\), then \(\sum_{i=1}^{\infty} \beta^{i-1} R_{i}\), where \(0<\beta<1\) is a specified constant, is called the total discounted reward with discount factor \(\beta\). Let \(T\) be a geometric random variable with parameter \(1-\beta\) that is independent of the \(R_{i}\). Show that the expected total discounted reward is equal to the expected total (undiscounted) reward earned by time \(T\). That is, show that $$ E\left[\sum_{i=1}^{\infty} \beta^{i-1} R_{i}\right]=E\left[\sum_{i=1}^{T} R_{i}\right] $$

Short Answer

Expert verified
In conclusion, using the properties of geometric random variables, the linearity of expectation, and the independence of \(T\) and \(R_i\), we have shown that the expected total discounted reward is equal to the expected total undiscounted reward earned by time \(T\). Specifically, we have demonstrated that: $$ E\left[\sum_{i=1}^{\infty} \beta^{i-1} R_{i}\right]=E\left[\sum_{i=1}^{T} R_{i}\right]. $$

Step by step solution

01

Step 1: Expected Total Discounted Reward

To find the expected total discounted reward, we use the linearity of expectation: for any random variable \(X\) and constant \(c\), \(E[cX] = cE[X]\), and the expectation of a sum is the sum of the expectations. (Exchanging the expectation with the infinite sum is justified, for instance, when the rewards are nonnegative, by monotone convergence.) This gives $$ E\left[\sum_{i=1}^{\infty} \beta^{i-1} R_{i}\right] = \sum_{i=1}^{\infty} E\left[\beta^{i-1} R_{i}\right]. $$ Since \(\beta^{i-1}\) is a constant for each fixed \(i\), we can write $$ \sum_{i=1}^{\infty} E\left[\beta^{i-1} R_{i}\right] = \sum_{i=1}^{\infty} \beta^{i-1} E[R_i]. $$
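As a quick numerical check of this last expression, here is a minimal Python sketch. It assumes, purely for illustration, that the rewards share a common mean \(\mu = 2\) (the problem fixes no reward distribution) and truncates the infinite series at a large \(N\):

```python
# Minimal sketch: sum_{i=1}^{N} beta^{i-1} * E[R_i], with E[R_i] = mu (illustrative)
beta = 0.9     # discount factor, 0 < beta < 1
mu = 2.0       # assumed common mean E[R_i]; not fixed by the problem
N = 10_000     # truncation point for the infinite series

# Geometric series: mu * (1 - beta**N) / (1 - beta), essentially mu / (1 - beta)
discounted = sum(beta ** (i - 1) * mu for i in range(1, N + 1))
print(discounted, mu / (1 - beta))  # both close to 20.0
```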
02

Step 2: Expected Total Undiscounted Reward by Time \(T\)

We now evaluate the expected total undiscounted reward earned by time \(T\) by conditioning on \(T\). Since \(T\) is independent of the \(R_i\), the joint distribution factors; for discrete rewards, the joint probability mass function (PMF) of \(T=t\) and \(R_i=r_i\) for \(i=1,2,\dots,t\) is $$ P(T=t, R_1=r_1, R_2=r_2, \dots, R_t=r_t) = P(T=t) \prod_{i=1}^{t} P(R_i=r_i). $$ Consequently, conditioning on \(T=t\) does not change the distribution of the rewards, and for each fixed \(t\), $$ E\left[\sum_{i=1}^{t} R_{i} \;\middle|\; T=t\right] = \sum_{i=1}^{t} E[R_i]. $$ Summing over the possible values of \(T\), weighted by its PMF, yields $$ E\left[\sum_{i=1}^{T} R_{i}\right] = \sum_{t=1}^{\infty} P(T=t) \sum_{i=1}^{t} E[R_i]. $$
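The last identity can be checked by simulation. Below is a minimal sketch under the illustrative assumption of i.i.d. exponential rewards with mean \(\mu\); any rewards independent of \(T\) would do. Since \(E[T] = 1/(1-\beta)\) for a geometric random variable with parameter \(1-\beta\), the estimate should land near \(\mu/(1-\beta)\):

```python
import numpy as np

rng = np.random.default_rng(0)
beta, mu, trials = 0.9, 2.0, 100_000

# T ~ Geometric(1 - beta) on {1, 2, ...}, drawn independently of the rewards
T = rng.geometric(1 - beta, size=trials)

# Conditional on T = t, the sum of t i.i.d. mean-mu rewards has mean t * mu,
# so E[sum_{i=1}^T R_i] = E[T] * mu = mu / (1 - beta)
totals = np.array([rng.exponential(mu, size=t).sum() for t in T])
print(totals.mean(), mu / (1 - beta))  # agree up to Monte Carlo error
```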
03

Step 3: Show That Both Expressions Are Equal

We can now show that the two expressions are equal. Since \(T\) is a geometric random variable with parameter \(1-\beta\), its PMF is \(P(T=t) = \beta^{t-1}(1-\beta)\) for \(t = 1, 2, \dots\): the process "continues" through each of the first \(t-1\) periods with probability \(\beta\) and "stops" with probability \(1-\beta\). Therefore $$ E\left[\sum_{i=1}^{T} R_{i}\right] = \sum_{t=1}^{\infty} P(T=t) \sum_{i=1}^{t} E[R_i] = \sum_{t=1}^{\infty} \beta^{t-1}(1-\beta) \sum_{i=1}^{t} E[R_i]. $$ Interchanging the order of summation (the term \(E[R_i]\) appears once for every \(t \ge i\)), we obtain $$ \sum_{t=1}^{\infty} \beta^{t-1}(1-\beta) \sum_{i=1}^{t} E[R_i] = \sum_{i=1}^\infty E[R_i] \sum_{t=i}^\infty \beta^{t-1}(1-\beta). $$ The inner sum is a geometric series: $$ \sum_{t=i}^{\infty} \beta^{t-1}(1-\beta) = (1-\beta)\,\frac{\beta^{i-1}}{1-\beta} = \beta^{i-1}, $$ which is exactly \(P(T \ge i)\). Substituting this back and applying Step 1 gives $$ \sum_{i=1}^\infty \beta^{i-1} E[R_i] = E\left[\sum_{i=1}^{\infty} \beta^{i-1} R_{i}\right]. $$ Therefore, we have shown that the expected total discounted reward is equal to the expected total undiscounted reward earned by time \(T\). That is, $$ E\left[\sum_{i=1}^{\infty} \beta^{i-1} R_{i}\right]=E\left[\sum_{i=1}^{T} R_{i}\right]. $$
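Finally, a Monte Carlo sketch comparing both sides directly, under the same illustrative i.i.d. exponential setup (an assumption, not part of the problem): both estimates should land near \(\mu/(1-\beta) = 20\), agreeing up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(1)
beta, mu = 0.9, 2.0
trials, N = 20_000, 200  # N truncates the discounted sum; beta**N ~ 7e-10

# Left side: E[sum_i beta^{i-1} R_i], estimated from i.i.d. exponential rewards
R = rng.exponential(mu, size=(trials, N))
lhs = (R @ (beta ** np.arange(N))).mean()

# Right side: E[sum_{i=1}^T R_i] with T ~ Geometric(1 - beta), independent of the R_i
T = rng.geometric(1 - beta, size=trials)
rhs = np.array([rng.exponential(mu, size=t).sum() for t in T]).mean()

print(lhs, rhs)  # both near mu / (1 - beta) = 20.0
```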


Most popular questions from this chapter

\(A, B\), and \(C\) are evenly matched tennis players. Initially \(A\) and \(B\) play a set, and the winner then plays \(C\). This continues, with the winner always playing the waiting player, until one of the players has won two sets in a row. That player is then declared the overall winner. Find the probability that \(A\) is the overall winner.

Two players alternate flipping a coin that comes up heads with probability \(p\). The first one to obtain a head is declared the winner. We are interested in the probability that the first player to flip is the winner. Before determining this probability, which we will call \(f(p)\), answer the following questions. (a) Do you think that \(f(p)\) is a monotone function of \(p\)? If so, is it increasing or decreasing? (b) What do you think is the value of \(\lim_{p \rightarrow 1} f(p)\)? (c) What do you think is the value of \(\lim_{p \rightarrow 0} f(p)\)? (d) Find \(f(p)\).

A coin having probability \(p\) of coming up heads is continually flipped. Let \(P_{j}(n)\) denote the probability that a run of \(j\) successive heads occurs within the first \(n\) flips. (a) Argue that $$ P_{j}(n)=P_{j}(n-1)+p^{j}(1-p)\left[1-P_{j}(n-j-1)\right] $$ (b) By conditioning on the first non-head to appear, derive another equation relating \(P_{j}(n)\) to the quantities \(P_{j}(n-k)\), \(k=1, \ldots, j\).

Suppose that \(X\) and \(Y\) are independent random variables with probability density functions \(f_{X}\) and \(f_{Y}\). Determine a one-dimensional integral expression for \(P[X+Y<x]\).

Each element in a sequence of binary data is either 1 with probability \(p\) or 0 with probability \(1-p\). A maximal subsequence of consecutive values having identical outcomes is called a run. For instance, if the outcome sequence is \(1,1,0,1,1,1,0\), the first run is of length 2, the second is of length 1, and the third is of length 3. (a) Find the expected length of the first run. (b) Find the expected length of the second run.
