Warning: foreach() argument must be of type array|object, bool given in /var/www/html/web/app/themes/studypress-core-theme/template-parts/header/mobile-offcanvas.php on line 20

This exercise examines the accuracy of various branch predictors for the following repeating pattern (e.g., in a loop) of branch outcomes: T, NT, T, T, NT

4.16.1 [5] What is the accuracy of always-taken and always-not-taken predictors for this sequence of branch outcomes?

4.16.2 [5] What is the accuracy of the two-bit predictor for the first4 branches in this pattern, assuming that the predictor starts off in the bottom left state from Figure 4.63 (predict not taken)?

4.16.3 [10] What is the accuracy of the two-bit predictor if this pattern is repeated forever?

4.16.4 [30] Design a predictor that would achieve a perfect accuracy if this pattern is repeated forever. You predictor should be a sequential circuit with one output that provides a prediction (1 for taken, 0 for not taken) and no inputs other than the clock and the control signal that indicates that the instruction is a conditional branch.

4.16.5 [10] What is the accuracy of your predictor from 4.16.4 if it is given a repeating pattern that is the exact opposite of this one?

4.16.6 [20] Repeat 4.16.4, but now your predictor should be able to eventually (after a warm-up period during which it can make wrong predictions) start perfectly predicting both this pattern and its opposite. Your predictor should have an input that tells it what the real outcome was. Hint: this input lets your predictor determine which of the two repeating patterns it is given.

Short Answer

Expert verified

4.16.1

The accuracy for the “always taken” is 60%.

The accuracy for the “always-not-taken” is 40%.

4.16.2

The accuracy will be 25%.

4.16.3

The accuracy will be 60%.

4.16.4

A predictor can be an “x-bit shift register” and “x” represents the number of the branch results in the destination pattern.

This can be initialized as the pattern where “0” represents the “(NT) always-not-taken” and “1” represents the “(T) always-taken”.

The prediction value always should be in the bit of the leftmost in the shift register.

After each prediction branch, the register will be shifted.

4.16.5

The accuracy will be zero.

4.16.6

The predictor will be remaining same in the part of the “d”.

But the exception is that it should be compared this prediction to the original result and toggle all of the bits in the “shift-register” only for the incorrect prediction.

The given pattern should be always predicted by the predictors.

But for the opposite pattern,

The 1st prediction will be not correct, hence the state of the predictor is toggled, and then the rest of the predictions will be correct for always.

So, no warm-up time duration exists for the specified pattern, only the warm-up time duration exists for the opposite pattern is one single branch.

Step by step solution

01

Define the concept.

4.16.1

The accuracy for the “always taken” =35×100%

Where 3 is the number of examined accuracy of the branch predictor “always taken” and 5 is the total number of examined accuracy of all branch predictors.

The accuracy for the “always-not-taken” =25×100%

Where 2 is the number of examined accuracy of the branch predictor “always-not-taken” and 5 is the total number of examined accuracy of all branch predictors.

4.16.2

Let’s assume that the predictor starts off in the bottom left state from the mentioned figure.

State

The left-over

The right-over

The left-over

The right-over

prediction

always-not-taken(“NT”)

always-not-taken(“NT”)

always-not-taken(“NT”)

always-not-taken(“NT”)

result

always taken(“T”)

always-not-taken(“NT”)

always taken(“T”)

always taken(“T”)

4.16.3

role="math" localid="1654969307828" Theaccuracywillbe=numberof"correct"statetotalnumberofstate×100%

4.16.4

A predictor can be an “x-bit shift register” and “x” represents the number of the branch results in the destination pattern.

This can be initialized as the pattern where “0” represents the “(NT) always-not-taken” and “1” represents the “(T) always-taken”.

4.16.5

The outcome of the predictor should be always opposite of the original result of the “branch” instruction.

4.16.6

The predictor will be remaining same in the part of the “d”.

But the exception is that it should be compared this prediction to the original result and toggle the of the bits in the “shift-register” only for the incorrect prediction.

The given pattern should be always predicted by the predictors.

But for the opposite pattern,

The 1st prediction will be not correct, hence the state of the predictor is toggled and then the rest of the predictions will be correct for always.

02

Determine the calculation.

4.16.1

The following written pattern represents the examined accuracy of the branch predictor:

T, NT, T, T, NT.

Where “T” is denoted as the “always taken” and “NT” is denoted as the “always-not-taken”.

The accuracy for the “always taken” =35×100%=3×20%=60%

The accuracy for the “always-not-taken” =25×100%=2×20%=40%

4.16.2

The specified figure represents the states in the scheme of “2-bit” prediction:


Let’s assume that the predictor starts off in the bottom left state from the mentioned figure.

State

The left-over

The right-over

The left-over

The right-over

prediction

always-not-taken(“NT”)

always-not-taken(“NT”)

always-not-taken(“NT”)

always-not-taken(“NT”)

result

always taken(“T”)

always-not-taken(“NT”)

always taken(“T”)

always taken(“T”)

So, the accuracy will be14×100%=25%

4.16.3

The following written pattern represents the examined accuracy of the branch predictor:

T, NT, T, T, NT.

At the first occurrence,

The values of the predictor at the time of prediction are 0,1,0,1,2.

At the second occurrence,

The values of the predictor at the time of prediction are 1,2,1,2,3.

At the third occurrence,

The values of the predictor at the time of prediction are 2,3,2,3,3.

At the fourth occurrence,

The values of the predictor at the time of prediction are 2,3,2,3,3.

In the steady-state,

Correct, incorrect, Correct, Correct, incorrect.

Hence the accuracy will be 35×100%=60%

4.16.4

After each prediction branch, the register will be shifted.

The prediction value always should be in the bit of the leftmost in the shift register.

A predictor can be an “x-bit shift register” and “x” represents the number of the branch results in the destination pattern.

This can be initialized as the pattern where “0” represents the “(NT) always-not-taken” and “1” represents the “(T) always-taken”.

4.16.5

The accuracy will be zero.

Because the outcome of the predictor should be always the opposite of the original result of the “branch” instruction.

4.16.6

No warm-up time duration exists for the specified pattern, only the warm-up time duration exists for the opposite pattern is one single branch.

The predictor will remain the same in the part of the “d”.

But the exception is that it should be compared this prediction to the original result and toggle all of the bits in the “shift-register” only for the incorrect prediction.

The given pattern should be always predicted by the predictors.

But for the opposite pattern,

The 1st prediction will be not correct, hence the state of the predictor is toggled, and then the rest of the predictions will be correct for always.

Unlock Step-by-Step Solutions & Ace Your Exams!

  • Full Textbook Solutions

    Get detailed explanations and key concepts

  • Unlimited Al creation

    Al flashcards, explanations, exams and more...

  • Ads-free access

    To over 500 millions flashcards

  • Money-back guarantee

    We refund you if you fail your exam.

Over 30 million students worldwide already upgrade their learning with Vaia!

One App. One Place for Learning.

All the tools & learning materials you need for study success - in one app.

Get started for free

Most popular questions from this chapter

For the problems in this exercise, assume that there are no pipeline stalls and that the breakdown of executed instructions is as follows: add addi not beq lw sw 20% 20% 0% 25% 25% 10% 4.5.1 [10] In what fraction of all cycles is the data memory used? 4.5.2 [10] In what fraction of all cycles is the input of the sign-extend circuit needed? What is this circuit doing in cycles in which its input is not needed.

Consider the following loop.

loop:lw r1,0(r1)

and r1,r1,r2

lw r1,0(r1)

lw r1,0(r1)

beq r1,r0,loop

Assume that perfect branch prediction is used (no stalls due to control hazards), that there are no delay slots, and that the pipeline has full forwarding support. Also, assume that many iterations of this loop are executed before the loop exits.

4.11.1 Show a pipeline execution diagram for the third iteration of this loop, from the cycle in which we fetch the first instruction of that iteration up to(but not including) the cycle in which we can fetch the first instruction of the next iteration. Show all instructions that are in the pipeline during these cycles (not just those from the third iteration).

4.11.2 How often (as a percentage of all cycles) do we have a cycle in which all five pipeline stages are doing useful work?

When processor designers consider a possible improvement to the processor datapath, the decision usually depends on the cost/performance trade-off . In the following three problems, assume that we are starting with a datapath from Figure 4.3, where I-Mem, Add, Mux, ALU, Regs, D-Mem, and Control blocks have latencies of 400 ps, 100 ps, 30 ps, 120 ps, 200 ps, 350 ps, and 100 ps, respectively, and costs of 1000, 30, 10, 100, 200, 2000, and 500, respectively. Consider the addition of a multiplier to the ALU. This addition will add 300 ps to the latency of the ALU and will add a cost of 600 to the ALU. The result will be 5% fewer instructions executed since we will no longer need to emulate the MUL instruction. 4.3.1 [10] What is the clock cycle time with and without this improvement? 4.3.2 [10] What is the speedup achieved by adding this improvement? 4.3.3 [10] Compare the cost/performance ratio with and without this improvement.

Question: Problems in this exercise assume that logic blocks needed to implement a processor’s datapath have the following latencies: I-Mem Add Mux ALU Regs D-Mem Sign-Extend Shift-Left-2 200ps 70ps 20ps 90ps 90ps 250ps 15ps 10ps 4.4.1 [10] If the only thing we need to do in a processor is fetch consecutive instructions (Figure 4.6), what would the cycle time be? 4.4.2 [10] Consider a datapath similar to the one in Figure 4.11, but for a processor that only has one type of instruction: unconditional PC-relative branch. What would the cycle time be for this datapath? 4.4.3 [10] Repeat 4.4.2, but this time we need to support only conditional PC-relative branches. The remaining three problems in this exercise refer to the datapath element Shift - left -2: 4.4.4 [10] Which kinds of instructions require this resource? 4.4.5 [20] For which kinds of instructions (if any) is this resource on the critical path? 4.4.6 [10] Assuming that we only support beq and add instructions, discuss how changes in the given latency of this resource affect the cycle time of the processor. Assume that the latencies of other resources do not change.

In this exercise, we examine how pipelining affects the clock cycle time of the processor. Problems in this exercise assume that individual stages of the datapath have the following latencies:

IF

ID

EX

MEM

WB

250ps

350ps

150ps

300ps

200ps

Also, assume that instructions executed by the processor are broken down as follows:

alu

beq

lw

sw

45%

20%

20%

15%

4.8.1 [5] What is the clock cycle time in a pipelined and non-pipelined processor?

4.8.2 [10] What is the total latency of an LW instruction in a pipelined and non-pipelined processor?

4.8.3 [10] If we can split one stage of the pipelined datapath into two new stages, each with half the latency of the original stage, which stage would you split and what is the new clock cycle time of the processor? 4.8.4 [10] Assuming there are no stalls or hazards, what is the utilization of the data memory?

4.8.5 [10] Assuming there are no stalls or hazards, what is the utilization of the write-register port of the “Registers” unit? 4.8.6 [30] Instead of a single-cycle organization, we can use a multi-cycle organization where each instruction takes multiple cycles but one instruction finishes before another is fetched. In this organization, an instruction only goes through stages it actually needs (e.g., ST only takes 4 cycles because it does not need the WB stage). Compare clock cycle times and execution times with singlecycle, multi-cycle, and pipelined organization.

See all solutions

Recommended explanations on Computer Science Textbooks

View all explanations

What do you think about this solution?

We value your feedback to improve our textbook solutions.

Study anywhere. Anytime. Across all devices.

Sign-up for free