Warning: foreach() argument must be of type array|object, bool given in /var/www/html/web/app/themes/studypress-core-theme/template-parts/header/mobile-offcanvas.php on line 20

This exercise is intended to help you understand the relationship between delay slots, control hazards, and branch execution in a pipelined processor. In this exercise, we assume that the following MIPS code is executed on a pipelined processor with a 5-stage pipeline, full forwarding, and a predict-taken branch predictor:

lw r2,0(r1)

label1: beq r2,r0,label2 # not taken once, then taken

lw r3,0(r2) beq r3,r0,label1 # taken

add r1,r3,r1

label2: sw r1,0(r2)

4.14.1 [10] Draw the pipeline execution diagram for this code, assuming there are no delay slots and that branches execute in the EX stage. 4.14.2 [10] Repeat 4.14.1, but assume that delay slots are used. In the given code, the instruction that follows the branch is now the delay slot instruction for that branch.

4.14.3 [20] One way to move the branch resolution one stage earlier is to not need an ALU operation in conditional branches. The branch instructions would be “bez rd,label” and “bnez rd,label”, and it would branch if the register has and does not have a zero value, respectively. Change this code to use these branch instructions instead of beq. You can assume that register R8 is available for you to use as a temporary register, and that an seq (set if equal) R-type instruction can be used. 366 Chapter 4 The Processor Section 4.8 describes how the severity of control hazards can be reduced by moving branch execution into the ID stage. This approach involves a dedicated comparator in the ID stage, as shown in Figure 4.62. However, this approach potentially adds to the latency of the ID stage, and requires additional forwarding logic and hazard detection.

4.14.4 [10] Using the first branch instruction in the given code as an example, describe the hazard detection logic needed to support branch execution in the ID stage as in Figure 4.62. Which type of hazard is this new logic supposed to detect?

4.14.5 [10] For the given code, what is the speedup achieved by moving branch execution into the ID stage? Explain your answer. In your speedup calculation, assume that the additional comparison in the ID stage does not affect clock cycle time. 4.14.6 [10] Using the first branch instruction in the given code as an example, describe the forwarding support that must be added to support branch execution in the ID stage. Compare the complexity of this new forwarding unit to the complexity of the existing forwarding unit in Figure 4.62.

Short Answer

Expert verified

4.14.1

The required diagram of the pipeline execution for the given code:

instructionCycles

lw

IF

ID

EX

ME

WB

beq

IF

ID

-

EX

ME

WB

lw

IF

ID

EX

ME

WB

beq

IF

ID

-

EX

ME

WB

beq

IF

-

ID

EX

ME

WB

sw

IF

ID

EX

ME

WB

4.14.2

The required diagram of the pipeline execution for the given code:

instructionCycles

lw

IF

ID

EX

ME

WB

beq

IF

-

EX

ID

ME

WB

lw

IF

-

ID

EX

ME

WB

beq

IF

ID

EX

ME

WB

add

IF

ID

EX

ME

WB

beq

IF

ID

EX

ME

WB

lw

IF

ID

EX

ME

WB

sw

IF

ID

EX

ME

WB

4.14.3

lw r2,0(r1)

label1: bnez r2,r0,label2 // not taken once, then taken

lw r3,0(r2)

bnez r3,r0,label1// taken

add r1,r3,r1

label2: sw r1,0(r2)

4.14.4

For these cases, it is considered that the preceding instruction's value is forwarded to the branch's stage of the "ID" if there is a possibility.

One of the characteristics of the hazard detection logic is it can able to detect the situation where the branch relies on the outcomes of the previous instruction of the type "R" or on the outcomes of the previous two loads.

During the time of using the values of the register operand by the branch in the stage of "ID", still the outcomes of the instruction type of "R" are being produced in the stage of "EX".

So, it is needed to stall the processor and do the repeating branch's stage of "ID" in the upcoming cycle.

For that, if the branch relies upon the load that is preceded immediately by this, the outcomes of the load are only produced the two cycles after entering the branch in the stage of the "ID". Hence, it is needed to stall the branch of two cycles.

And after all, if the branch relies upon the load that is the one before the last instruction, then the load is properly completing its stage of the "MEM" when the branch is in the stage of "ID". Hence, it is needed to stall the branch of one cycle.

4.14.5

The speed-up will be 0.93.

4.14.6

The new forwarding unit's complexity hasn't any different from the existing unit's complexity.

Step by step solution

01

Define the concept.

4.14.1

For the instruction “lw” the cycles are - IF, ID, EX, ME, WB.

For the instruction “beq” the cycles are - IF, ID, -, EX, ME, WB.

For the instruction “lw” the cycles are - IF, ID, EX, ME, WB.

For the instruction “beq” the cycles are - IF, ID,-, EX, ME, WB.

For the instruction “add” the cycles are - IF, ID,-, EX, ME, WB.

For the instruction “lw” the cycles are - IF,-, ID, EX, ME, WB.

For the instruction “sw” the cycles are - IF, ID, EX, ME, WB. .

4.14.2

For the instruction “lw” the cycles are - IF, ID, EX, ME, WB.

For the instruction “beq” the cycles are- IF,-, ID, -, EX, ME, WB.

For the instruction “lw” the cycles are - IF,-, ID, EX, ME, WB.

For the instruction “beq” the cycles are - IF, ID,-, EX, ME, WB.

For the instruction “beq” the cycles are - IF,-, ID, EX, ME, WB.

For the instruction “sw” the cycles are-IF, ID, EX, ME, WB. .

4.14.3

The desired sequence of instruction:

lw r2,0(r1)

label1: bnez r2,r0,label2 // not taken once, then taken

lw r3,0(r2)

bnez r3,r0,label1// taken

add r1,r3,r1

label2: sw r1,0(r2)

4.14.4

One of the characteristics of the hazard detection logic is it can able to detect the situation where the branch relies on the outcomes of the previous instruction of the type "R" or on the outcomes of the previous two loads.

During the time of using the values of the register operand by the branch in the stage of "ID", still the outcomes of the instruction type of "R" are being produced in the stage of "EX".

So, it is needed to stall the processor and do the repeating branch's stage of "ID" in the upcoming cycle.

For that, if the branch relies upon the load that is preceded immediately by this, the outcomes of the load are only produced the two cycles after entering the branch in the stage of the "ID". Hence, it is needed to stall the branch of two cycles.

And after all, if the branch relies upon the load that is the one before the last instruction, then the load is properly completing its stage of the "MEM" when the branch is in the stage of "ID". Hence, it is needed to stall the branch of one cycle.

For these three cases, the hazard is referred to as the "data hazard".

For these three cases, it is considered that the preceding instruction's value is forwarded to the branch's stage of the "ID" if there is a possibility.

4.14.5

In previous, 14 pipeline cycle was required.

Now, 15 pipeline cycle is required.

Seed-up=PreviousPresent

4.14.6

Now, the instructions of the "branch" are executed in the stage of "ID".

If the instructions of the "branch" are using the value of the register that is produced by the instruction preceded immediately.

And if the "branch" in the stage of "ID" relies on the instruction of type "R", which is in the stage of "MEM" then it is needed to forward for making sure the proper branch execution.

Hence, it is needed to forward another unit that can be able to accept the same input as that which is forwarded to the stage of "EX".

And the new unit of forwarding should be able to control the two mux that is placed that right before the branch comparator.

Each value is selected by the mux from the register. And the "ALU" output from the pipeline register of the "EX/MEM", and the value of data from the pipeline register of the "MEM/WB".

The new forwarding unit's complexity hasn't any different from the existing unit's complexity.

02

Determine the calculation.

4.14.1

Here, no delay slots exist and the branches are executed in the stage of “EX”.

The required diagram of the pipeline execution for the given code:

instructionCycles

lw

IF

ID

EX

ME

WB

beq

IF

ID

-

EX

ME

WB

lw

IF

ID

EX

ME

WB

beq

IF

ID

-

EX

ME

WB

beq

IF

-

ID

EX

ME

WB

sw

IF

ID

EX

ME

WB

4.14.2

Here, delay slots exist.

The required diagram of the pipeline execution for the given code:

instructionCycles

lw

IF

ID

EX

ME

WB

beq

IF

-

EX

ID

ME

WB

lw

IF

-

ID

EX

ME

WB

beq

IF

ID

EX

ME

WB

add

IF

ID

EX

ME

WB

beq

IF

ID

EX

ME

WB

lw

IF

ID

EX

ME

WB

sw

IF

ID

EX

ME

WB

instructionCycles


IF

ID

EX

ME

WB

beq

IF

-

-

ID

EX

ME

WB

lw

IF

ID

EX

ME

WB

beq

IF

-

-

ID

EX

ME

WB

beq

IF

-

ID

EX

ME

WB

sw

IF

ID

EX

ME

WB

4.14.3

The desired sequence of instruction:

lw r2,0(r1)

label1: bnez r2,r0,label2 // not taken once, then taken

lw r3,0(r2)

bnez r3,r0,label1// taken

add r1,r3,r1

label2: sw r1,0(r2)

4.14.4

For these below mentioned cases, it is considered that the preceding instruction's value is forwarded to the branch's stage of the "ID" if there is a possibility.

If the branch relies upon the load that is preceded immediately by this, the outcomes of the load are only produced the two cycles after entering the branch in the stage of the "ID". Hence, it is needed to stall the branch of two cycles.

And after that, if the branch relies upon the load that is the one before the last instruction, then the load is properly completing its stage of the "MEM" when the branch is in the stage of "ID". Hence, it is needed to stall the branch of one cycle.

One of the characteristics of the hazard detection logic is it can able to detect the situation where the branch relies on the outcomes of the previous instruction of the type "R" or on the outcomes of the previous two loads.

During the time of using the values of the register operand by the branch in the stage of "ID", still the outcomes of the instruction type of "R" are being produced in the stage of "EX".

So, it is needed to stall the processor and do the repeating branch's stage of "ID" in the upcoming cycle.

For these cases, the hazard is referred to as the "data hazard".

4.14.5

In previous, 14 pipeline cycle was required.

Now, 15 pipeline cycle is required.

So,

Speed up = 0.93

4.14.6

The new forwarding unit's complexity hasn't any different from the existing unit's complexity.

Each value is selected by the mux from the register. And the "ALU" output from the pipeline register of the "EX/MEM", and the value of data from the pipeline register of the "MEM/WB".

And if the "branch" in the stage of "ID" relies on the instruction of type "R", which is in the stage of "MEM" then it is needed to forward for making sure the proper branch execution.

Now, the instructions of the "branch" are executed in the stage of "ID".

If the instructions of the "branch" are using the value of the register that is produced by the instruction preceded immediately.

Hence, it is needed to forward another unit that can be able to accept the same input as that which is forwarded to the stage of "EX".

And the new unit of forwarding should be able to control the two mux that is placed that right before the branch comparator.

Unlock Step-by-Step Solutions & Ace Your Exams!

  • Full Textbook Solutions

    Get detailed explanations and key concepts

  • Unlimited Al creation

    Al flashcards, explanations, exams and more...

  • Ads-free access

    To over 500 millions flashcards

  • Money-back guarantee

    We refund you if you fail your exam.

Over 30 million students worldwide already upgrade their learning with Vaia!

One App. One Place for Learning.

All the tools & learning materials you need for study success - in one app.

Get started for free

Most popular questions from this chapter

Question: Consider the following instruction: Instruction: AND Rd,Rs,Rt Interpretation: Reg[Rd] = Reg[Rs] AND Reg[Rt] 4.1.1 [5] What are the values of control signals generated by the control in Figure 4.2 for the above instruction? 4.1.2 [5] Which resources (blocks) perform a useful function for this instruction? 4.1.3 [10] Which resources (blocks) produce outputs, but their outputs are not used for this instruction? Which resources produce no outputs for this instruction?.

This exercise explores energy efficiency and its relationship with performance. Problems in this exercise assume the following energy consumption for activity in Instruction memory, Registers, and Data memory. You can assume that the other components of the datapath spend a negligible amount of energy.

Assume that components in the datapath have the following latencies. You can assume that the other components of the datapath have negligible latencies.

4.19.1 [10] How much energy is spent to execute an ADD instruction in a single-cycle and in 5-stage pipelined design?

4.19.2 [10] What is the worst-case MIPS instruction in terms of energy consumption, and what is the energy spent to execute it?

4.19.3 [10] If energy reduction is paramount, how would you change the pipelined design? What is the percentage reduction in the energy spent by an LW instruction after this change?

4.19.4 [10] What is the performance impact of your changes from 4.19.3?

4.19.5 [10]We can eliminate the MemRead control signal and have

the data memory be read in every cycle, i.e., we can permanently have MemRead=1. Explain why the processor still functions correctly aft er this change. What is the effect of this change on clock frequency and energy consumption?

4.19.6 [10] If an idle unit spends 10% of the power it would spend

if it were active, what is the energy spent by the instruction memory in each cycle? What percentage of the overall energy spent by the instruction memory does this idle energy represent?

This exercise is intended to help you understand the relationship between forwarding, hazard detection, and ISA design. Problems in this exercise refer to the following sequence of instructions, and assume that it is executed on a 5-stage pipelined datapath:

add r5,r2,r1

lw r3,4(r5)

lw r2,0(r2)

or r3,r5,r3

sw r3,0(r5)

4.13.1 [5] If there is no forwarding or hazard detection, insert nops to ensure correct execution.

4.13.2 [10] Repeat 4.13.1 but now use nops only when a hazard cannot be avoided by changing or rearranging these instructions. You can assume register R7 can be used to hold temporary values in your modified code.

4.13.3 [10] If the processor has forwarding, but we forgot to implement the hazard detection unit, what happens when this code executes? 4.13.4 [20] If there is forwarding, for the first five cycles during the execution of this code, specify which signals are asserted in each cycle by hazard detection and forwarding units in Figure 4.60.

4.13.5 [10] If there is no forwarding, what new inputs and output signals do we need for the hazard detection unit in Figure 4.60? Using this instruction sequence as an example, explain why each signal is needed. 4.13.6 [20] For the new hazard detection unit from 4.13.5, specify which output signals it asserts in each of the first five cycles during the execution of this code.

This exercise examines the accuracy of various branch predictors for the following repeating pattern (e.g., in a loop) of branch outcomes: T, NT, T, T, NT

4.16.1 [5] What is the accuracy of always-taken and always-not-taken predictors for this sequence of branch outcomes?

4.16.2 [5] What is the accuracy of the two-bit predictor for the first4 branches in this pattern, assuming that the predictor starts off in the bottom left state from Figure 4.63 (predict not taken)?

4.16.3 [10] What is the accuracy of the two-bit predictor if this pattern is repeated forever?

4.16.4 [30] Design a predictor that would achieve a perfect accuracy if this pattern is repeated forever. You predictor should be a sequential circuit with one output that provides a prediction (1 for taken, 0 for not taken) and no inputs other than the clock and the control signal that indicates that the instruction is a conditional branch.

4.16.5 [10] What is the accuracy of your predictor from 4.16.4 if it is given a repeating pattern that is the exact opposite of this one?

4.16.6 [20] Repeat 4.16.4, but now your predictor should be able to eventually (after a warm-up period during which it can make wrong predictions) start perfectly predicting both this pattern and its opposite. Your predictor should have an input that tells it what the real outcome was. Hint: this input lets your predictor determine which of the two repeating patterns it is given.

4.18 In this exercise we compare the performance of 1-issue and 2-issue processors, taking into account program transformations that can be made to optimize for 2-issue execution. Problems in this exercise refer to the following loop (written in C):

for(i=0;i!=j;i+=2)

b[i]=a[i]–a[i+1];

When writing MIPS code, assume that variables are kept in registers as follows, and that all registers except those indicated as Free are used to keep various variables, so they cannot be used for anything else.

I

j

a

b

c

Free

R5

R6

R1

R2

R3

R10, R11, R12

4.18.1 [10] Translate this C code into MIPS instructions. Your translation should be direct, without rearranging instructions to achieve better performance.

4.18.2 [10] If the loop exits aft er executing only two iterations, draw a pipeline diagram for your MIPS code from 4.18.1 executed on a 2-issue processor shown in Figure 4.69. Assume the processor has perfect branch prediction and can fetch any two instructions (not just consecutive instructions) in the same cycle.

4.18.3 [10] Rearrange your code from 4.18.1 to achieve better performance on a 2-issue statically scheduled processor from Figure 4.69.

4.18.4 [10] Repeat 4.18.2, but this time use your MIPS code from 4.18.3. 4.18.5 [10] What is the speedup of going from a 1-issue processor to a 2-issue processor from Figure 4.69? Use your code from 4.18.1 for both 1-issue and 2-issue, and assume that 1,000,000 iterations of the loop are executed. As in 4.18.2, assume that the processor has perfect branch predictions, and that a 2-issue processor can fetch any two instructions in the same cycle.

4.18.6 [10] Repeat 4.18.5, but this time assume that in the 2-issue processor one of the instructions to be executed in a cycle can be of any kind, and the other must be a non-memory instruction

See all solutions

Recommended explanations on Computer Science Textbooks

View all explanations

What do you think about this solution?

We value your feedback to improve our textbook solutions.

Study anywhere. Anytime. Across all devices.

Sign-up for free