
This exercise is intended to help you understand the relationship between forwarding, hazard detection, and ISA design. Problems in this exercise refer to the following sequence of instructions, and assume that it is executed on a 5-stage pipelined datapath:

add r5,r2,r1

lw r3,4(r5)

lw r2,0(r2)

or r3,r5,r3

sw r3,0(r5)

4.13.1 [5] If there is no forwarding or hazard detection, insert nops to ensure correct execution.

4.13.2 [10] Repeat 4.13.1 but now use nops only when a hazard cannot be avoided by changing or rearranging these instructions. You can assume register R7 can be used to hold temporary values in your modified code.

4.13.3 [10] If the processor has forwarding, but we forgot to implement the hazard detection unit, what happens when this code executes?

4.13.4 [20] If there is forwarding, for the first five cycles during the execution of this code, specify which signals are asserted in each cycle by hazard detection and forwarding units in Figure 4.60.

4.13.5 [10] If there is no forwarding, what new inputs and output signals do we need for the hazard detection unit in Figure 4.60? Using this instruction sequence as an example, explain why each signal is needed.

4.13.6 [20] For the new hazard detection unit from 4.13.5, specify which output signals it asserts in each of the first five cycles during the execution of this code.

Short Answer


4.13.1

The required sequence of the instructions:

add r5, r2, r1

nop

nop

lw r3, 4(r5)

lw r2, 0(r2)

nop

or r3, r5, r3

nop

nop

sw r3, 0(r5)

4.13.2

In the sequence above, nops are used only where necessary, so changing or rearranging these instructions cannot reduce the number of nops.

4.13.3

With forwarding but no hazard detection unit, this code still executes correctly.

4.13.4

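The signals asserted by the hazard detection and forwarding units in Figure 4.60 follow from this pipeline execution diagram:

Instruction | Cycle 1 | 2  | 3  | 4   | 5   | 6   | 7   | 8   | 9
add         | IF      | ID | EX | MEM | WB  |     |     |     |
lw          |         | IF | ID | EX  | MEM | WB  |     |     |
lw          |         |    | IF | ID  | EX  | MEM | WB  |     |
or          |         |    |    | IF  | ID  | EX  | MEM | WB  |
sw          |         |    |    |     | IF  | ID  | EX  | MEM | WB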

4.13.5

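The hazard detection unit needs two new inputs: register Rd from the ID/EX pipeline register and the destination register number from the EX/MEM pipeline register. No new output signals are needed.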

4.13.6

The value of the PCWrite output signal is:

Instruction | Stages          | PCWrite
add         | IF ID EX MEM WB | 1
lw          | IF ID EX MEM WB | 1
lw          | IF ID EX MEM WB | 1
or          | IF ID EX MEM WB | 0
sw          | IF ID EX MEM WB | 0

Step by step solution

01

Define the concept.

4.13.1

With neither forwarding nor hazard detection, nops must separate each dependent instruction from the instruction that produces its operand.

Two nops are inserted after add, because lw r3, 4(r5) needs r5.

One nop is inserted after lw r2, 0(r2), because or r3, r5, r3 needs the r3 loaded by lw r3, 4(r5).

Two nops are inserted after or r3, r5, r3, because sw r3, 0(r5) needs r3.
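The placement rule above can be sketched in Python. This is a minimal illustration, not the textbook's method: it assumes the register file is written in the first half of a cycle and read in the second half, so a consumer must sit at least three slots after its producer. The names `PROGRAM`, `MIN_DISTANCE`, and `insert_nops` are hypothetical.

```python
# Minimal sketch: insert nops for a 5-stage pipeline with no forwarding and
# no hazard detection. Assumption: a register written in WB can be read in
# ID during the same cycle, so two instructions must separate a producer
# from its consumer (consumer slot >= producer slot + 3).

# (text, destination register or None, source registers)
PROGRAM = [
    ("add r5,r2,r1", "r5", ("r2", "r1")),
    ("lw r3,4(r5)",  "r3", ("r5",)),
    ("lw r2,0(r2)",  "r2", ("r2",)),
    ("or r3,r5,r3",  "r3", ("r5", "r3")),
    ("sw r3,0(r5)",  None, ("r3", "r5")),
]

MIN_DISTANCE = 3  # producer at slot p -> consumer at slot p + 3 or later


def insert_nops(program):
    """Return the instruction list with the nops needed for correctness."""
    out = []         # scheduled instruction texts, including nops
    last_write = {}  # register -> slot of the latest instruction writing it
    for text, dest, sources in program:
        slot = len(out)
        for reg in sources:
            if reg in last_write:
                slot = max(slot, last_write[reg] + MIN_DISTANCE)
        out.extend(["nop"] * (slot - len(out)))  # pad up to the legal slot
        out.append(text)
        if dest is not None:
            last_write[dest] = slot
    return out


schedule = insert_nops(PROGRAM)
print("\n".join(schedule))
```

Under these assumptions the sketch reproduces the sequence with five nops: two after add, one after the second lw, and two after or.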

4.13.2

add r5, r2, r1

nop

nop

lw r3, 4(r5)

lw r2, 0(r2)

nop

or r3, r5, r3

nop

nop

sw r3, 0(r5)

In this sequence of instructions:

Two nops are inserted after add.

One nop is inserted after lw r2, 0(r2).

Two nops are inserted after or r3, r5, r3.

The dependence chain add, lw r3, or, sw needs two instructions between each producer and its consumer, and lw r2, 0(r2) is the only instruction available to fill one of those six slots. Therefore, nops are used only where necessary, and changing or rearranging these instructions cannot reduce the number of nops.
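One way to check this conclusion is a brute-force search over instruction orders. The sketch below is illustrative only. It assumes that name dependences (WAR/WAW, for example on r2) could be removed by renaming to r7, so only the true RAW dependences of the original code constrain the order; the names `RAW`, `MIN_DISTANCE`, and `nops_needed` are hypothetical.

```python
from itertools import permutations

# Indices: 0 add r5,r2,r1   1 lw r3,4(r5)   2 lw r2,0(r2)
#          3 or r3,r5,r3    4 sw r3,0(r5)
# True (RAW) dependences of the original code as (producer, consumer) pairs.
# Assumption: WAR/WAW dependences can be renamed away using r7, so they are
# ignored here. This can only make reordering look easier, never harder.
RAW = [(0, 1), (0, 3), (0, 4), (1, 3), (3, 4)]
MIN_DISTANCE = 3  # consumer must be at least 3 slots after its producer


def nops_needed(order):
    """Return the nop count for this order, or None if it breaks a dependence."""
    slot = {}
    next_free = 0
    for instr in order:
        s = next_free
        for producer, consumer in RAW:
            if consumer == instr:
                if producer not in slot:
                    return None  # consumer placed before its producer
                s = max(s, slot[producer] + MIN_DISTANCE)
        slot[instr] = s
        next_free = s + 1
    return next_free - len(order)


counts = [n for n in map(nops_needed, permutations(range(5))) if n is not None]
best = min(counts)
print(best)
```

Even with the renaming assumption, no legal order in this search needs fewer nops than the original order.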

4.13.3

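The hazard detection unit is needed only to insert a stall when a load is immediately followed by an instruction that uses the loaded value.

That situation does not occur in this code: neither load is immediately followed by an instruction that reads its result.

Hence, the code executes correctly even without the hazard detection unit.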
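The load-use condition can be checked mechanically. The following Python sketch scans the sequence for a load whose destination is read by the very next instruction; the tuple encoding and the name `load_use_hazards` are assumptions made for illustration.

```python
# (opcode, destination register or None, source registers)
PROGRAM = [
    ("add", "r5", ("r2", "r1")),
    ("lw",  "r3", ("r5",)),
    ("lw",  "r2", ("r2",)),
    ("or",  "r3", ("r5", "r3")),
    ("sw",  None, ("r3", "r5")),
]


def load_use_hazards(program):
    """Return indices of loads whose result is read by the next instruction."""
    hazards = []
    for i in range(len(program) - 1):
        op, dest, _ = program[i]
        _, _, next_sources = program[i + 1]
        if op == "lw" and dest in next_sources:
            hazards.append(i)
    return hazards


hazards = load_use_hazards(PROGRAM)
print(hazards)
```

For this sequence the list is empty, which is why forwarding alone is enough here.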

4.13.4

For add: IF in cycle 1, ID in cycle 2, EX in cycle 3, MEM in cycle 4, WB in cycle 5.

For lw r3, 4(r5): IF in cycle 2, ID in cycle 3, EX in cycle 4, MEM in cycle 5, WB in cycle 6.

For lw r2, 0(r2): IF in cycle 3, ID in cycle 4, EX in cycle 5, MEM in cycle 6, WB in cycle 7.

For or: IF in cycle 4, ID in cycle 5, EX in cycle 6, MEM in cycle 7, WB in cycle 8.

For sw: IF in cycle 5, ID in cycle 6, EX in cycle 7, MEM in cycle 8, WB in cycle 9.

4.13.5

An instruction in the ID stage must stall if it depends on a value produced by the instruction in the EX stage or the instruction in the MEM stage. Hence, the hazard detection unit has to check the destination registers of those two instructions.

For the instruction in the EX stage, it must check Rd for R-type instructions and Rt for loads. In this sequence, lw r3, 4(r5) depends on add (an R-type instruction whose destination is in Rd), and or r3, r5, r3 depends on lw r3, 4(r5) (a load whose destination is in Rt).

For the instruction in the MEM stage, the destination register has already been selected. Hence, it must check that register number.

The additional inputs to the hazard detection unit are register Rd from the ID/EX pipeline register and the destination register number from the EX/MEM pipeline register.

The Rt field from the ID/EX pipeline register is already an input to the hazard detection unit in Figure 4.60.
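The stall condition described above can be sketched as a small Python function. This is only an illustration of the comparison logic, not the hardware in Figure 4.60; the dictionary encoding and the names `ex_destination`, `source_registers`, and `must_stall` are hypothetical.

```python
# Instructions as dictionaries holding the MIPS register fields they use.
ADD = {"op": "add", "rd": "r5", "rs": "r2", "rt": "r1"}
LW1 = {"op": "lw", "rt": "r3", "rs": "r5"}   # lw r3,4(r5)
LW2 = {"op": "lw", "rt": "r2", "rs": "r2"}   # lw r2,0(r2)
OR_ = {"op": "or", "rd": "r3", "rs": "r5", "rt": "r3"}


def ex_destination(instr):
    """Destination of the instruction in EX: Rd for R-type, Rt for loads."""
    if instr is None:
        return None
    if instr["op"] in ("add", "or"):
        return instr["rd"]
    if instr["op"] == "lw":
        return instr["rt"]
    return None  # sw writes no register


def source_registers(instr):
    """Registers read by the instruction in ID."""
    if instr["op"] == "lw":
        return {instr["rs"]}               # only the base register is read
    return {instr["rs"], instr["rt"]}      # R-type and sw read both fields


def must_stall(id_instr, ex_instr, ex_mem_dest):
    """Stall if ID reads a register still being produced in EX or MEM."""
    sources = source_registers(id_instr)
    return ex_destination(ex_instr) in sources or ex_mem_dest in sources


print(must_stall(LW1, ADD, None))   # lw r3,4(r5) in ID while add is in EX
```

The second and third arguments correspond to the two new inputs: the destination of the instruction in EX (taken from ID/EX) and the already selected destination register number in EX/MEM.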

4.13.6

PCWrite is 1 for the first instruction.

PCWrite is 1 for the second instruction.

PCWrite is 1 for the third instruction.

PCWrite is 0 for the fourth instruction.

PCWrite is 0 for the fifth instruction.

02

Determine the calculation.

4.13.1

Here, neither forwarding nor hazard detection exists, so nops are inserted to ensure correct execution.

The required sequence of the instructions:

add r5, r2, r1

nop

nop

lw r3, 4(r5)

lw r2, 0(r2)

nop

or r3, r5, r3

nop

nop

sw r3, 0(r5)

4.13.2

add r5, r2, r1

nop

nop

lw r3, 4(r5)

lw r2, 0(r2)

nop

or r3, r5, r3

nop

nop

sw r3, 0(r5)

In this sequence of instructions, nops are used only where necessary, so changing or rearranging these instructions cannot reduce the number of nops.

4.13.3

With forwarding but no hazard detection unit, the code still executes correctly.

The hazard detection unit is needed only to insert a stall when a load is immediately followed by an instruction that uses the loaded value.

That does not occur in this code.

4.13.4

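Here, forwarding is available and the code never stalls, so the hazard detection unit keeps PCWrite and IF/IDWrite at 1 throughout the first five cycles.

The signals asserted by the forwarding unit in Figure 4.60 follow from this pipeline execution diagram:

Instruction | Cycle 1 | 2  | 3  | 4   | 5   | 6   | 7   | 8   | 9
add         | IF      | ID | EX | MEM | WB  |     |     |     |
lw          |         | IF | ID | EX  | MEM | WB  |     |     |
lw          |         |    | IF | ID  | EX  | MEM | WB  |     |
or          |         |    |    | IF  | ID  | EX  | MEM | WB  |
sw          |         |    |    |     | IF  | ID  | EX  | MEM | WB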

Details of the forwarding:

Cycle | ForwardA | ForwardB | Explanation
1     | X        | X        | There is no instruction in the EX stage yet.
2     | X        | X        | There is no instruction in the EX stage yet.
3     | 0        | 0        | No forwarding; values are taken from the registers.
4     | 2        | 0        | The base register is taken from the result of the previous instruction.
5     | 0        | 0        | The base register is taken from the registers.
6     | 0        | 1        | r5 is taken from the register file; r3 is taken from the result of the first lw, two instructions ago.
7     | 0        | 2        | The base register r5 is taken from the register file; the store data r3 is taken from the result of the previous instruction (or).
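The forwarding decisions for the first five cycles can be reproduced with a short Python sketch. It is a simplified model, assuming no stalls and omitting the check that excludes register 0 (no instruction here writes it); the names `stage_instr`, `forward_select`, and `forwarding_signals` are hypothetical.

```python
# Each instruction: (text, rs, rt, destination or None).
# For lw the rt field is also the destination; for sw, rt is the stored register.
PROGRAM = [
    ("add r5,r2,r1", "r2", "r1", "r5"),
    ("lw r3,4(r5)",  "r5", "r3", "r3"),
    ("lw r2,0(r2)",  "r2", "r2", "r2"),
    ("or r3,r5,r3",  "r5", "r3", "r3"),
    ("sw r3,0(r5)",  "r5", "r3", None),
]


def stage_instr(cycle, stage_offset):
    """Index of the instruction in a stage during `cycle` (1-based), no stalls.

    stage_offset: IF=0, ID=1, EX=2, MEM=3, WB=4.
    """
    idx = cycle - 1 - stage_offset
    return idx if 0 <= idx < len(PROGRAM) else None


def forward_select(source, ex_mem_dest, mem_wb_dest):
    """Return 2 (EX/MEM), 1 (MEM/WB), or 0 (register file) for one operand."""
    if ex_mem_dest is not None and ex_mem_dest == source:
        return 2  # newest value wins, so EX/MEM is checked first
    if mem_wb_dest is not None and mem_wb_dest == source:
        return 1
    return 0


def forwarding_signals(cycle):
    """Return (ForwardA, ForwardB) for the instruction in EX in this cycle."""
    ex = stage_instr(cycle, 2)
    if ex is None:
        return ("X", "X")  # nothing in EX yet, so the signals are don't-cares
    mem = stage_instr(cycle, 3)
    wb = stage_instr(cycle, 4)
    ex_mem_dest = PROGRAM[mem][3] if mem is not None else None
    mem_wb_dest = PROGRAM[wb][3] if wb is not None else None
    _, rs, rt, _ = PROGRAM[ex]
    return (forward_select(rs, ex_mem_dest, mem_wb_dest),
            forward_select(rt, ex_mem_dest, mem_wb_dest))


first_five = [forwarding_signals(c) for c in range(1, 6)]
print(first_five)
```

The only forwarding in the first five cycles is in cycle 4, where the first lw takes its base register r5 from the add result held in EX/MEM.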

4.13.5

The required sequence of the instructions:

add r5, r2, r1

nop

nop

lw r3, 4(r5)

lw r2, 0(r2)

nop

or r3, r5, r3

nop

nop

sw r3, 0(r5)

No further output signals are needed; the existing outputs of the hazard detection unit are enough to stall the pipeline.

4.13.6

The value of the PCWrite output signal is:

Instruction | Stages          | PCWrite
add         | IF ID EX MEM WB | 1
lw          | IF ID EX MEM WB | 1
lw          | IF ID EX MEM WB | 1
or          | IF ID EX MEM WB | 0
sw          | IF ID EX MEM WB | 0


Most popular questions from this chapter

When silicon chips are fabricated, defects in materials (e.g., silicon) and manufacturing errors can result in defective circuits. A very common defect is for one wire to affect the signal in another. This is called a cross-talk fault. A special class of cross-talk faults is when a signal is connected to a wire that has a constant logical value (e.g., a power supply wire). In this case we have a stuck-at-0 or a stuck-at-1 fault, and the affected signal always has a logical value of 0 or 1, respectively. The following problems refer to bit 0 of the Write Register input on the register file in Figure 4.24.

4.6.1 [10] Let us assume that processor testing is done by filling the PC, registers, and data and instruction memories with some values (you can choose which values), letting a single instruction execute, then reading the PC, memories, and registers. These values are then examined to determine if a particular fault is present. Can you design a test (values for PC, memories, and registers) that would determine if there is a stuck-at-0 fault on this signal?

4.6.2 [10] Repeat 4.6.1 for a stuck-at-1 fault. Can you use a single test for both stuck-at-0 and stuck-at-1? If yes, explain how; if no, explain why not.

4.6.3 [60] If we know that the processor has a stuck-at-1 fault on this signal, is the processor still usable? To be usable, we must be able to convert any program that executes on a normal MIPS processor into a program that works on this processor. You can assume that there is enough free instruction memory and data memory to let you make the program longer and store additional data. Hint: the processor is usable if every instruction “broken” by this fault can be replaced with a sequence of “working” instructions that achieve the same effect.

4.6.4 [10] Repeat 4.6.1, but now the fault to test for is whether the “MemRead” control signal becomes 0 if RegDst control signal is 0, no fault otherwise.

4.6.5 [10] Repeat 4.6.4, but now the fault to test for is whether the “Jump” control signal becomes 0 if RegDst control signal is 0, no fault otherwise.

This exercise explores how exception handling affects pipeline design. The first three problems in this exercise refer to the following two instructions:

Instruction 1     | Instruction 2
BNE R1,R2, Label  | LW R1,0(R1)

4.17.1 Which exceptions can each of these instructions trigger? For each of these exceptions, specify the pipeline stage in which it is detected.

4.17.2 If there is a separate handler address for each exception, show how the pipeline organization must be changed to be able to handle this exception. You can assume that the addresses of these handlers are known when the processor is designed.

4.17.3 If the second instruction is fetched right after the first instruction, describe what happens in the pipeline when the first instruction causes the first exception you listed in 4.17.1. Show the pipeline execution diagram from the time the first instruction is fetched until the time the first instruction of the exception handler is completed.

4.17.4 In vectored exception handling, the table of exception handler addresses is in data memory at a known (fixed) address. Change the pipeline to implement this exception handling mechanism. Repeat 4.17.3 using this modified pipeline and vectored exception handling.

4.17.5 We want to emulate vectored exception handling (described in 4.17.4) on a machine that has only one fixed handler address. Write the code that should be at that fixed address. Hint: this code should identify the exception, get the right address from the exception vector table, and transfer execution to that handler.

This exercise is intended to help you understand the cost/complexity/performance trade-offs of forwarding in a pipelined processor. Problems in this exercise refer to pipelined data paths from Figure 4.45. These problems assume that, of all the instructions executed in a processor, the following fraction of these instructions have a particular type of RAW data dependence. The type of RAW data dependence is identified by the stage that produces the result (EX or MEM) and the instruction that consumes the result (1st instruction that follows the one that produces the result, 2nd instruction that follows, or both). We assume that the register write is done in the first half of the clock cycle and that register reads are done in the second half of the cycle, so “EX to 3rd” and “MEM to 3rd” dependencies are not counted because they cannot result in data hazards. Also, assume that the CPI of the processor is 1 if there are no data hazards.

EX to 1st only | MEM to 1st only | EX to 2nd only | MEM to 2nd only | EX to 1st and MEM to 2nd | Other RAW dependences
5%             | 20%             | 5%             | 10%             | 10%                      | 10%

Assume the following latencies for individual pipeline stages. For the EX stage, latencies are given separately for a processor without forwarding and for a processor with different kinds of forwarding.

IF    | ID    | EX (no FW) | EX (full FW) | EX (FW from EX/MEM only) | EX (FW from MEM/WB only) | MEM   | WB
150ps | 100ps | 120ps      | 150ps        | 140ps                    | 130ps                    | 120ps | 100ps

4.12.1 If we use no forwarding, what fraction of cycles are we stalling due to data hazards?

4.12.2 If we use full forwarding (forward all results that can be forwarded), what fraction of cycles are we stalling due to data hazards?

4.12.3 Let us assume that we cannot afford to have three-input Muxes that are needed for full forwarding. We have to decide if it is better to forward only from the EX/MEM pipeline register (next-cycle forwarding) or only from the MEM/WB pipeline register (two-cycle forwarding). Which of the two options results in fewer data stall cycles?

4.12.4 For the given hazard probabilities and pipeline stage latencies, what is the speedup achieved by adding full forwarding to a pipeline that had no forwarding?

4.12.5 What would be the additional speedup (relative to a processor with forwarding) if we added time-travel forwarding that eliminates all data hazards? Assume that the yet-to-be-invented time-travel circuitry adds 100 ps to the latency of the full-forwarding EX stage.

4.12.6 Repeat 4.12.3 but this time determine which of the two options results in a shorter time per instruction.

In this exercise, we examine how resource hazards, control hazards, and Instruction Set Architecture (ISA) design can affect pipelined execution. Problems in this exercise refer to the following fragment of MIPS code:

sw r16,12(r6)

lw r16,8(r6)

beq r5,r4,Label # Assume r5!=r4

add r5,r1,r4

slt r5,r15,r4

Assume that individual pipeline stages have the following latencies:

IF    | ID    | EX    | MEM   | WB
200ps | 120ps | 150ps | 190ps | 100ps

4.10.1 For this problem, assume that all branches are perfectly predicted (this eliminates all control hazards) and that no delay slots are used. If we only have one memory (for both instructions and data), there is a structural hazard every time we need to fetch an instruction in the same cycle in which another instruction accesses data. To guarantee forward progress, this hazard must always be resolved in favor of the instruction that accesses data. What is the total execution time of this instruction sequence in the 5-stage pipeline that only has one memory? We have seen that data hazards can be eliminated by adding nops to the code. Can you do the same with this structural hazard? Why?

4.10.2 For this problem, assume that all branches are perfectly predicted (this eliminates all control hazards) and that no delay slots are used. If we change load/store instructions to use a register (without an offset) as the address, these instructions no longer need to use the ALU. As a result, MEM and EX stages can be overlapped and the pipeline has only 4 stages. Change this code to accommodate this changed ISA. Assuming this change does not affect clock cycle time, what speedup is achieved in this instruction sequence?

4.10.3 Assuming stall-on-branch and no delay slots, what speedup is achieved on this code if branch outcomes are determined in the ID stage, relative to the execution where branch outcomes are determined in the EX stage?

4.10.4. Given these pipeline stage latencies, repeat the speedup calculation from 4.10.2, but take into account the (possible) change in clock cycle time. When EX and MEM are done in a single stage, most of their work can be done in parallel. As a result, the resulting EX/MEM stage has a latency that is the larger of the original two, plus 20 ps needed for the work that could not be done in parallel.

4.10.5 Given these pipeline stage latencies, repeat the speedup calculation from 4.10.3, taking into account the (possible) change in clock cycle time. Assume that the latency of the ID stage increases by 50% and the latency of the EX stage decreases by 10ps when branch outcome resolution is moved from EX to I

4.10.6 Assuming stall-on-branch and no delay slots, what is the new clock cycle time and execution time of this instruction sequence if beq address computation is moved to the MEM stage? What is the speedup from this change? Assume that the latency of the EX stage is reduced by 20 ps and the latency of the MEM stage is unchanged when branch outcome resolution is moved from EX to MEM.

When processor designers consider a possible improvement to the processor datapath, the decision usually depends on the cost/performance trade-off. In the following three problems, assume that we are starting with a datapath from Figure 4.3, where I-Mem, Add, Mux, ALU, Regs, D-Mem, and Control blocks have latencies of 400 ps, 100 ps, 30 ps, 120 ps, 200 ps, 350 ps, and 100 ps, respectively, and costs of 1000, 30, 10, 100, 200, 2000, and 500, respectively. Consider the addition of a multiplier to the ALU. This addition will add 300 ps to the latency of the ALU and will add a cost of 600 to the ALU. The result will be 5% fewer instructions executed since we will no longer need to emulate the MUL instruction.

4.3.1 [10] What is the clock cycle time with and without this improvement?

4.3.2 [10] What is the speedup achieved by adding this improvement?

4.3.3 [10] Compare the cost/performance ratio with and without this improvement.
