Chapter 4: Q14E (page 365)
This exercise is intended to help you understand the relationship between delay slots, control hazards, and branch execution in a pipelined processor. In this exercise, we assume that the following MIPS code is executed on a pipelined processor with a 5-stage pipeline, full forwarding, and a predict-taken branch predictor:
lw r2,0(r1)
label1: beq r2,r0,label2 # not taken once, then taken
lw r3,0(r2) beq r3,r0,label1 # taken
add r1,r3,r1
label2: sw r1,0(r2)
4.14.1 [10] Draw the pipeline execution diagram for this code, assuming there are no delay slots and that branches execute in the EX stage. 4.14.2 [10] Repeat 4.14.1, but assume that delay slots are used. In the given code, the instruction that follows the branch is now the delay slot instruction for that branch.
4.14.3 [20] One way to move the branch resolution one stage earlier is to not need an ALU operation in conditional branches. The branch instructions would be “bez rd,label” and “bnez rd,label”, and it would branch if the register has and does not have a zero value, respectively. Change this code to use these branch instructions instead of beq. You can assume that register R8 is available for you to use as a temporary register, and that an seq (set if equal) R-type instruction can be used. 366 Chapter 4 The Processor Section 4.8 describes how the severity of control hazards can be reduced by moving branch execution into the ID stage. This approach involves a dedicated comparator in the ID stage, as shown in Figure 4.62. However, this approach potentially adds to the latency of the ID stage, and requires additional forwarding logic and hazard detection.
4.14.4 [10] Using the first branch instruction in the given code as an example, describe the hazard detection logic needed to support branch execution in the ID stage as in Figure 4.62. Which type of hazard is this new logic supposed to detect?
4.14.5 [10] For the given code, what is the speedup achieved by moving branch execution into the ID stage? Explain your answer. In your speedup calculation, assume that the additional comparison in the ID stage does not affect clock cycle time. 4.14.6 [10] Using the first branch instruction in the given code as an example, describe the forwarding support that must be added to support branch execution in the ID stage. Compare the complexity of this new forwarding unit to the complexity of the existing forwarding unit in Figure 4.62.
Short Answer
4.14.1
The required diagram of the pipeline execution for the given code:
instruction | Cycles | |||||||||||||
lw | IF | ID | EX | ME | WB | |||||||||
beq | IF | ID | - | EX | ME | WB | ||||||||
lw | IF | ID | EX | ME | WB | |||||||||
beq | IF | ID | - | EX | ME | WB | ||||||||
beq | IF | - | ID | EX | ME | WB | ||||||||
sw | IF | ID | EX | ME | WB |
4.14.2
The required diagram of the pipeline execution for the given code:
instruction | Cycles | |||||||||||||
lw | IF | ID | EX | ME | WB | |||||||||
beq | IF | - | EX | ID | ME | WB | ||||||||
lw | IF | - | ID | EX | ME | WB | ||||||||
beq | IF | ID | EX | ME | WB | |||||||||
add | IF | ID | EX | ME | WB | |||||||||
beq | IF | ID | EX | ME | WB | |||||||||
lw | IF | ID | EX | ME | WB | |||||||||
sw | IF | ID | EX | ME | WB |
4.14.3
lw r2,0(r1) |
label1: bnez r2,r0,label2 // not taken once, then taken |
lw r3,0(r2) |
bnez r3,r0,label1// taken |
add r1,r3,r1 |
label2: sw r1,0(r2) |
4.14.4
For these cases, it is considered that the preceding instruction's value is forwarded to the branch's stage of the "ID" if there is a possibility.
One of the characteristics of the hazard detection logic is it can able to detect the situation where the branch relies on the outcomes of the previous instruction of the type "R" or on the outcomes of the previous two loads.
During the time of using the values of the register operand by the branch in the stage of "ID", still the outcomes of the instruction type of "R" are being produced in the stage of "EX".
So, it is needed to stall the processor and do the repeating branch's stage of "ID" in the upcoming cycle.
For that, if the branch relies upon the load that is preceded immediately by this, the outcomes of the load are only produced the two cycles after entering the branch in the stage of the "ID". Hence, it is needed to stall the branch of two cycles.
And after all, if the branch relies upon the load that is the one before the last instruction, then the load is properly completing its stage of the "MEM" when the branch is in the stage of "ID". Hence, it is needed to stall the branch of one cycle.
4.14.5
The speed-up will be 0.93.
4.14.6
The new forwarding unit's complexity hasn't any different from the existing unit's complexity.