Chapter 4: Q12E (page 363)
This exercise is intended to help you understand the cost/complexity/performance trade-offs of forwarding in a pipelined processor. Problems in this exercise refer to pipelined data paths from Figure 4.45. These problems assume that, of all the instructions executed in a processor, the following fraction of these instructions have a particular type of RAW data dependence. The type of RAW data dependence is identified by the stage that produces the result (EX or MEM) and the instruction that consumes the result (1st instruction that follows the one that produces the result, 2nd instruction that follows, or both). We assume that the register write is done in the first half of the clock cycle and that register reads are done in the second half of the cycle, so “EX to 3rd” and “MEM to 3rd” dependencies are not counted because they cannot result in data hazards. Also, assume that the CPI of the processor is 1 if there are no data hazards.
EX to 1st only | MEM to 1st only | EX to 2nd only | MEM to 2nd only | EX to 1st and MEM to 2nd | Other RAW Dependences |
5% | 20% | 5% | 10% | 10% | 10% |
Assume the following latencies for individual pipeline stages. For the EX stage, latencies are given separately for a processor without forwarding and for a processor with different kinds of forwarding.
IF | ID | EX (no FW) | EX (full FW) | EX (FW from EX/MEM only) | EX (FW from MEM/WB only) | MEM | WB |
150 ps | 100 ps | 120 ps | 150 ps | 140 ps | 130 ps | 120 ps | 100 ps |
4.12.1 If we use no forwarding, what fraction of cycles are we stalling due to data hazards?
4.12.2 If we use full forwarding (forward all results that can be forwarded), what fraction of cycles are we stalling due to data hazards?
4.12.3 Let us assume that we cannot afford to have three-input Muxes that are needed for full forwarding. We have to decide if it is better to forward only from the EX/MEM pipeline register (next-cycle forwarding) or only from the MEM/WB pipeline register (two-cycle forwarding). Which of the two options results in fewer data stall cycles?
4.12.4 For the given hazard probabilities and pipeline stage latencies, what is the speedup achieved by adding full forwarding to a pipeline that had no forwarding?
4.12.5 What would be the additional speedup (relative to a processor with forwarding) if we added time-travel forwarding that eliminates all data hazards? Assume that the yet-to-be-invented time-travel circuitry adds 100 ps to the latency of the full-forwarding EX stage.
4.12.6 Repeat 4.12.3 but this time determine which of the two options results in a shorter time per instruction.
Short Answer
4.12.1. With no forwarding, about 46% of cycles are stall cycles (CPI = 1.85).
4.12.2. With full forwarding, about 17% of cycles are stall cycles (CPI = 1.2).
4.12.3. Forwarding only from MEM/WB results in fewer stall cycles than forwarding only from EX/MEM (CPI of 1.35 versus 1.65).
4.12.4. Adding full forwarding to a pipeline that had no forwarding gives a speedup of about 1.54.
4.12.5. Time-travel forwarding that eliminates all data hazards gives a speedup of 0.72 relative to the processor with full forwarding, which is in fact a slowdown.
4.12.6. Forwarding only from MEM/WB results in the shorter time per instruction, 202.5 ps.
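The arithmetic behind these answers can be checked with a short Python sketch. The dependence fractions and stage latencies come from the tables in the exercise. The per-dependence stall counts in `STALLS` are not given in the exercise; they are an assumption based on the usual five-stage pipeline with the register file written in the first half of a cycle and read in the second half. The "Other RAW Dependences" column is left out because it causes no stalls. All names (`FRACTIONS`, `STALLS`, `cpi`, and so on) are illustrative.

```python
# Fraction of instructions with each kind of RAW dependence (from the exercise).
FRACTIONS = {
    "EX_1st": 0.05,
    "MEM_1st": 0.20,
    "EX_2nd": 0.05,
    "MEM_2nd": 0.10,
    "EX_1st_MEM_2nd": 0.10,
}

# Assumed stall cycles per dependence type for each forwarding scheme
# (standard five-stage pipeline, half-cycle register write then read).
STALLS = {
    "none":   {"EX_1st": 2, "MEM_1st": 2, "EX_2nd": 1, "MEM_2nd": 1, "EX_1st_MEM_2nd": 2},
    "full":   {"EX_1st": 0, "MEM_1st": 1, "EX_2nd": 0, "MEM_2nd": 0, "EX_1st_MEM_2nd": 0},
    "ex_mem": {"EX_1st": 0, "MEM_1st": 2, "EX_2nd": 1, "MEM_2nd": 1, "EX_1st_MEM_2nd": 1},
    "mem_wb": {"EX_1st": 1, "MEM_1st": 1, "EX_2nd": 0, "MEM_2nd": 0, "EX_1st_MEM_2nd": 1},
}

# Stage latencies in picoseconds (from the exercise).
EX_LATENCY = {"none": 120, "full": 150, "ex_mem": 140, "mem_wb": 130}
OTHER_STAGES = [150, 100, 120, 100]  # IF, ID, MEM, WB


def cpi(scheme):
    """Base CPI of 1 plus the expected stall cycles per instruction."""
    return 1.0 + sum(FRACTIONS[k] * STALLS[scheme][k] for k in FRACTIONS)


def stall_fraction(scheme):
    """Fraction of all cycles that are stall cycles."""
    c = cpi(scheme)
    return (c - 1.0) / c


def cycle_time(scheme, extra_ex_ps=0):
    """Clock period in ps: the slowest stage sets the cycle time."""
    return max(OTHER_STAGES + [EX_LATENCY[scheme] + extra_ex_ps])


def time_per_instruction(scheme):
    """Average time per instruction in ps."""
    return cpi(scheme) * cycle_time(scheme)


# 4.12.5: time travel removes all stalls (CPI = 1) but adds 100 ps to the
# full-forwarding EX stage.
time_travel_tpi = 1.0 * cycle_time("full", extra_ex_ps=100)

if __name__ == "__main__":
    for scheme in STALLS:
        print(scheme, f"CPI={cpi(scheme):.2f}",
              f"stalls={stall_fraction(scheme):.0%}",
              f"time={time_per_instruction(scheme):.1f} ps")
    print("4.12.4 speedup:",
          round(time_per_instruction("none") / time_per_instruction("full"), 2))
    print("4.12.5 speedup:",
          round(time_per_instruction("full") / time_travel_tpi, 2))
```

Under these assumptions the cycle time is 150 ps for every scheme except time travel, because the IF stage is the slowest stage in each case. That is why the comparison in 4.12.6 comes out the same way as the comparison in 4.12.3.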