Question: In this exercise we examine in detail how an instruction is executed in a single-cycle datapath. Problems in this exercise refer to a clock cycle in which the processor fetches the following instruction word: 10101100011000100000000000010100. Assume that data memory is all zeros and that the processor’s registers have the following values at the beginning of the cycle in which the above instruction word is fetched:

Register:  r0   r1   r2   r3   r4   r5   r6   r8   r12   r31
Value:     0    -1   2    -3   -4   10   6    8    2     -16

4.7.1 [5] What are the outputs of the sign-extend and the jump “Shift left 2” unit (near the top of Figure 4.24) for this instruction word?

4.7.2 [10] What are the values of the ALU control unit’s inputs for this instruction?

4.7.3 [10] What is the new PC address after this instruction is executed? Highlight the path through which this value is determined.

4.7.4 [10] For each Mux, show the values of its data output during the execution of this instruction and these register values.

4.7.5 [10] For the ALU and the two add units, what are their data input values?

4.7.6 [10] What are the values of all inputs for the “Registers” unit?

Short Answer

4.7.1

The output of the sign-extend unit is 00000000000000000000000000010100.

The output of the jump “Shift left 2” unit is 0001100010000000000001010000.
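Both values can be reproduced directly from the instruction word. The following is a minimal Python sketch, not part of the original solution; the helper names `sign_extend_16` and `jump_shift_left_2` are made up for illustration.

```python
# Instruction word from the question, as a bit string (bit 31 first).
WORD = "10101100011000100000000000010100"

def sign_extend_16(word):
    """Replicate bit 15 of Instruction[15:0] into the upper 16 bits."""
    imm16 = word[16:]
    return imm16[0] * 16 + imm16

def jump_shift_left_2(word):
    """Take Instruction[25:0] and append two zero bits (26 -> 28 bits)."""
    return word[6:] + "00"

sign_ext = sign_extend_16(WORD)
jump_bits = jump_shift_left_2(WORD)
print(sign_ext, len(sign_ext), int(sign_ext, 2))   # 32 bits, value 20
print(jump_bits, len(jump_bits))                   # 28 bits
```

The sign-extend output is 32 bits wide, while the jump shifter output is only 28 bits wide because it starts from the 26-bit field Instruction[25:0].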

4.7.2

The ALU control unit’s inputs are ALUOp = 00 and Instruction[5:0] = 010100.
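As a rough illustration of where these two inputs come from, the sketch below derives them from the instruction word. The `ALU_OP` table follows the textbook's ALUOp encoding (00 for loads and stores, 01 for beq, 10 for R-format); the function name `alu_control_inputs` is an assumption made for this example.

```python
WORD = 0xAC620014  # 10101100011000100000000000010100

# ALUOp values produced by the main control unit, keyed by opcode:
# lw and sw -> 00, beq -> 01, R-format -> 10.
ALU_OP = {0b100011: "00", 0b101011: "00", 0b000100: "01", 0b000000: "10"}

def alu_control_inputs(word):
    """Return (ALUOp, Instruction[5:0]) as bit strings."""
    opcode = (word >> 26) & 0x3F
    funct = word & 0x3F
    return ALU_OP[opcode], format(funct, "06b")

print(alu_control_inputs(WORD))  # ('00', '010100')
```

With ALUOp = 00 the ALU control unit selects an add regardless of the Instruction[5:0] bits, so 010100 is present on the input but does not influence the operation.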

4.7.3

The required values:

The new PC address is PC + 4.

The required path is PC to Add (PC + 4) to Branch Mux to Jump Mux to PC.

4.7.4

The required values:

WrReg Mux - 2 or 0 (RegDst is X)

ALU Mux - 20

Mem/ALU Mux - X

Branch Mux - PC + 4

Jump Mux - PC + 4

4.7.5

The required values:

ALU: −3 and 20

Add (PC + 4) : PC and 4

Add (Branch) : PC + 4 and 80 (the sign-extended immediate 20 shifted left by 2)

4.7.6

The required values:

Read Register 1: 3

Read Register 2: 2

Write Register: X

Write Data: X

RegWrite : 0

Step by step solution

01

Define the concept.

4.7.1

Given instruction word is 10101100011000100000000000010100.

Also given that the data memory is all zeros and that the processor’s registers have the values:

Register:  r0   r1   r2   r3   r4   r5   r6   r8   r12   r31
Value:     0    -1   2    -3   -4   10   6    8    2     -16
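The instruction word and register values above can be decoded with a few shifts and masks. This is a sketch for illustration only; `decode_i_type` and `REGS` are names invented here. Opcode 101011 (decimal 43) is the MIPS `sw` opcode, so the word corresponds to `sw r2, 20(r3)`.

```python
WORD = 0xAC620014  # 10101100011000100000000000010100

# Register values from the question (only the listed registers).
REGS = {0: 0, 1: -1, 2: 2, 3: -3, 4: -4, 5: 10, 6: 6, 8: 8, 12: 2, 31: -16}

def decode_i_type(word):
    """Split a 32-bit MIPS I-format word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # Instruction[31:26]
        "rs": (word >> 21) & 0x1F,      # Instruction[25:21]
        "rt": (word >> 16) & 0x1F,      # Instruction[20:16]
        "imm": word & 0xFFFF,           # Instruction[15:0]
    }

fields = decode_i_type(WORD)
print(fields)                                    # {'opcode': 43, 'rs': 3, 'rt': 2, 'imm': 20}
print(REGS[fields["rs"]], REGS[fields["rt"]])    # -3 2
```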

4.7.2

Given instruction word is 10101100011000100000000000010100.

The ALU control unit receives 00 on its 2-bit ALUOp input and 010100 on its 6-bit Instruction input (Instruction[5:0]).

4.7.3

After execution of the specified instruction, the new PC address will be PC + 4.

The value is computed along the path PC to Add (PC + 4) to Branch Mux to Jump Mux to PC.
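The path can be sketched as two chained selections. The exercise leaves the PC symbolic, so the value `PC = 0x00400000` below is purely hypothetical, and the function name `next_pc` is an assumption for this example. For a store, both Branch and Jump are 0, so PC + 4 passes through both muxes unchanged.

```python
def next_pc(pc, imm, jump_field, branch, zero, jump):
    """Model the PC path: Add (PC + 4) -> Branch Mux -> Jump Mux."""
    pc_plus_4 = pc + 4
    branch_target = pc_plus_4 + (imm << 2)
    jump_target = (pc_plus_4 & 0xF0000000) | (jump_field << 2)
    branch_mux = branch_target if (branch and zero) else pc_plus_4
    return jump_target if jump else branch_mux

PC = 0x00400000  # hypothetical starting PC, not given in the exercise

# Control values for this store: Branch = 0, Jump = 0.
# Zero = 0 because the ALU result is -3 + 20 = 17.
new_pc = next_pc(PC, imm=20, jump_field=0x0620014, branch=0, zero=0, jump=0)
print(hex(new_pc))  # 0x400004
```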

4.7.4

During the execution of this instruction, the Mux data outputs are: WrReg Mux 2 or 0 (RegDst is X), ALU Mux 20, Mem/ALU Mux X, Branch Mux PC + 4, and Jump Mux PC + 4.

4.7.5

The data input values for the ALU and the two “add” units are:

ALU: −3 and 20; Add (PC + 4): PC and 4; Add (Branch): PC + 4 and 80 (the sign-extended immediate 20 shifted left by 2).

4.7.6

The inputs of the “Registers” unit are Read Register 1, Read Register 2, Write Register, Write Data, and RegWrite. Their values are 3, 2, X, X, and 0, respectively.

02

Determine the calculation.

4.7.1

Given instruction word is 10101100011000100000000000010100.

Data memory is all zeros, and the registers hold the values given in the question.

The outputs of the sign-extend unit and the jump “Shift left 2” unit for the given instruction word are:

Sign-extend output: 00000000000000000000000000010100

Jump “Shift left 2” output: 0001100010000000000001010000

4.7.2

Given instruction word is 10101100011000100000000000010100.

Data memory is all zeros, and the registers hold the values given in the question.

The required values:

ALUOp: 00

Instruction[5:0]: 010100

4.7.3

After execution of the instruction, the new PC address will be PC + 4.

The highlighted path along which this value is computed is PC to Add (PC + 4) to Branch Mux to Jump Mux to PC.

4.7.4

For this instruction and these register values, the data output of each Mux is:

WrReg Mux: 2 or 0 (RegDst is X)

ALU Mux: 20

Mem/ALU Mux: X

Branch Mux: PC + 4

Jump Mux: PC + 4
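The Mux outputs listed here follow from each Mux's select signal. The sketch below assumes the usual select conventions of the textbook datapath (RegDst 0 selects rt and 1 selects rd; ALUSrc 1 selects the sign-extended immediate) and models a don't-care select by returning both candidate inputs. The helper `mux` and the placeholder strings are illustrative, not part of the original solution.

```python
X = None  # don't-care control value

def mux(select, in0, in1):
    """2:1 mux; with a don't-care select, report both candidate outputs."""
    if select is X:
        return (in0, in1)
    return in1 if select else in0

rt, rd, imm = 2, 0, 20   # Instruction[20:16], Instruction[15:11], sign-extended immediate
read_data_2 = 2          # contents of r2

# Control values for this store: RegDst=X, ALUSrc=1, MemtoReg=X, Branch=0, Jump=0.
print(mux(X, rt, rd))                       # (2, 0)
print(mux(1, read_data_2, imm))             # 20
print(mux(X, "ALU result", "memory data"))  # either input, recorded as X
print(mux(0, "PC + 4", "branch target"))    # PC + 4
print(mux(0, "PC + 4", "jump target"))      # PC + 4
```

For the Mem/ALU Mux the answer is simply X, since its output is not used when RegWrite is 0.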

4.7.5

The data input values for the ALU and the two “add” units are:

ALU: −3 and 20

Add (PC + 4): PC and 4

Add (Branch): PC + 4 and 80 (the sign-extended immediate 20 shifted left by 2)
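A short sketch of how these inputs arise, assuming the standard single-cycle datapath in which the branch adder receives PC + 4 and the sign-extended immediate shifted left by 2. The PC value is hypothetical because the exercise keeps it symbolic, and the variable names are chosen only for this example.

```python
rs_value = -3        # contents of r3 (Read data 1)
imm = 20             # sign-extended Instruction[15:0], selected by the ALU Mux
pc = 0x00400000      # hypothetical PC; the exercise does not give a value

alu_inputs = (rs_value, imm)
pc_adder_inputs = (pc, 4)
branch_adder_inputs = (pc + 4, imm << 2)

print(alu_inputs, sum(alu_inputs))   # (-3, 20) 17
print(pc_adder_inputs[1])            # 4
print(branch_adder_inputs[1])        # 80
```

The ALU sum, 17, is the address the store presents to data memory. The branch adder still computes its result even though the Branch Mux discards it for this instruction.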

4.7.6

The inputs of the “Registers” unit have the following values:

Read Register 1: 3

Read Register 2: 2

Write Register: X

Write Data: X

RegWrite: 0
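These inputs can be derived from the instruction fields and the control signals. The function below is a sketch under the assumption that a don't-care RegDst is passed as `None`; the name `registers_unit_inputs` and the dictionary keys are made up for illustration.

```python
def registers_unit_inputs(word, reg_dst, reg_write):
    """Return the values seen on the inputs of the Registers unit."""
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    if reg_dst is None:              # RegDst is a don't-care
        write_register = "X"
    else:
        write_register = rd if reg_dst else rt
    return {
        "Read Register 1": rs,
        "Read Register 2": rt,
        "Write Register": write_register,
        # Nothing is written when RegWrite is 0, so Write Data does not matter.
        "Write Data": "Mem/ALU Mux output" if reg_write else "X",
        "RegWrite": reg_write,
    }

inputs = registers_unit_inputs(0xAC620014, reg_dst=None, reg_write=0)
for name, value in inputs.items():
    print(name, value)
```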

Most popular questions from this chapter

This exercise explores how exception handling affects pipeline design. The first three problems in this exercise refer to the following two instructions:

Instruction 1: BNE R1, R2, Label

Instruction 2: LW R1, 0(R1)

4.17.1 Which exceptions can each of these instructions trigger? For each of these exceptions, specify the pipeline stage in which it is detected.

4.17.2 If there is a separate handler address for each exception, show how the pipeline organization must be changed to be able to handle this exception. You can assume that the addresses of these handlers are known when the processor is designed.

4.17.3 If the second instruction is fetched right after the first instruction, describe what happens in the pipeline when the first instruction causes the first exception you listed in 4.17.1. Show the pipeline execution diagram from the time the first instruction is fetched until the time the first instruction of the exception handler is completed.

4.17.4 In vectored exception handling, the table of exception handler addresses is in data memory at a known (fixed) address. Change the pipeline to implement this exception handling mechanism. Repeat 4.17.3 using this modified pipeline and vectored exception handling.

4.17.5 We want to emulate vectored exception handling (described in 4.17.4) on a machine that has only one fixed handler address. Write the code that should be at that fixed address. Hint: this code should identify the exception, get the right address from the exception vector table, and transfer execution to that handler.

This exercise examines the accuracy of various branch predictors for the following repeating pattern (e.g., in a loop) of branch outcomes: T, NT, T, T, NT

4.16.1 [5] What is the accuracy of always-taken and always-not-taken predictors for this sequence of branch outcomes?

4.16.2 [5] What is the accuracy of the two-bit predictor for the first 4 branches in this pattern, assuming that the predictor starts off in the bottom left state from Figure 4.63 (predict not taken)?

4.16.3 [10] What is the accuracy of the two-bit predictor if this pattern is repeated forever?

4.16.4 [30] Design a predictor that would achieve a perfect accuracy if this pattern is repeated forever. Your predictor should be a sequential circuit with one output that provides a prediction (1 for taken, 0 for not taken) and no inputs other than the clock and the control signal that indicates that the instruction is a conditional branch.

4.16.5 [10] What is the accuracy of your predictor from 4.16.4 if it is given a repeating pattern that is the exact opposite of this one?

4.16.6 [20] Repeat 4.16.4, but now your predictor should be able to eventually (after a warm-up period during which it can make wrong predictions) start perfectly predicting both this pattern and its opposite. Your predictor should have an input that tells it what the real outcome was. Hint: this input lets your predictor determine which of the two repeating patterns it is given.

This exercise is intended to help you understand the relationship between delay slots, control hazards, and branch execution in a pipelined processor. In this exercise, we assume that the following MIPS code is executed on a pipelined processor with a 5-stage pipeline, full forwarding, and a predict-taken branch predictor:

lw r2,0(r1)

label1: beq r2,r0,label2 # not taken once, then taken

lw r3,0(r2)

beq r3,r0,label1 # taken

add r1,r3,r1

label2: sw r1,0(r2)

4.14.1 [10] Draw the pipeline execution diagram for this code, assuming there are no delay slots and that branches execute in the EX stage.

4.14.2 [10] Repeat 4.14.1, but assume that delay slots are used. In the given code, the instruction that follows the branch is now the delay slot instruction for that branch.

4.14.3 [20] One way to move the branch resolution one stage earlier is to not need an ALU operation in conditional branches. The branch instructions would be “bez rd,label” and “bnez rd,label”, and it would branch if the register has and does not have a zero value, respectively. Change this code to use these branch instructions instead of beq. You can assume that register R8 is available for you to use as a temporary register, and that an seq (set if equal) R-type instruction can be used.

Section 4.8 describes how the severity of control hazards can be reduced by moving branch execution into the ID stage. This approach involves a dedicated comparator in the ID stage, as shown in Figure 4.62. However, this approach potentially adds to the latency of the ID stage, and requires additional forwarding logic and hazard detection.

4.14.4 [10] Using the first branch instruction in the given code as an example, describe the hazard detection logic needed to support branch execution in the ID stage as in Figure 4.62. Which type of hazard is this new logic supposed to detect?

4.14.5 [10] For the given code, what is the speedup achieved by moving branch execution into the ID stage? Explain your answer. In your speedup calculation, assume that the additional comparison in the ID stage does not affect clock cycle time.

4.14.6 [10] Using the first branch instruction in the given code as an example, describe the forwarding support that must be added to support branch execution in the ID stage. Compare the complexity of this new forwarding unit to the complexity of the existing forwarding unit in Figure 4.62.

In this exercise, we examine how pipelining affects the clock cycle time of the processor. Problems in this exercise assume that individual stages of the datapath have the following latencies:

IF       ID       EX       MEM      WB
250 ps   350 ps   150 ps   300 ps   200 ps

Also, assume that instructions executed by the processor are broken down as follows:

alu   beq   lw    sw
45%   20%   20%   15%

4.8.1 [5] What is the clock cycle time in a pipelined and non-pipelined processor?

4.8.2 [10] What is the total latency of an LW instruction in a pipelined and non-pipelined processor?

4.8.3 [10] If we can split one stage of the pipelined datapath into two new stages, each with half the latency of the original stage, which stage would you split and what is the new clock cycle time of the processor?

4.8.4 [10] Assuming there are no stalls or hazards, what is the utilization of the data memory?

4.8.5 [10] Assuming there are no stalls or hazards, what is the utilization of the write-register port of the “Registers” unit?

4.8.6 [30] Instead of a single-cycle organization, we can use a multi-cycle organization where each instruction takes multiple cycles but one instruction finishes before another is fetched. In this organization, an instruction only goes through stages it actually needs (e.g., ST only takes 4 cycles because it does not need the WB stage). Compare clock cycle times and execution times with single-cycle, multi-cycle, and pipelined organization.

In this exercise, we examine how resource hazards, control hazards, and Instruction Set Architecture (ISA) design can affect pipelined execution. Problems in this exercise refer to the following fragment of MIPS code:

sw r16,12(r6)

lw r16,8(r6)

beq r5,r4,Label # Assume r5!=r4

add r5,r1,r4

slt r5,r15,r4

Assume that individual pipeline stages have the following latencies:

IF       ID       EX       MEM      WB
200 ps   120 ps   150 ps   190 ps   100 ps

4.10.1 For this problem, assume that all branches are perfectly predicted (this eliminates all control hazards) and that no delay slots are used. If we only have one memory (for both instructions and data), there is a structural hazard every time we need to fetch an instruction in the same cycle in which another instruction accesses data. To guarantee forward progress, this hazard must always be resolved in favor of the instruction that accesses data. What is the total execution time of this instruction sequence in the 5-stage pipeline that only has one memory? We have seen that data hazards can be eliminated by adding nops to the code. Can you do the same with this structural hazard? Why?

4.10.2 For this problem, assume that all branches are perfectly predicted (this eliminates all control hazards) and that no delay slots are used. If we change load/store instructions to use a register (without an offset) as the address, these instructions no longer need to use the ALU. As a result, MEM and EX stages can be overlapped and the pipeline has only 4 stages. Change this code to accommodate this changed ISA. Assuming this change does not affect clock cycle time, what speedup is achieved in this instruction sequence?

4.10.3 Assuming stall-on-branch and no delay slots, what speedup is achieved on this code if branch outcomes are determined in the ID stage, relative to the execution where branch outcomes are determined in the EX stage?

4.10.4 Given these pipeline stage latencies, repeat the speedup calculation from 4.10.2, but take into account the (possible) change in clock cycle time. When EX and MEM are done in a single stage, most of their work can be done in parallel. As a result, the resulting EX/MEM stage has a latency that is the larger of the original two, plus 20 ps needed for the work that could not be done in parallel.

4.10.5 Given these pipeline stage latencies, repeat the speedup calculation from 4.10.3, taking into account the (possible) change in clock cycle time. Assume that the latency of the ID stage increases by 50% and the latency of the EX stage decreases by 10 ps when branch outcome resolution is moved from EX to ID.

4.10.6 Assuming stall-on-branch and no delay slots, what is the new clock cycle time and execution time of this instruction sequence if beq address computation is moved to the MEM stage? What is the speedup from this change? Assume that the latency of the EX stage is reduced by 20 ps and the latency of the MEM stage is unchanged when branch outcome resolution is moved from EX to MEM.
