
Consider the following instruction:

Instruction: AND Rd,Rs,Rt
Interpretation: Reg[Rd] = Reg[Rs] AND Reg[Rt]

4.1.1 [5] What are the values of control signals generated by the control in Figure 4.2 for the above instruction?

4.1.2 [5] Which resources (blocks) perform a useful function for this instruction?

4.1.3 [10] Which resources (blocks) produce outputs, but their outputs are not used for this instruction? Which resources produce no outputs for this instruction?

Short Answer


4.1.1

The required values are:

RegWrite  MemRead  ALUMux  MemWrite  ALUOp  RegMux  Branch
1         0        0       0         AND    0       0

4.1.2

All resources (blocks) perform a useful function for this instruction except the Data Memory and the branch Add unit.

4.1.3

Outputs not used    No outputs
Branch Add          Data Memory

Step by step solution

01

Define the concept.

4.1.1

The control signal “ALUMux” controls the Mux at the ALU input.

A value of 0 (Reg) selects the output of the register file, and a value of 1 (Imm) selects the immediate from the instruction word as the second input to the ALU.

The control signal “RegMux” controls the Mux at the data input to the register file.

A value of 0 (ALU) selects the output of the ALU, and a value of 1 (Mem) selects the output of the data memory.

The value “X” means “don’t care”: the signal may be either 0 or 1 without affecting the result.
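The mux encodings and the meaning of “X” can be sketched in Python. This is a minimal illustration, not part of Figure 4.2: the constant and function names (`ALUMUX_REG`, `alu_mux`, `reg_mux`, `signal_matches`) are hypothetical, and each mux is modeled as a plain function of its select value.

```python
# Mux encodings as described for Figure 4.2 (constant names are hypothetical).
ALUMUX_REG, ALUMUX_IMM = 0, 1   # second ALU input: register file or immediate
REGMUX_ALU, REGMUX_MEM = 0, 1   # register-file write data: ALU or data memory
DONT_CARE = "X"


def _check_select(sel):
    # A select line must be driven to 0 or 1 whenever its output matters.
    if sel not in (0, 1):
        raise ValueError(f"select must be 0 or 1, got {sel!r}")


def alu_mux(sel, reg_value, imm_value):
    """Second ALU operand: 0 (Reg) -> register-file output, 1 (Imm) -> immediate."""
    _check_select(sel)
    return reg_value if sel == ALUMUX_REG else imm_value


def reg_mux(sel, alu_out, mem_out):
    """Register-file write data: 0 (ALU) -> ALU output, 1 (Mem) -> memory output."""
    _check_select(sel)
    return alu_out if sel == REGMUX_ALU else mem_out


def signal_matches(expected, actual):
    """A specified value must match exactly; 'X' (don't care) accepts 0 or 1."""
    return expected == DONT_CARE or expected == actual


print(alu_mux(ALUMUX_REG, 12, 7))    # 12: register value selected
print(reg_mux(REGMUX_MEM, 5, 99))    # 99: memory output selected
print(signal_matches(DONT_CARE, 1))  # True
```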

4.1.2

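The branch Add unit does not perform a useful function for this instruction: AND is not a branch (Branch = 0), so the target address it computes never reaches the PC. The Data Memory is also not useful, because the instruction neither reads nor writes memory. The remaining blocks, including the write port of the register file (RegWrite = 1), are needed.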

4.1.3

The branch Add unit still adds its inputs and produces an output, but that output is not used. The Data Memory produces no output, because it is neither read (MemRead = 0) nor written (MemWrite = 0).

02

Determine the calculation.

4.1.1

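The control in Figure 4.2 must generate the signals for the instruction AND Rd,Rs,Rt, whose interpretation is Reg[Rd] = Reg[Rs] AND Reg[Rt].

For this instruction, the control generates the following values:

RegWrite  MemRead  ALUMux  MemWrite  ALUOp  RegMux  Branch
1         0        0       0         AND    0       0

Each value follows from the interpretation:

RegWrite = 1, because the result is written to Rd.
MemRead = 0 and MemWrite = 0, because the instruction does not access data memory.
ALUMux = 0 (Reg), because the second ALU operand is Reg[Rt], not an immediate.
ALUOp = AND, because the ALU must perform a bitwise AND.
RegMux = 0 (ALU), because the value written to Rd comes from the ALU.
Branch = 0, because the instruction does not branch.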
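To check these control values, the following Python sketch runs AND through a simplified single-cycle datapath. It is an illustration under stated assumptions, not the datapath of Figure 4.2 itself: registers are a Python list, data memory is a dict, only AND and ADD are modeled, and the PC update driven by Branch is left out. The names `execute`, `AND_CONTROL`, and `ALU_OPS` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# Control word for AND Rd,Rs,Rt (signal names from Figure 4.2).
AND_CONTROL = {
    "RegWrite": 1,
    "MemRead": 0,
    "ALUMux": 0,     # 0 (Reg): second ALU operand is Reg[Rt]
    "MemWrite": 0,
    "ALUOp": "AND",
    "RegMux": 0,     # 0 (ALU): write data comes from the ALU
    "Branch": 0,     # not a branch; the PC update path is not modeled here
}

# Hypothetical ALU operation table; only the operations needed here.
ALU_OPS = {
    "AND": lambda a, b: a & b,
    "ADD": lambda a, b: (a + b) & MASK32,
}


def execute(ctrl, regs, mem, rs, rt, rd, imm=0):
    """One pass through a simplified single-cycle datapath.

    regs: list of 32 register values; mem: dict mapping address -> word.
    Both are updated in place, as the register file and memory would be.
    """
    a = regs[rs]
    b = regs[rt] if ctrl["ALUMux"] == 0 else imm
    alu_out = ALU_OPS[ctrl["ALUOp"]](a, b)
    # Data memory only produces a value when it is read.
    mem_out = mem.get(alu_out, 0) if ctrl["MemRead"] else None
    if ctrl["MemWrite"]:
        mem[alu_out] = regs[rt]
    if ctrl["RegWrite"] and rd != 0:  # register $zero always stays 0
        regs[rd] = alu_out if ctrl["RegMux"] == 0 else mem_out


regs = [0] * 32
regs[8], regs[9] = 0b1100, 0b1010  # example values for Rs = $8, Rt = $9
memory = {}
execute(AND_CONTROL, regs, memory, rs=8, rt=9, rd=10)
print(regs[10])  # 8 (0b1000)
print(memory)    # {} (data memory untouched)
```

Passing `dict(AND_CONTROL, RegWrite=0)` instead leaves `regs[10]` at 0, since the ALU result is never written back.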

4.1.2

All resources (blocks) perform a useful function for this instruction except the branch Add unit and the Data Memory.

4.1.3

The branch Add unit produces an output that is not used, while the Data Memory produces no output.
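The classification in 4.1.2 and 4.1.3 can be derived mechanically from the control signals. The sketch below relies on simplified rules that are spelled out in its docstring (for example, that the branch adder always produces a value but is only used when Branch = 1); the function name `classify_blocks` is hypothetical, and the always-used fetch blocks (PC, instruction memory, PC + 4 adder) are omitted.

```python
# Control word for AND Rd,Rs,Rt (signal names from Figure 4.2).
AND_CONTROL = {"RegWrite": 1, "MemRead": 0, "ALUMux": 0, "MemWrite": 0,
               "ALUOp": "AND", "RegMux": 0, "Branch": 0}


def classify_blocks(ctrl):
    """Split datapath blocks into (useful, output_not_used, no_output) lists.

    Simplified rules (assumptions of this sketch):
    - the register file, ALUMux, and ALU produce outputs that are used;
    - RegMux always produces a value, but it matters only when RegWrite = 1;
    - the branch Add unit always computes a target, used only when Branch = 1;
    - data memory outputs a value only when MemRead = 1, and it is useful
      when it is either read or written.
    """
    blocks = {
        # name: (produces_output, is_useful)
        "Registers": (True, True),
        "ALUMux": (True, True),
        "ALU": (True, True),
        "RegMux": (True, ctrl["RegWrite"] == 1),
        "Branch Add": (True, ctrl["Branch"] == 1),
        "Data Memory": (ctrl["MemRead"] == 1,
                        ctrl["MemRead"] == 1 or ctrl["MemWrite"] == 1),
    }
    useful = sorted(name for name, (_, used) in blocks.items() if used)
    output_not_used = sorted(name for name, (out, used) in blocks.items()
                             if out and not used)
    no_output = sorted(name for name, (out, _) in blocks.items() if not out)
    return useful, output_not_used, no_output


useful, output_not_used, no_output = classify_blocks(AND_CONTROL)
print("useful:", useful)                     # ['ALU', 'ALUMux', 'RegMux', 'Registers']
print("output not used:", output_not_used)   # ['Branch Add']
print("no output:", no_output)               # ['Data Memory']
```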


Most popular questions from this chapter

Consider the following loop.

loop: lw  r1,0(r1)
      and r1,r1,r2
      lw  r1,0(r1)
      lw  r1,0(r1)
      beq r1,r0,loop

Assume that perfect branch prediction is used (no stalls due to control hazards), that there are no delay slots, and that the pipeline has full forwarding support. Also, assume that many iterations of this loop are executed before the loop exits.

4.11.1 Show a pipeline execution diagram for the third iteration of this loop, from the cycle in which we fetch the first instruction of that iteration up to (but not including) the cycle in which we can fetch the first instruction of the next iteration. Show all instructions that are in the pipeline during these cycles (not just those from the third iteration).

4.11.2 How often (as a percentage of all cycles) do we have a cycle in which all five pipeline stages are doing useful work?

The importance of having a good branch predictor depends on how often conditional branches are executed. Together with branch predictor accuracy, this will determine how much time is spent stalling due to mispredicted branches. In this exercise, assume that the breakdown of dynamic instructions into various instruction categories is as follows:

R-Type  BEQ  JMP  LW   SW
40%     25%  5%   25%  5%

Also, assume the following branch predictor accuracies:

Always-Taken  Always-Not-Taken  2-Bit
45%           55%               85%
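The reasoning above reduces to one formula: extra CPI = (fraction of conditional branches) × (1 - predictor accuracy) × (stall cycles per misprediction). A minimal Python sketch follows. The misprediction penalty is left as a parameter because it depends on how many already-fetched instructions must be flushed when the outcome is known in EX; the value in `PENALTY_CYCLES` is a hypothetical placeholder, not the exercise's answer.

```python
def extra_cpi(branch_fraction, accuracy, penalty_cycles):
    """Extra CPI caused by mispredicted conditional branches.

    branch_fraction: share of dynamic instructions that are conditional branches
    accuracy: fraction of those branches the predictor gets right (0..1)
    penalty_cycles: stall cycles lost per misprediction (pipeline-dependent)
    """
    misprediction_rate = 1.0 - accuracy
    return branch_fraction * misprediction_rate * penalty_cycles


# Only BEQ is conditional; JMP is unconditional and is not counted.
BEQ_FRACTION = 0.25
ACCURACY = {"Always-Taken": 0.45, "Always-Not-Taken": 0.55, "2-Bit": 0.85}
PENALTY_CYCLES = 2  # hypothetical placeholder: substitute your pipeline's flush cost

for name, acc in ACCURACY.items():
    print(f"{name}: extra CPI = {extra_cpi(BEQ_FRACTION, acc, PENALTY_CYCLES):.4f}")
```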

4.15.1 [10] Stall cycles due to mispredicted branches increase the CPI. What is the extra CPI due to mispredicted branches with the always-taken predictor? Assume that branch outcomes are determined in the EX stage, that there are no data hazards, and that no delay slots are used.

4.15.2 [10] Repeat 4.15.1 for the “always-not-taken” predictor.

4.15.3 [10] Repeat 4.15.1 for the 2-bit predictor.

4.15.4 [10] With the 2-bit predictor, what speedup would be achieved if we could convert half of the branch instructions in a way that replaces a branch instruction with an ALU instruction? Assume that correctly and incorrectly predicted instructions have the same chance of being replaced.

4.15.5 [10] With the 2-bit predictor, what speedup would be achieved if we could convert half of the branch instructions in a way that replaced each branch instruction with two ALU instructions? Assume that correctly and incorrectly predicted instructions have the same chance of being replaced.

4.15.6 [10] Some branch instructions are much more predictable than others. If we know that 80% of all executed branch instructions are easy-to-predict loop-back branches that are always predicted correctly, what is the accuracy of the 2-bit predictor on the remaining 20% of the branch instructions?

This exercise is intended to help you understand the cost/complexity/performance trade-offs of forwarding in a pipelined processor. Problems in this exercise refer to pipelined data paths from Figure 4.45. These problems assume that, of all the instructions executed in a processor, the following fraction of these instructions have a particular type of RAW data dependence. The type of RAW data dependence is identified by the stage that produces the result (EX or MEM) and the instruction that consumes the result (1st instruction that follows the one that produces the result, 2nd instruction that follows, or both). We assume that the register write is done in the first half of the clock cycle and that register reads are done in the second half of the cycle, so “EX to 3rd” and “MEM to 3rd” dependencies are not counted because they cannot result in data hazards. Also, assume that the CPI of the processor is 1 if there are no data hazards.

RAW dependence type        Fraction of instructions
EX to 1st only             5%
MEM to 1st only            20%
EX to 2nd only             5%
MEM to 2nd only            10%
EX to 1st and MEM to 2nd   10%
Other RAW dependences      10%
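The claim that “EX to 3rd” and “MEM to 3rd” dependences cannot cause hazards can be checked with a small cycle count. Without forwarding, a result from EX or MEM only reaches the consumer through the register-file write in WB, so the number of stalls depends only on how many instructions separate producer and consumer. The sketch assumes the classic five-stage pipeline (IF, ID, EX, MEM, WB), one instruction fetched per cycle, and the split-cycle register file described above; `stage_cycle` and `stalls_without_forwarding` are hypothetical helper names.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # classic five-stage pipeline


def stage_cycle(fetch_cycle, stage):
    """Cycle in which an instruction fetched in `fetch_cycle` occupies `stage`."""
    return fetch_cycle + STAGES.index(stage)


def stalls_without_forwarding(distance):
    """Stalls when the consumer is `distance` instructions after the producer.

    Assumes no forwarding and no earlier stalls: the producer writes the
    register file in the first half of WB and the consumer reads it in the
    second half of ID, so those two stages may share a cycle.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    producer_wb = stage_cycle(1, "WB")             # producer fetched in cycle 1
    consumer_id = stage_cycle(1 + distance, "ID")
    return max(0, producer_wb - consumer_id)


for d in (1, 2, 3):
    print(f"distance {d}: {stalls_without_forwarding(d)} stall(s)")
# distance 1: 2 stall(s)
# distance 2: 1 stall(s)
# distance 3: 0 stall(s)
```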

Assume the following latencies for individual pipeline stages. For the EX stage, latencies are given separately for a processor without forwarding and for a processor with different kinds of forwarding.

Stage                       Latency
IF                          150 ps
ID                          100 ps
EX (no FW)                  120 ps
EX (full FW)                150 ps
EX (FW from EX/MEM only)    140 ps
EX (FW from MEM/WB only)    130 ps
MEM                         120 ps
WB                          100 ps

4.12.1 If we use no forwarding, what fraction of cycles are we stalling due to data hazards?

4.12.2 If we use full forwarding (forward all results that can be forwarded), what fraction of cycles are we stalling due to data hazards?

4.12.3 Let us assume that we cannot afford to have three-input Muxes that are needed for full forwarding. We have to decide if it is better to forward only from the EX/MEM pipeline register (next-cycle forwarding) or only from the MEM/WB pipeline register (two-cycle forwarding). Which of the two options results in fewer data stall cycles?

4.12.4 For the given hazard probabilities and pipeline stage latencies, what is the speedup achieved by adding full forwarding to a pipeline that had no forwarding?

4.12.5 What would be the additional speedup (relative to a processor with forwarding) if we added time-travel forwarding that eliminates all data hazards? Assume that the yet-to-be-invented time-travel circuitry adds 100 ps to the latency of the full-forwarding EX stage.

4.12.6 Repeat 4.12.3 but this time determine which of the two options results in a shorter time per instruction.

In this exercise, we examine how pipelining affects the clock cycle time of the processor. Problems in this exercise assume that individual stages of the datapath have the following latencies:

IF      ID      EX      MEM     WB
250 ps  350 ps  150 ps  300 ps  200 ps

Also, assume that instructions executed by the processor are broken down as follows:

alu  beq  lw   sw
45%  20%  20%  15%
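The effect of pipelining on the clock cycle can be sketched from the stage latencies above. The sketch assumes that a single-cycle clock must cover all five stages (LW uses every stage) and that a pipelined clock is set by the slowest stage, ignoring pipeline-register overhead; the function names are hypothetical.

```python
# Stage latencies in picoseconds, from the table above.
STAGE_LATENCY_PS = {"IF": 250, "ID": 350, "EX": 150, "MEM": 300, "WB": 200}


def single_cycle_clock_ps(latencies):
    # One cycle must fit the slowest instruction; LW uses all five stages,
    # so the cycle is the sum of the stage latencies.
    return sum(latencies.values())


def pipelined_clock_ps(latencies):
    # Each stage gets one cycle, so the slowest stage sets the cycle time
    # (pipeline-register overhead is ignored in this sketch).
    return max(latencies.values())


print(single_cycle_clock_ps(STAGE_LATENCY_PS))  # 1250
print(pipelined_clock_ps(STAGE_LATENCY_PS))     # 350
```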

4.8.1 [5] What is the clock cycle time in a pipelined and non-pipelined processor?

4.8.2 [10] What is the total latency of an LW instruction in a pipelined and non-pipelined processor?

4.8.3 [10] If we can split one stage of the pipelined datapath into two new stages, each with half the latency of the original stage, which stage would you split and what is the new clock cycle time of the processor?

4.8.4 [10] Assuming there are no stalls or hazards, what is the utilization of the data memory?

4.8.5 [10] Assuming there are no stalls or hazards, what is the utilization of the write-register port of the “Registers” unit?

4.8.6 [30] Instead of a single-cycle organization, we can use a multi-cycle organization where each instruction takes multiple cycles but one instruction finishes before another is fetched. In this organization, an instruction only goes through stages it actually needs (e.g., ST only takes 4 cycles because it does not need the WB stage). Compare clock cycle times and execution times with single-cycle, multi-cycle, and pipelined organization.

This exercise explores energy efficiency and its relationship with performance. Problems in this exercise assume the following energy consumption for activity in Instruction memory, Registers, and Data memory. You can assume that the other components of the datapath spend a negligible amount of energy.

Assume that components in the datapath have the following latencies. You can assume that the other components of the datapath have negligible latencies.

4.19.1 [10] How much energy is spent to execute an ADD instruction in a single-cycle and in 5-stage pipelined design?

4.19.2 [10] What is the worst-case MIPS instruction in terms of energy consumption, and what is the energy spent to execute it?

4.19.3 [10] If energy reduction is paramount, how would you change the pipelined design? What is the percentage reduction in the energy spent by an LW instruction after this change?

4.19.4 [10] What is the performance impact of your changes from 4.19.3?

4.19.5 [10] We can eliminate the MemRead control signal and have the data memory be read in every cycle, i.e., we can permanently have MemRead=1. Explain why the processor still functions correctly after this change. What is the effect of this change on clock frequency and energy consumption?

4.19.6 [10] If an idle unit spends 10% of the power it would spend if it were active, what is the energy spent by the instruction memory in each cycle? What percentage of the overall energy spent by the instruction memory does this idle energy represent?
