Warning: foreach() argument must be of type array|object, bool given in /var/www/html/web/app/themes/studypress-core-theme/template-parts/header/mobile-offcanvas.php on line 20

For the problems in this exercise, assume that there are no pipeline stalls and that the breakdown of executed instructions is as follows: add addi not beq lw sw 20% 20% 0% 25% 25% 10% 4.5.1 [10] In what fraction of all cycles is the data memory used? 4.5.2 [10] In what fraction of all cycles is the input of the sign-extend circuit needed? What is this circuit doing in cycles in which its input is not needed.

Short Answer

Expert verified

4.5.1

The required fraction of the data memory used is 35%.

4.5.2

The required input fraction of the sign-extend circuit is 80%.

The resource for activation of it’s the usage sends to the control signal.

Step by step solution

01

Define the concept.

4.5.1

The required fraction of the data memory used is referred to as the sum of the breakdown of the executed instruction “lw” and the breakdown of the executed instruction “sw” instruction.

The required fraction = (the breakdown of the executed instruction “lw” + the breakdown of the executed instruction “sw” instruction) .

4.5.2

The required input fraction of the sign-extend circuit is known as the sum ofthe breakdown of the executed instruction “addi” , the breakdown of the executed instruction “beq” , the breakdown of the executed instruction “lw” and the breakdown of “sw”.

The required fraction = (the breakdown of the executed instruction “addi” + the breakdown of the executed instruction “beq” + the breakdown of the executed instruction “lw” + the breakdown of “sw”) .

02

 Determine the calculation.

4.5.1

Given that,

Name of the executed instructions

The breakdown

add

20%

addi

20%

not

0%

beq

25%

lw

25%

sw

10%

The required fraction of the data memory used= (the breakdown of the executed instruction “lw” + the breakdown of the executed instruction “sw” instruction) =(25+10)%= 35%.

4.5.2

Given that,

Name of the executed instructions

The breakdown

add

20%

addi

20%

not

0%

beq

25%

lw

25%

sw

10%

The required input fraction of the sign-extend circuit = (the breakdown of the executed instruction “addi” + the breakdown of the executed instruction “beq” + the breakdown of the executed instruction “lw” + the breakdown of “sw”)

= (20+25+25+10)%=80%.

The resource for activation of it’s the usage sends to the control signal.

The operation is performed even in unnecessary areas, it is ignored because it is not used in cycles.

Unlock Step-by-Step Solutions & Ace Your Exams!

  • Full Textbook Solutions

    Get detailed explanations and key concepts

  • Unlimited Al creation

    Al flashcards, explanations, exams and more...

  • Ads-free access

    To over 500 millions flashcards

  • Money-back guarantee

    We refund you if you fail your exam.

Over 30 million students worldwide already upgrade their learning with Vaia!

One App. One Place for Learning.

All the tools & learning materials you need for study success - in one app.

Get started for free

Most popular questions from this chapter

When silicon chips are fabricated, defects in materials (e.g., silicon) and manufacturing errors can result in defective circuits. A very common defect is for one wire to affect the signal in another. This is called a cross-talk fault. A special class of cross-talk faults is when a signal is connected to a wire that has a constant logical value (e.g., a power supply wire). In this case we have a stuck-at-0 or a stuckat-1 fault, and the affected signal always has a logical value of 0 or 1, respectively. The following problems refer to bit 0 of the Write Register input on the register fi le in Figure 4.24. 4.6.1 [10] Let us assume that processor testing is done by filling the PC, registers, and data and instruction memories with some values (you can choose which values), letting a single instruction execute, then reading the PC, memories, and registers. These values are then examined to determine if a particular fault is present. Can you design a test (values for PC, memories, and registers) that would determine if there is a stuck-at-0 fault on this signal? 4.6.2 [10] Repeat 4.6.1 for a stuck-at-1 fault. Can you use a single test for both stuck-at-0 and stuck-at-1? If yes, explain how; if no, explain why not. 4.6.3 [60] If we know that the processor has a stuck-at-1 fault on this signal, is the processor still usable? To be usable, we must be able to convert any program that executes on a normal MIPS processor into a program that works on this processor. You can assume that there is enough free instruction memory and data memory to let you make the program longer and store additional data. Hint: the processor is usable if every instruction “broken” by this fault can be replaced with a sequence of “working” instructions that achieve the same effect. 4.6.4 [10] Repeat 4.6.1, but now the fault to test for is whether the “MemRead” control signal becomes 0 if RegDst control signal is 0, no fault otherwise. 4.6.5 [10] Repeat 4.6.4, but now the fault to test for is whether the “Jump” control signal becomes 0 if RegDst control signal is 0, no fault otherwise.

The importance of having a good branch predictor depends on how often conditional branches are executed. Together with branch predictor accuracy, this will determine how much time is spent stalling due to mispredicted branches. In this exercise, assume that the breakdown of dynamic instructions into various instruction categories is as follows:

R-Type

BEQ

JMP

LW

SW

40%

25%

5%

25%

5%

Also, assume the following branch predictor accuracies:

Always-Taken

Always-Not-Taken

2-Bit

45%

55%

85%

4.15.1 [10] Stall cycles due to mispredicted branches increase the CPI. What is the extra CPI due to mispredicted branches with the always-taken predictor? Assume that branch outcomes are determined in the EX stage, that there are no data hazards, and that no delay slots are used.

4.15.2 [10] Repeat 4.15.1 for the “always-not-taken” predictor.

4.15.3 [10] Repeat 4.15.1 for for the 2-bit predictor.

4.15.4 [10] With the 2-bit predictor, what speedup would be achieved if we could convert half of the branch instructions in a way that replaces a branch instruction with an ALU instruction? Assume that correctly and incorrectly predicted instructions have the same chance of being replaced. 4.17 Exercises 367 4.15.5 [10] With the 2-bit predictor, what speedup would be achieved if we could convert half of the branch instructions in a way that replaced each branch instruction with two ALU instructions? Assume that correctly and incorrectly predicted instructions have the same chance of being replaced.

4.15.6 [10] Some branch instructions are much more predictable than others. If we know that 80% of all executed branch instructions are easy-to-predict loop-back branches that are always predicted correctly, what is the accuracy of the 2-bit predictor on the remaining 20% of the branch instructions?

The basic single-cycle MIPS implementation in Figure 4.2 can only implement some instructions. New instructions can be added to an existing Instruction Set Architecture (ISA), but the decision whether or not to do that depends, among other things, on the cost and complexity the proposed addition introduces into the processor datapath and control. The first three problems in this exercise refer to the new instruction: Instruction: LWI Rt,Rd(Rs) Interpretation: Reg[Rt] = Mem[Reg[Rd]+Reg[Rs]] 4.2.1 [10] Which existing blocks (if any) can be used for this instruction? 4.2.2 [10] which new functional blocks (if any) do we need for this instruction? 4.2.3 [10] what new signals do we need (if any) from the control unit to support this instruction?

In this exercise we examine in detail how an instruction is executed in a single-cycle datapath. Problems in this exercise refer to a clock cycle in which the processor fetches the following instruction word: 10101100011000100000000000010100. Assume that data memory is all zeros and that the processor’s registers have the following values at the beginning of the cycle in which the above instruction word is fetched:

r0

r1

r2

r3

r4

r5

r6

r8

r12

r31

0

–1

2

–3

–4

10

6

8

2

–16

4.7.1 [5] What are the outputs of the sign-extend and the jump “Shift left 2” unit (near the top of Figure 4.24) for this instruction word?

4.7.2 [10] What are the values of the ALU control unit’s inputs for this instruction?

4.7.3 [10] What is the new PC address after this instruction is executed? Highlight the path through which this value is determined.

4.7.4 [10] For each Mux, show the values of its data output during the execution of this instruction and these register values.

4.7.5 [10] For the ALU and the two add units, what are their data input values?

4.7.6 [10] What are the values of all inputs for the “Registers” unit?

This exercise is intended to help you understand the relationship between delay slots, control hazards, and branch execution in a pipelined processor. In this exercise, we assume that the following MIPS code is executed on a pipelined processor with a 5-stage pipeline, full forwarding, and a predict-taken branch predictor:

lw r2,0(r1)

label1: beq r2,r0,label2 # not taken once, then taken

lw r3,0(r2) beq r3,r0,label1 # taken

add r1,r3,r1

label2: sw r1,0(r2)

4.14.1 [10] Draw the pipeline execution diagram for this code, assuming there are no delay slots and that branches execute in the EX stage. 4.14.2 [10] Repeat 4.14.1, but assume that delay slots are used. In the given code, the instruction that follows the branch is now the delay slot instruction for that branch.

4.14.3 [20] One way to move the branch resolution one stage earlier is to not need an ALU operation in conditional branches. The branch instructions would be “bez rd,label” and “bnez rd,label”, and it would branch if the register has and does not have a zero value, respectively. Change this code to use these branch instructions instead of beq. You can assume that register R8 is available for you to use as a temporary register, and that an seq (set if equal) R-type instruction can be used. 366 Chapter 4 The Processor Section 4.8 describes how the severity of control hazards can be reduced by moving branch execution into the ID stage. This approach involves a dedicated comparator in the ID stage, as shown in Figure 4.62. However, this approach potentially adds to the latency of the ID stage, and requires additional forwarding logic and hazard detection.

4.14.4 [10] Using the first branch instruction in the given code as an example, describe the hazard detection logic needed to support branch execution in the ID stage as in Figure 4.62. Which type of hazard is this new logic supposed to detect?

4.14.5 [10] For the given code, what is the speedup achieved by moving branch execution into the ID stage? Explain your answer. In your speedup calculation, assume that the additional comparison in the ID stage does not affect clock cycle time. 4.14.6 [10] Using the first branch instruction in the given code as an example, describe the forwarding support that must be added to support branch execution in the ID stage. Compare the complexity of this new forwarding unit to the complexity of the existing forwarding unit in Figure 4.62.

See all solutions

Recommended explanations on Computer Science Textbooks

View all explanations

What do you think about this solution?

We value your feedback to improve our textbook solutions.

Study anywhere. Anytime. Across all devices.

Sign-up for free