This exercise explores energy efficiency and its relationship with performance. Problems in this exercise assume the following energy consumption for activity in Instruction memory, Registers, and Data memory. You can assume that the other components of the datapath spend a negligible amount of energy.

I-Mem   1 Register Read   Register Write   D-Mem Read   D-Mem Write
140pJ   70pJ              60pJ             140pJ        120pJ

Assume that components in the datapath have the following latencies. You can assume that the other components of the datapath have negligible latencies.

I-Mem   Control   Register Read or Write   ALU    D-Mem Read or Write
200ps   150ps     90ps                     90ps   250ps

4.19.1 [10] How much energy is spent to execute an ADD instruction in a single-cycle design and in a 5-stage pipelined design?

4.19.2 [10] What is the worst-case MIPS instruction in terms of energy consumption, and what is the energy spent to execute it?

4.19.3 [10] If energy reduction is paramount, how would you change the pipelined design? What is the percentage reduction in the energy spent by an LW instruction after this change?

4.19.4 [10] What is the performance impact of your changes from 4.19.3?

4.19.5 [10] We can eliminate the MemRead control signal and have the data memory be read in every cycle, i.e., we can permanently have MemRead=1. Explain why the processor still functions correctly after this change. What is the effect of this change on clock frequency and energy consumption?

4.19.6 [10] If an idle unit spends 10% of the power it would spend if it were active, what is the energy spent by the instruction memory in each cycle? What percentage of the overall energy spent by the instruction memory does this idle energy represent?

Short Answer

4.19.1 – Energy spent is 340 pJ.

4.19.2 – Load instruction is the worst case and energy spent is 480 pJ.

4.19.3 – Percentage reduction is 14.58%.

4.19.4 – No significant performance change takes place.

4.19.5 – The processor still functions correctly and the clock frequency stays the same, but energy consumption increases because data memory is read in every cycle.

4.19.6 – The total energy spent is 143.5pJ and the percentage is 2.44%.

Step by step solution

Step 1: Energy efficiency and performance.

The datapath performs five energy-consuming operations: an instruction memory (I-Mem) access, a register read, a register write, a data memory (D-Mem) read and a D-Mem write.

The total energy spent by an instruction is the sum of the energies of all the operations it performs.

The energy spent by the instruction memory while it is idle can be calculated with the formula

Idle energy = (clock cycle time - I-Mem latency) × (active energy for I-Mem / latency for I-Mem) × percentage of active power

Percentage of I-Mem energy spent while idle = idle energy / total energy

(4.19.1) Step 2: Finding required energy.

In a single-cycle design, an ADD instruction has to perform:

  • One I-Mem access to fetch the instruction.
  • Two register reads to read the two operands.
  • One register write to save the result.

The energy required is the total energy of these operations.

I-Mem = 140 pJ, register read = 70 pJ and register write = 60 pJ.

Adding all the energies required:

=140+2×70+60=340

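The required energy is 340 pJ. The 5-stage pipelined design performs the same operations for an ADD, so it spends the same 340 pJ.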
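The sum above can be checked with a short script. This is a minimal sketch: the dictionary keys and the `instruction_energy` helper are names chosen here for illustration, and the per-access energies are the ones given in the exercise.

```python
# Per-access energies in pJ, as given in the exercise.
# The key names are illustrative, not part of the exercise.
ENERGY_PJ = {
    "imem": 140,
    "reg_read": 70,
    "reg_write": 60,
    "dmem_read": 140,
    "dmem_write": 120,
}


def instruction_energy(accesses):
    """Sum the energy of a mapping {component: number of accesses}."""
    return sum(ENERGY_PJ[component] * count for component, count in accesses.items())


# ADD: one instruction fetch, two operand reads, one result write.
ADD_ACCESSES = {"imem": 1, "reg_read": 2, "reg_write": 1}

add_energy = instruction_energy(ADD_ACCESSES)
print(add_energy)  # 340
```

The same access counts apply to the single-cycle and the pipelined design, which is why one calculation covers both.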

(4.19.2) Step 3: Energy consumption for MIPS instruction.

The worst-case MIPS instruction is a load instruction, because the sum of the energy consumed by a memory read and the energy consumed by a register write is more than the energy consumed by a memory write alone.

A load instruction requires one instruction memory read, two register reads, one register write and one data memory read.

Instruction memory = 140 pJ

Read register = 70 pJ

Write register = 60 pJ

Read memory = 140 pJ

The total energy consumed is:

=140+2×70+60+140=480

The energy spent is 480 pJ
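To see why the load is the worst case, the energies of several instruction classes can be compared side by side. This is a hedged sketch: the access counts for `lw` and `add` follow the text, while the counts for `sw` and `beq` are assumptions about the usual MIPS datapath behaviour (two register reads, with `sw` also writing data memory) and are not stated in the solution.

```python
# Per-access energies in pJ, as given in the exercise.
ENERGY_PJ = {
    "imem": 140,
    "reg_read": 70,
    "reg_write": 60,
    "dmem_read": 140,
    "dmem_write": 120,
}

# Access counts per instruction class.
# "add" and "lw" follow the solution text; "sw" and "beq" are assumed.
PROFILES = {
    "add": {"imem": 1, "reg_read": 2, "reg_write": 1},
    "lw": {"imem": 1, "reg_read": 2, "reg_write": 1, "dmem_read": 1},
    "sw": {"imem": 1, "reg_read": 2, "dmem_write": 1},
    "beq": {"imem": 1, "reg_read": 2},
}


def instruction_energy(accesses):
    """Sum the energy of a mapping {component: number of accesses}."""
    return sum(ENERGY_PJ[component] * count for component, count in accesses.items())


energies = {name: instruction_energy(profile) for name, profile in PROFILES.items()}
worst = max(energies, key=energies.get)
print(worst, energies[worst])  # lw 480
```

Under these assumed profiles the store comes second at 400 pJ, since its 120 pJ memory write is less than the load's 140 pJ memory read plus 60 pJ register write.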

(4.19.3) Step 4: Percentage reduction in energy spent.

Before any change, a load instruction requires one instruction memory read, two register reads, one register write and one data memory read.

Instruction memory = 140 pJ

Read register = 70 pJ

Write register = 60 pJ

Read memory = 140 pJ

The total energy consumed is:

=140+2×70+60+140=480

The energy spent is 480 pJ

To reduce energy, the pipelined design can be changed so that the control unit generates register read control signals and only the registers an instruction actually needs are read. With this change, a load requires only one register read, the one that supplies the base address. The operations for a load instruction are then: one instruction memory read, one register read, one register write and one data memory read.

So, the total energy consumed is:

=140+70+60+140=410

The energy spent after changes is 410 pJ.

The energy saved by using register read control is

=480-410=70pJ

Percentage reduction

=70/480=0.145833≈14.58%

The percentage reduction in energy consumed is 14.58%.
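The percentage can be reproduced with a few lines of arithmetic. This is a minimal sketch; `percent_reduction` and the variable names are chosen here for illustration.

```python
def percent_reduction(before_pj, after_pj):
    """Return the energy saving as a percentage of the original energy."""
    return (before_pj - after_pj) / before_pj * 100.0


# LW with two register reads (original design).
lw_before = 140 + 2 * 70 + 60 + 140
# LW with one register read (register read control added).
lw_after = 140 + 70 + 60 + 140

reduction = percent_reduction(lw_before, lw_after)
print(f"{reduction:.2f}%")  # 14.58%
```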

(4.19.4) Step 5: Performance impact.

Consider the following latencies for the various components of the datapath:

Instruction Memory (I-Mem)=200ps

Control=150ps

Register read or Write=90ps

ALU=90ps

Data Memory (D-Mem) Read or Write=250ps

To calculate the impact on performance of adding the register read control signals:

First, consider the clock cycle time before the register read control signals are added. In this case, the registers are read while the control unit decodes the instruction, so the ID stage takes the longer of the two latencies (150ps). The longest stage is the MEM stage, whose critical path is the D-Mem latency. Hence the clock cycle time is 250ps.

Now, consider the clock cycle time after the register read control signals are added.

In this case, the register read and the control unit no longer overlap, because the registers cannot be read until the control unit has decided which ones are needed. As a result, the latency of the ID stage becomes the sum of the two:

=latency of control unit + latency of register read

=150+90=240ps

Even with the increased latency, the ID stage is still faster than the MEM stage (240<250). The critical path remains the D-Mem latency in the MEM stage, so the clock cycle time is still 250ps. Therefore the clock cycle time does not change when the register read control signals are added to the pipeline.

Hence, with given latencies, there is no impact on the performance of a 5-stage pipeline by addition of register read control signals.
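The comparison can be sketched in code by taking the clock cycle time as the latency of the slowest stage. The mapping of components to stages below (IF = I-Mem, EX = ALU, MEM = D-Mem, WB = register write) is an assumption consistent with the exercise's statement that other components have negligible latency; the function names are illustrative.

```python
# Component latencies in ps, as given in the exercise.
LATENCY_PS = {"imem": 200, "control": 150, "regs": 90, "alu": 90, "dmem": 250}


def stage_latencies(overlap_control_and_reg_read):
    """Return per-stage latencies for the 5-stage pipeline.

    With overlap (original design) ID takes the longer of control and
    register read; without overlap (register read control added) ID
    takes their sum.
    """
    if overlap_control_and_reg_read:
        id_stage = max(LATENCY_PS["control"], LATENCY_PS["regs"])
    else:
        id_stage = LATENCY_PS["control"] + LATENCY_PS["regs"]
    return {
        "IF": LATENCY_PS["imem"],
        "ID": id_stage,
        "EX": LATENCY_PS["alu"],
        "MEM": LATENCY_PS["dmem"],
        "WB": LATENCY_PS["regs"],
    }


def cycle_time(stages):
    """The clock cycle must fit the slowest pipeline stage."""
    return max(stages.values())


before = cycle_time(stage_latencies(True))
after = cycle_time(stage_latencies(False))
print(before, after)  # 250 250
```

The ID stage grows from 150ps to 240ps, but the MEM stage at 250ps still sets the cycle time in both cases.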

(4.19.5) Step 6: Effect on Clock frequency and Energy consumption.

Consider the case when the MemRead control signal is eliminated and the data memory is read in every cycle, that is, MemRead is always 1. If memory is read in every cycle, the value read falls into one of three cases:

  • It is needed, as for a load instruction.
  • It is not needed and does not get beyond the WB multiplexor, as for non-load instructions that write to a register. The value is wasted.
  • It does not get written to any register at all, as for all other instructions, including stalls. The value is wasted here also.

In every case an unneeded value is never written to a register, so the processor will still function properly.

Before the change, the memory is read only in cycles when a load instruction is in the MEM stage. After the change, memory is read in every cycle. The clock cycle time must already allow enough time for memory to be read in the MEM stage, so the change does not affect the clock cycle time or the clock frequency.

The change does affect energy. Before it, D-Mem read energy is spent only in cycles when a load is in the MEM stage. After it, a memory read occurs in every cycle, so every cycle that previously had no read now also spends the D-Mem read energy (140pJ). Energy consumption therefore increases.

Hence, after eliminating the MemRead control signal and reading data memory in every cycle, the processor still functions properly and the clock frequency is unchanged, but energy consumption increases.

(4.19.6) Step 7: Idle energy of the instruction memory.

Consider the following latencies for the various components of the datapath:

Instruction Memory (I-Mem)=200ps

Control=150ps

Register read or Write=90ps

ALU=90ps

Data Memory (D-Mem) Read or Write=250ps

Also consider the energy consumption of the various components of the datapath:

Instruction Memory (I-Mem)= 140 pJ

Register read= 70pJ

Register Write=60pJ

Data memory (D-Mem) Read=140pJ

Data memory (D-Mem) write=120pJ

Here, the energy spent by the instruction memory when it is active is 140pJ, and the latency of the instruction memory is 200ps.

First, calculate the clock cycle time. The longest latency is that of D-Mem, which is the critical path of the MEM stage, so the clock cycle time is 250ps.

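Now, when the unit is idle it spends only 10% of its active power. The instruction memory is active for 200ps of each 250ps cycle and idle for the remaining 50ps.

The energy used by the instruction memory while it is idle is given by:

Idle energy = (clock cycle time - I-Mem latency) × (active energy for I-Mem / latency for I-Mem) × percentage of active power

=(250-200)×(140/200)×0.1=3.5pJ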

Now, calculate the total energy spent by the instruction memory (I-Mem) in each cycle.

The total energy spent by the instruction memory is given by:

=energy when I-Mem is active + energy when I-Mem is idle

=140+3.5=143.5pJ

Finally, calculate the idle energy as a percentage of the total energy.

The fraction of the instruction memory's energy that is spent while idle is given by:

=idle energy/total energy=3.5/143.5=0.02439=2.44%

Hence, the total energy spent on instruction memory (I-Mem) in each cycle is 143.5pJ and the percentage representation of idle energy amongst the total energy spent by the instruction memory is 2.44%
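The idle-energy calculation can be written out step by step. This is a minimal sketch under the solution's assumptions (250ps clock cycle, I-Mem active for its 200ps latency and idle for the rest of the cycle); the function and variable names are illustrative.

```python
def imem_idle_energy(cycle_ps, latency_ps, active_pj, idle_fraction):
    """Energy in pJ spent by a unit during the idle part of one cycle.

    Active power is active energy divided by the active time; idle power
    is idle_fraction of that; idle energy is idle power times idle time.
    """
    active_power = active_pj / latency_ps  # pJ per ps
    idle_time = cycle_ps - latency_ps      # ps
    return idle_fraction * active_power * idle_time


idle_pj = imem_idle_energy(cycle_ps=250, latency_ps=200, active_pj=140, idle_fraction=0.10)
total_pj = 140 + idle_pj
idle_share = idle_pj / total_pj * 100.0

print(f"{idle_pj:.1f} pJ idle, {total_pj:.1f} pJ total, {idle_share:.2f}% idle")
# 3.5 pJ idle, 143.5 pJ total, 2.44% idle
```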

Most popular questions from this chapter

This exercise is intended to help you understand the relationship between forwarding, hazard detection, and ISA design. Problems in this exercise refer to the following sequence of instructions, and assume that it is executed on a 5-stage pipelined datapath:

add r5,r2,r1

lw r3,4(r5)

lw r2,0(r2)

or r3,r5,r3

sw r3,0(r5)

4.13.1 [5] If there is no forwarding or hazard detection, insert nops to ensure correct execution.

4.13.2 [10] Repeat 4.13.1 but now use nops only when a hazard cannot be avoided by changing or rearranging these instructions. You can assume register R7 can be used to hold temporary values in your modified code.

4.13.3 [10] If the processor has forwarding, but we forgot to implement the hazard detection unit, what happens when this code executes?

4.13.4 [20] If there is forwarding, for the first five cycles during the execution of this code, specify which signals are asserted in each cycle by hazard detection and forwarding units in Figure 4.60.

4.13.5 [10] If there is no forwarding, what new inputs and output signals do we need for the hazard detection unit in Figure 4.60? Using this instruction sequence as an example, explain why each signal is needed.

4.13.6 [20] For the new hazard detection unit from 4.13.5, specify which output signals it asserts in each of the first five cycles during the execution of this code.

In this exercise, we examine how resource hazards, control hazards, and Instruction Set Architecture (ISA) design can affect pipelined execution. Problems in this exercise refer to the following fragment of MIPS code:

sw r16,12(r6)

lw r16,8(r6)

beq r5,r4,Label # Assume r5!=r4

add r5,r1,r4

slt r5,r15,r4

Assume that individual pipeline stages have the following latencies:

IF      ID      EX      MEM     WB
200ps   120ps   150ps   190ps   100ps

4.10.1 For this problem, assume that all branches are perfectly predicted (this eliminates all control hazards) and that no delay slots are used. If we only have one memory (for both instructions and data), there is a structural hazard every time we need to fetch an instruction in the same cycle in which another instruction accesses data. To guarantee forward progress, this hazard must always be resolved in favor of the instruction that accesses data. What is the total execution time of this instruction sequence in the 5-stage pipeline that only has one memory? We have seen that data hazards can be eliminated by adding nops to the code. Can you do the same with this structural hazard? Why?

4.10.2 For this problem, assume that all branches are perfectly predicted (this eliminates all control hazards) and that no delay slots are used. If we change load/store instructions to use a register (without an offset) as the address, these instructions no longer need to use the ALU. As a result, MEM and EX stages can be overlapped and the pipeline has only 4 stages. Change this code to accommodate this changed ISA. Assuming this change does not affect clock cycle time, what speedup is achieved in this instruction sequence?

4.10.3 Assuming stall-on-branch and no delay slots, what speedup is achieved on this code if branch outcomes are determined in the ID stage, relative to the execution where branch outcomes are determined in the EX stage?

4.10.4 Given these pipeline stage latencies, repeat the speedup calculation from 4.10.2, but take into account the (possible) change in clock cycle time. When EX and MEM are done in a single stage, most of their work can be done in parallel. As a result, the resulting EX/MEM stage has a latency that is the larger of the original two, plus 20 ps needed for the work that could not be done in parallel.

4.10.5 Given these pipeline stage latencies, repeat the speedup calculation from 4.10.3, taking into account the (possible) change in clock cycle time. Assume that the latency of the ID stage increases by 50% and the latency of the EX stage decreases by 10 ps when branch outcome resolution is moved from EX to ID.

4.10.6 Assuming stall-on-branch and no delay slots, what is the new clock cycle time and execution time of this instruction sequence if beq address computation is moved to the MEM stage? What is the speedup from this change? Assume that the latency of the EX stage is reduced by 20 ps and the latency of the MEM stage is unchanged when branch outcome resolution is moved from EX to MEM.

Question: Problems in this exercise assume that logic blocks needed to implement a processor’s datapath have the following latencies:

I-Mem   Add    Mux    ALU    Regs   D-Mem   Sign-Extend   Shift-Left-2
200ps   70ps   20ps   90ps   90ps   250ps   15ps          10ps

4.4.1 [10] If the only thing we need to do in a processor is fetch consecutive instructions (Figure 4.6), what would the cycle time be?

4.4.2 [10] Consider a datapath similar to the one in Figure 4.11, but for a processor that only has one type of instruction: unconditional PC-relative branch. What would the cycle time be for this datapath?

4.4.3 [10] Repeat 4.4.2, but this time we need to support only conditional PC-relative branches.

The remaining three problems in this exercise refer to the datapath element Shift-left-2:

4.4.4 [10] Which kinds of instructions require this resource?

4.4.5 [20] For which kinds of instructions (if any) is this resource on the critical path?

4.4.6 [10] Assuming that we only support beq and add instructions, discuss how changes in the given latency of this resource affect the cycle time of the processor. Assume that the latencies of other resources do not change.

Question: When silicon chips are fabricated, defects in materials (e.g., silicon) and manufacturing errors can result in defective circuits. A very common defect is for one wire to affect the signal in another. This is called a cross-talk fault. A special class of cross-talk faults is when a signal is connected to a wire that has a constant logical value (e.g., a power supply wire). In this case we have a stuck-at-0 or a stuck-at-1 fault, and the affected signal always has a logical value of 0 or 1, respectively. The following problems refer to bit 0 of the Write Register input on the register file in Figure 4.24.

4.6.1 [10] Let us assume that processor testing is done by filling the PC, registers, and data and instruction memories with some values (you can choose which values), letting a single instruction execute, then reading the PC, memories, and registers. These values are then examined to determine if a particular fault is present. Can you design a test (values for PC, memories, and registers) that would determine if there is a stuck-at-0 fault on this signal?

4.6.2 [10] Repeat 4.6.1 for a stuck-at-1 fault. Can you use a single test for both stuck-at-0 and stuck-at-1? If yes, explain how; if no, explain why not.

4.6.3 [60] If we know that the processor has a stuck-at-1 fault on this signal, is the processor still usable? To be usable, we must be able to convert any program that executes on a normal MIPS processor into a program that works on this processor. You can assume that there is enough free instruction memory and data memory to let you make the program longer and store additional data. Hint: the processor is usable if every instruction “broken” by this fault can be replaced with a sequence of “working” instructions that achieve the same effect.

4.6.4 [10] Repeat 4.6.1, but now the fault to test for is whether the “MemRead” control signal becomes 0 if RegDst control signal is 0, no fault otherwise.

4.6.5 [10] Repeat 4.6.4, but now the fault to test for is whether the “Jump” control signal becomes 0 if RegDst control signal is 0, no fault otherwise.

In this exercise, we examine how pipelining affects the clock cycle time of the processor. Problems in this exercise assume that individual stages of the datapath have the following latencies:

IF      ID      EX      MEM     WB
250ps   350ps   150ps   300ps   200ps

Also, assume that instructions executed by the processor are broken down as follows:

alu   beq   lw    sw
45%   20%   20%   15%

4.8.1 [5] What is the clock cycle time in a pipelined and non-pipelined processor?

4.8.2 [10] What is the total latency of an LW instruction in a pipelined and non-pipelined processor?

4.8.3 [10] If we can split one stage of the pipelined datapath into two new stages, each with half the latency of the original stage, which stage would you split and what is the new clock cycle time of the processor?

4.8.4 [10] Assuming there are no stalls or hazards, what is the utilization of the data memory?

4.8.5 [10] Assuming there are no stalls or hazards, what is the utilization of the write-register port of the “Registers” unit?

4.8.6 [30] Instead of a single-cycle organization, we can use a multi-cycle organization where each instruction takes multiple cycles but one instruction finishes before another is fetched. In this organization, an instruction only goes through stages it actually needs (e.g., ST only takes 4 cycles because it does not need the WB stage). Compare clock cycle times and execution times with single-cycle, multi-cycle, and pipelined organization.
