Chapter 4: 3. (page 358)

When processor designers consider a possible improvement to the processor datapath, the decision usually depends on the cost/performance trade-off. In the following three problems, assume that we are starting with a datapath from Figure 4.3, where I-Mem, Add, Mux, ALU, Regs, D-Mem, and Control blocks have latencies of 400 ps, 100 ps, 30 ps, 120 ps, 200 ps, 350 ps, and 100 ps, respectively, and costs of 1000, 30, 10, 100, 200, 2000, and 500, respectively.

Consider the addition of a multiplier to the ALU. This addition will add 300 ps to the latency of the ALU and will add a cost of 600 to the ALU. The result will be 5% fewer instructions executed since we will no longer need to emulate the MUL instruction.

4.3.1 [10] What is the clock cycle time with and without this improvement?

4.3.2 [10] What is the speedup achieved by adding this improvement?

4.3.3 [10] Compare the cost/performance ratio with and without this improvement.

Short Answer


4.3.1

The clock cycle time is 1130 ps without this improvement and 1430 ps with it.

4.3.2

No speedup is achieved. The speedup is about 0.83, so the change is actually a slowdown.

4.3.3

Adding the multiplier raises the cost from 3890 to 4490 and lowers performance, so the cost/performance ratio becomes about 1.39 times worse with this improvement.

Step by step solution

Define the concept.

4.3.1

$\text{Clock cycle time} = \text{I-Mem} + \text{Regs} + \text{Mux} + \text{ALU} + \text{D-Mem} + \text{Mux}$

4.3.2

Running time = number of instructions x clock cycle time

Speedup = running time without the improvement / running time with the improvement

4.3.3

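$\text{Cost} = \text{I-Mem} + (2 \times \text{Add}) + (3 \times \text{Mux}) + \text{ALU} + \text{Regs} + \text{D-Mem} + \text{Control}$

$\text{Cost/performance ratio} = \frac{\text{cost}}{\text{performance}}, \quad \text{performance} = \frac{1}{\text{running time}}$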

Determine the calculation.

4.3.1

Given that the I-Mem, Add, Mux, ALU, Regs, D-Mem, and Control blocks have latencies of 400 ps, 100 ps, 30 ps, 120 ps, 200 ps, 350 ps, and 100 ps, and their costs are respectively 1000, 30, 10, 100, 200, 2000, and 500.

The improvement under consideration is the addition of a multiplier to the ALU. It adds 300 ps to the latency of the ALU and a cost of 600 to the ALU. The result is 5% fewer instructions executed, as there is no longer any need to emulate the MUL instruction.

So, the clock cycle time without this improvement = (400+200+30+120+350+30) = 1130 ps.

So, the clock cycle time with the improvement is

(400+200+30+120+300+350+30) = 1430 ps [as 300 ps is the additional ALU latency introduced by the improvement].
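The sum above can be checked with a short script. This is only a sketch: the critical path (I-Mem, Regs, Mux, ALU, D-Mem, Mux) is the one this solution assumes, and the names `LATENCY_PS`, `CRITICAL_PATH`, and `clock_cycle_ps` are illustrative, not part of the textbook.

```python
# Component latencies in picoseconds, taken from the problem statement.
LATENCY_PS = {
    "I-Mem": 400, "Add": 100, "Mux": 30, "ALU": 120,
    "Regs": 200, "D-Mem": 350, "Control": 100,
}

# Critical path assumed by this solution; Mux appears twice on it.
CRITICAL_PATH = ["I-Mem", "Regs", "Mux", "ALU", "D-Mem", "Mux"]


def clock_cycle_ps(latency, extra_alu_ps=0):
    """Sum the latencies along the critical path, optionally lengthening the ALU."""
    adjusted = dict(latency)
    adjusted["ALU"] += extra_alu_ps
    return sum(adjusted[block] for block in CRITICAL_PATH)


baseline_ps = clock_cycle_ps(LATENCY_PS)                      # no multiplier
improved_ps = clock_cycle_ps(LATENCY_PS, extra_alu_ps=300)    # multiplier adds 300 ps
print(baseline_ps, improved_ps)
```

Because the ALU lies on the critical path, its extra 300 ps is added in full to the clock cycle time.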

4.3.2

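Let's assume the original program executes 1000 instructions, each taking one 1130 ps clock cycle.

Hence the running time without the improvement will be 1000 x 1130 = 1130000 ps.

With the improvement, 5% fewer instructions are executed, so 950 instructions each take one 1430 ps clock cycle.

Hence the running time with the improvement will be 950 x 1430 = 1358500 ps.

Hence, 1358500 – 1130000 = 228,500 ps.

Speedup = 1130000 / 1358500 = 0.83 (approximately).

No speedup is achieved by adding the specified instruction (MUL). Instead, the running time increases by 228,500 ps.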
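The speedup can be computed directly as the ratio of the two running times. The sketch below assumes a 1000-instruction program with one clock cycle per instruction; the function names are illustrative only.

```python
def running_time_ps(instruction_count, cycle_ps):
    """Running time = number of instructions x clock cycle time (one cycle each)."""
    return instruction_count * cycle_ps


def speedup(old_time, new_time):
    """Speedup > 1 means the new design is faster; < 1 means it is slower."""
    return old_time / new_time


BASE_INSTRUCTIONS = 1000                             # assumed program size
IMPROVED_INSTRUCTIONS = BASE_INSTRUCTIONS * 95 // 100  # 5% fewer instructions

base_time = running_time_ps(BASE_INSTRUCTIONS, 1130)
improved_time = running_time_ps(IMPROVED_INSTRUCTIONS, 1430)

print(base_time, improved_time)
print(round(speedup(base_time, improved_time), 4))
```

The result does not depend on the assumed instruction count, since it cancels in the ratio: a value below 1 means the longer clock cycle outweighs the 5% reduction in instructions.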

4.3.3

Let's assume a program of 10000000 instructions for the comparison.

The comparison of the cost and the performance:

Cost without the improvement = (1000+(2x30)+(3x10)+100+200+2000+500)

= 3890

Running time without the improvement:

$10000000 \times 1130\ \text{ps} = 1.13 \times 10^{10}\ \text{ps} = 0.0113\ \text{sec}$

$\text{Cost/performance ratio} = \frac{\text{cost}}{\frac{1}{\text{running time}}} = \frac{3890}{\frac{1}{0.0113}} \approx 44.0$

Cost with the improvement = 3890 + 600 = 4490

Running time with the improvement:

$10000000 \times 0.95 \times 1430\ \text{ps} = 1.3585 \times 10^{10}\ \text{ps} = 0.013585\ \text{sec}$

$\text{Cost/performance ratio} = \frac{4490}{\frac{1}{0.013585}} \approx 61.0$

Hence, there is no improvement: the cost/performance ratio rises from about 44.0 to about 61.0, which is roughly 1.39 times worse.
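The relative change in cost/performance does not depend on the assumed instruction count, so it can be checked on its own. The following is a sketch under the solution's assumptions (two adders and three multiplexers in the cost sum, performance taken as 1 / running time); `COST`, `COUNT`, and `datapath_cost` are illustrative names.

```python
# Component costs from the problem statement.
COST = {
    "I-Mem": 1000, "Add": 30, "Mux": 10, "ALU": 100,
    "Regs": 200, "D-Mem": 2000, "Control": 500,
}

# Number of each block, as used in this solution's cost formula (an assumption).
COUNT = {
    "I-Mem": 1, "Add": 2, "Mux": 3, "ALU": 1,
    "Regs": 1, "D-Mem": 1, "Control": 1,
}


def datapath_cost(cost, extra_alu_cost=0):
    """Total cost of all blocks, plus any extra cost added to the ALU."""
    return sum(cost[block] * COUNT[block] for block in cost) + extra_alu_cost


base_cost = datapath_cost(COST)
improved_cost = datapath_cost(COST, extra_alu_cost=600)

# Relative performance = old running time / new running time, per instruction.
relative_performance = 1130 / (0.95 * 1430)
relative_cost = improved_cost / base_cost

# Values above 1 mean the improved design costs more per unit of performance.
relative_cost_performance = relative_cost / relative_performance
print(base_cost, improved_cost, round(relative_cost_performance, 3))
```

A ratio above 1 confirms the conclusion: the multiplier makes the datapath both more expensive and slower, so cost per unit of performance gets worse.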
