Chapter 1: Q9E (page 56)

Assume for arithmetic, load/store, and branch instructions, a processor has CPIs of 1, 12, and 5, respectively. Also assume that on a single processor a program requires the execution of 2.56E9 arithmetic instructions, 1.28E9 load/store instructions, and 256 million branch instructions. Assume that each processor has a 2 GHz clock frequency.
Assume that, as the program is parallelized to run over multiple cores, the number of arithmetic and load/store instructions per processor is divided by 0.7 × p (where p is the number of processors) but the number of branch instructions per processor remains the same.
1.9.1 Find the total execution time for this program on 1, 2, 4, and 8 processors, and show the relative speedup of the 2, 4, and 8 processor result relative to the single processor result.
1.9.2 If the CPI of the arithmetic instructions was doubled, what would the impact be on the execution time of the program on 1, 2, 4, or 8 processors?
1.9.3 To what should the CPI of load/store instructions be reduced in order for a single processor to match the performance of four processors using the original CPI values?

Short Answer

1.9.1)

Execution time on 1 processor: 9.6 sec

Execution time on 2 processors: 7.04 sec

Execution time on 4 processors: 3.84 sec

Execution time on 8 processors: 2.24 sec

Relative speedup on 2 processors: 1.36

Relative speedup on 4 processors: 2.50

Relative speedup on 8 processors: 4.29

1.9.2)

Execution time on 1 processor with doubled arithmetic CPI: 10.88 sec

Execution time on 2 processors with doubled arithmetic CPI: 7.954 sec

Execution time on 4 processors with doubled arithmetic CPI: 4.297 sec

Execution time on 8 processors with doubled arithmetic CPI: 2.469 sec

1.9.3)

The CPI of load/store instructions must be reduced from 12 to 3, which is 25% of its original value.

Step by step solution

Consider the following data

CPI for each instruction class:

$\text{Arithmetic instruction}: 1, \quad \text{Load/store instruction}: 12, \quad \text{Branch instruction}: 5$

Number of instructions on a single processor:

$\text{Arithmetic}: 2.56 \times 10^{9}, \quad \text{Load/store}: 1.28 \times 10^{9}, \quad \text{Branch}: 256 \times 10^{6} \text{ instructions}$

Clock rate = 2 GHz = $2 \times 10^{9}$ cycles/sec

When the program is parallelized to run on multiple cores, the number of arithmetic and load/store instructions per processor is divided by 0.7 × p, where p is the number of processors, while the number of branch instructions per processor remains unchanged. Four configurations are considered: p = 1, 2, 4, and 8. For p = 1 the original instruction counts apply. The execution time for each configuration is calculated as follows:
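As a minimal sketch of the scaling rule described above, the per-processor instruction counts can be tabulated as follows. The function name `per_processor_counts` is hypothetical, and the sketch assumes that the single-processor case uses the original counts unscaled.

```python
def per_processor_counts(arith, load_store, branch, p):
    """Return (arithmetic, load/store, branch) instruction counts per processor.

    Assumption: with p == 1 the original counts apply; for p > 1 the
    arithmetic and load/store counts are divided by 0.7 * p, while the
    branch count per processor stays the same.
    """
    if p == 1:
        return arith, load_store, branch
    scale = 0.7 * p
    return arith / scale, load_store / scale, branch


for p in (1, 2, 4, 8):
    a, ls, b = per_processor_counts(2.56e9, 1.28e9, 256e6, p)
    print(f"p={p}: arithmetic={a:.4g}, load/store={ls:.4g}, branch={b:.4g}")
```

Note that the divisor 0.7 × p is smaller than p, so each processor executes more than 1/p of the parallelizable instructions.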

Write the formula for the execution time of a processor:

$\text{Execution time} = \frac{\text{Clock cycles}}{\text{Clock rate}}$ .........(1)

Write the formula for the clock cycles, with one term per instruction class in this problem:

$\text{Clock cycles} = CPI_{\text{arith}} \times \text{No. arithmetic instr.} + CPI_{\text{l/s}} \times \text{No. L/S instr.} + CPI_{\text{branch}} \times \text{No. branch instr.}$ ............(2)

Write the formula for the relative speedup:

$\text{Relative speedup} = \frac{\text{Time taken by 1 processor}}{\text{Time taken by } p \text{ processors}}$ .....................(3)
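The three formulas translate directly into a few lines of Python. This is only an illustrative sketch: the function names and the `CLOCK_RATE_HZ` constant are assumptions introduced here, not part of the textbook solution.

```python
CLOCK_RATE_HZ = 2e9  # 2 GHz, from the problem statement


def clock_cycles(counts, cpis):
    """Formula (2): sum of CPI x instruction count over all instruction classes."""
    return sum(n * cpi for n, cpi in zip(counts, cpis))


def execution_time(counts, cpis, clock_rate_hz=CLOCK_RATE_HZ):
    """Formula (1): clock cycles divided by clock rate, in seconds."""
    return clock_cycles(counts, cpis) / clock_rate_hz


def relative_speedup(time_one_processor, time_p_processors):
    """Formula (3): single-processor time over p-processor time."""
    return time_one_processor / time_p_processors


# Single-processor case: (arithmetic, load/store, branch) counts and CPIs.
t1 = execution_time((2.56e9, 1.28e9, 256e6), (1, 12, 5))
print(f"{t1:.2f} s")  # prints "9.60 s"
```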

Execution time on 1, 2, 4, and 8 processors

1.9.1

Execution time on 1 processor:

Compute the clock cycles using formula (2):

$\text{Clock cycles} = CPI_{\text{arith}} \times \text{No. arithmetic instr.} + CPI_{\text{l/s}} \times \text{No. L/S instr.} + CPI_{\text{branch}} \times \text{No. branch instr.}$

Therefore,

$\begin{array}{rcl} \text{Clock cycles} & = & 2560000000 \times 1 + 1280000000 \times 12 + 256000000 \times 5 \\ & = & 2560000000 + 15360000000 + 1280000000 \\ & = & 19200000000 \text{ cycles} \end{array}$

Now calculate the execution time using formula (1):

$\text{Execution time} = \frac{\text{Clock cycles}}{\text{Clock rate}}$

Therefore,

$\text{CPU execution time} = \frac{19200000000 \text{ cycles}}{2 \times 10^{9} \text{ cycles/sec}} = 9.6 \text{ sec}$

$\begin{array}{rcl} \text{Relative speedup} & = & \frac{\text{Time taken by 1 processor}}{\text{Time taken by } p \text{ processors}} \\ & = & 9.6 / 9.6 \\ & = & 1 \end{array}$

Execution time on 2 processors:

With p = 2, the arithmetic and load/store instruction counts per processor are divided by 0.7 × 2 = 1.4, while the branch instruction count is unchanged.

Compute the clock cycles using formula (2):

$\begin{array}{rcl} \text{Clock cycles} & = & \frac{2560000000 \times 1}{0.7 \times 2} + \frac{1280000000 \times 12}{0.7 \times 2} + 256000000 \times 5 \\ & = & \frac{17920000000}{1.4} + 1280000000 \\ & = & 14080000000 \text{ cycles} \end{array}$

Now calculate the execution time using formula (1):

$\text{CPU execution time} = \frac{14080000000 \text{ cycles}}{2 \times 10^{9} \text{ cycles/sec}} = 7.04 \text{ sec}$

$\begin{array}{rcl} \text{Relative speedup} & = & \frac{\text{Time taken by 1 processor}}{\text{Time taken by 2 processors}} \\ & = & 9.6 / 7.04 \\ & = & 1.36 \end{array}$

Execution time on 4 processors:

Compute the clock cycles using formula (2), with the arithmetic and load/store terms divided by 0.7 × 4 = 2.8:

$\begin{array}{rcl} \text{Clock cycles} & = & \frac{2560000000 \times 1}{0.7 \times 4} + \frac{1280000000 \times 12}{0.7 \times 4} + 256000000 \times 5 \\ & = & \frac{17920000000}{2.8} + 1280000000 \\ & = & 7680000000 \text{ cycles} \end{array}$

Now calculate the execution time using formula (1):

$\text{CPU execution time} = \frac{7680000000 \text{ cycles}}{2 \times 10^{9} \text{ cycles/sec}} = 3.84 \text{ sec}$

$\begin{array}{rcl} \text{Relative speedup} & = & \frac{\text{Time taken by 1 processor}}{\text{Time taken by 4 processors}} \\ & = & 9.6 / 3.84 \\ & = & 2.50 \end{array}$

Execution time on 8 processors:

Compute the clock cycles using formula (2), with the arithmetic and load/store terms divided by 0.7 × 8 = 5.6:

$\begin{array}{rcl} \text{Clock cycles} & = & \frac{2560000000 \times 1}{0.7 \times 8} + \frac{1280000000 \times 12}{0.7 \times 8} + 256000000 \times 5 \\ & = & \frac{2560000000}{5.6} + \frac{15360000000}{5.6} + 1280000000 \\ & = & 4480000000 \text{ cycles} \end{array}$

Now calculate the execution time using formula (1):

$\text{CPU execution time} = \frac{4480000000 \text{ cycles}}{2 \times 10^{9} \text{ cycles/sec}} = 2.24 \text{ sec}$

$\begin{array}{rcl} \text{Relative speedup} & = & \frac{\text{Time taken by 1 processor}}{\text{Time taken by 8 processors}} \\ & = & 9.6 / 2.24 \\ & = & 4.29 \end{array}$
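The four cases of 1.9.1 can be checked in one pass with a short script. This is a sketch under the problem's stated assumptions; the names `COUNT`, `CPI`, and `exec_time` are hypothetical, and the single-processor case is assumed to use the unscaled instruction counts. The script carries full floating-point precision and rounds only when printing.

```python
CPI = {"arith": 1, "load_store": 12, "branch": 5}
COUNT = {"arith": 2.56e9, "load_store": 1.28e9, "branch": 256e6}
CLOCK_RATE_HZ = 2e9  # 2 GHz


def exec_time(p, cpi=CPI):
    """Execution time in seconds on p processors.

    Arithmetic and load/store work is divided by 0.7 * p for p > 1;
    branch work per processor is not scaled.
    """
    scale = 1.0 if p == 1 else 0.7 * p
    scaled_cycles = (COUNT["arith"] * cpi["arith"]
                     + COUNT["load_store"] * cpi["load_store"]) / scale
    branch_cycles = COUNT["branch"] * cpi["branch"]
    return (scaled_cycles + branch_cycles) / CLOCK_RATE_HZ


t1 = exec_time(1)
for p in (1, 2, 4, 8):
    tp = exec_time(p)
    print(f"p={p}: time={tp:.2f} s, speedup={t1 / tp:.2f}")
```

The speedup stays well below p because the branch cycles (0.64 sec of work at 2 GHz) are not reduced by adding processors, and the 0.7 factor inflates the per-processor share of the remaining work.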

Execution time with doubled arithmetic CPI

1.9.2

Execution time on 1 processor when the arithmetic CPI is doubled:

If the CPI of arithmetic instructions is doubled (CPI = 2), the execution time on 1 processor can be calculated as follows.

First calculate the clock cycles, with instruction counts expressed in millions:

$\text{Clock cycles} = 2560 \times 2 + 1280 \times 12 + 256 \times 5 = 21760 \text{ million cycles}$

Now the execution time can be calculated as:

$\text{CPU execution time} = \frac{21760 \times 10^{6} \text{ cycles}}{2 \times 10^{9} \text{ cycles/sec}} = 10.88 \text{ sec}$

Execution time on 2 processors when the arithmetic CPI is doubled:

If the CPI of arithmetic instructions is doubled (CPI = 2), the execution time on 2 processors can be calculated as follows, with instruction counts expressed in millions:

$\text{Clock cycles} = \frac{2560}{0.7 \times 2} \times 2 + \frac{1280}{0.7 \times 2} \times 12 + 256 \times 5 = 3657.14 + 10971.43 + 1280 = 15908.57 \text{ million cycles}$

Now the execution time can be calculated as:

$\text{CPU execution time} = \frac{15908.57 \times 10^{6} \text{ cycles}}{2 \times 10^{9} \text{ cycles/sec}} = 7.954 \text{ sec}$

Execution time on 4 processors when the arithmetic CPI is doubled:

If the CPI of arithmetic instructions is doubled (CPI = 2), the execution time on 4 processors can be calculated as follows, with instruction counts expressed in millions:

$\text{Clock cycles} = \frac{2560}{0.7 \times 4} \times 2 + \frac{1280}{0.7 \times 4} \times 12 + 256 \times 5 = 1828.57 + 5485.71 + 1280 = 8594.29 \text{ million cycles}$

Now the execution time can be calculated as:

$\text{CPU execution time} = \frac{8594.29 \times 10^{6} \text{ cycles}}{2 \times 10^{9} \text{ cycles/sec}} = 4.297 \text{ sec}$

Execution time on 8 processors when the arithmetic CPI is doubled:

If the CPI of arithmetic instructions is doubled (CPI = 2), the execution time on 8 processors can be calculated as follows, with instruction counts expressed in millions:

$\text{Clock cycles} = \frac{2560}{0.7 \times 8} \times 2 + \frac{1280}{0.7 \times 8} \times 12 + 256 \times 5 = 914.29 + 2742.86 + 1280 = 4937.14 \text{ million cycles}$

Now the execution time can be calculated as:

$\text{CPU execution time} = \frac{4937.14 \times 10^{6} \text{ cycles}}{2 \times 10^{9} \text{ cycles/sec}} = 2.469 \text{ sec}$
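To see the impact of the doubled arithmetic CPI side by side with the original CPI, the 1.9.2 calculation can be sketched as below. The helper `exec_time` and its parameter names are assumptions made for illustration, and the single-processor case is again taken to use the unscaled instruction counts.

```python
COUNTS = {"arith": 2.56e9, "load_store": 1.28e9, "branch": 256e6}
CLOCK_RATE_HZ = 2e9  # 2 GHz


def exec_time(p, cpi_arith, cpi_load_store=12, cpi_branch=5):
    """Execution time in seconds on p processors for a given arithmetic CPI."""
    scale = 1.0 if p == 1 else 0.7 * p
    scaled_cycles = (COUNTS["arith"] * cpi_arith
                     + COUNTS["load_store"] * cpi_load_store) / scale
    branch_cycles = COUNTS["branch"] * cpi_branch  # not divided by 0.7 * p
    return (scaled_cycles + branch_cycles) / CLOCK_RATE_HZ


for p in (1, 2, 4, 8):
    base = exec_time(p, cpi_arith=1)
    doubled = exec_time(p, cpi_arith=2)
    increase = 100 * (doubled / base - 1)
    print(f"p={p}: {base:.3f} s -> {doubled:.3f} s ({increase:.1f}% slower)")
```

The relative slowdown shrinks as p grows, because the arithmetic cycles are divided by 0.7 × p while the branch cycles are not, so arithmetic makes up a smaller share of the total.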

Determine the reduced CPI of load/store instructions.

1.9.3

For 4 processors with the original CPI values:

$\text{Clock cycles} = \frac{2560000000}{2.8} + \frac{15360000000}{2.8} + 1280000000 = 7680000000 \text{ cycles}$

Now calculate the execution time using formula (1):

$\text{Execution time} = \frac{\text{Clock cycles}}{\text{Clock rate}}$

Therefore,

$\text{CPU execution time} = \frac{7680000000 \text{ cycles}}{2 \times 10^{9} \text{ cycles/sec}} = 3.84 \text{ sec}$

Reducing the load/store CPI of a single processor to match the performance of 4 processors:

Let $a$ be the new load/store CPI. Compute the single-processor clock cycles using formula (2):

$\begin{array}{rcl} \text{Clock cycles} & = & 2560000000 \times 1 + 1280000000 \times a + 256000000 \times 5 \\ & = & 2560000000 + 1280000000a + 1280000000 \\ & = & 3840000000 + 1280000000a \end{array}$

Now calculate the execution time using formula (1):

$\text{CPU execution time} = \frac{3840000000 + 1280000000a}{2 \times 10^{9} \text{ cycles/sec}} = 1.92 + 0.64a$

Set this equal to the four-processor execution time of 3.84 sec ($7.68 \times 10^{9}$ cycles at 2 GHz) and solve for $a$:

$\begin{array}{rcl} 1.92 + 0.64a & = & 3.84 \\ 0.64a & = & 1.92 \\ a & = & 3 \end{array}$

The reduced CPI relative to the original is calculated as follows:

Original CPI for load/store instructions = 12

$\frac{\text{Reduced CPI}}{\text{Original CPI for load/store instructions}} = \frac{3}{12} = 0.25$

Thus, the CPI of load/store instructions must be reduced from 12 to 3, which is 25% of its original value (a 75% reduction).
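The same result can be obtained by solving formula (1) for the load/store CPI numerically. The following is a sketch only; the function names `four_processor_time` and `required_load_store_cpi` are hypothetical, and the target time is recomputed from the problem data at full precision rather than taken from a rounded value.

```python
ARITH, LOAD_STORE, BRANCH = 2.56e9, 1.28e9, 256e6  # instruction counts
CPI_ARITH, CPI_LOAD_STORE, CPI_BRANCH = 1, 12, 5   # original CPI values
CLOCK_RATE_HZ = 2e9                                # 2 GHz


def four_processor_time():
    """Execution time in seconds on 4 processors with the original CPIs."""
    scale = 0.7 * 4
    cycles = (ARITH * CPI_ARITH + LOAD_STORE * CPI_LOAD_STORE) / scale
    cycles += BRANCH * CPI_BRANCH
    return cycles / CLOCK_RATE_HZ


def required_load_store_cpi(target_time):
    """Load/store CPI at which one processor finishes in target_time seconds."""
    target_cycles = target_time * CLOCK_RATE_HZ
    other_cycles = ARITH * CPI_ARITH + BRANCH * CPI_BRANCH
    return (target_cycles - other_cycles) / LOAD_STORE


target = four_processor_time()
a = required_load_store_cpi(target)
print(f"target time = {target:.2f} s")
print(f"required load/store CPI = {a:.2f}")
print(f"fraction of original CPI = {a / CPI_LOAD_STORE:.2f}")
```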
