Warning: foreach() argument must be of type array|object, bool given in /var/www/html/web/app/themes/studypress-core-theme/template-parts/header/mobile-offcanvas.php on line 20

When a program is adapted to run on multiple processors in a multiprocessor system, the execution time on each processor is comprised of computing time and the overhead time required for locked critical sections and/or to send data from one processor to another.

Assume a program requires t = 100 s of execution time on one processor. When run p processors, each processor requires t/p s, as well as an additional 4 s of overhead, irrespective of the number of processors. Compute the per-processor execution time for 2, 4, 8, 16, 32, 64, and 128 processors. For each case, list the corresponding speedup relative to a single processor and the ratio between actual speedup versus ideal speedup (speedup if there was no overhead).

Short Answer

Expert verified

The answer is listed below for each case:

Processor number

Execution Time

Speed-up

Speed-up without the overhead

Ratio

2

54

1.85

2

0.925

4

29

3.45

4

0.8625

8

16.5

6.06

8

0.7575

16

10.25

9.76

16

0.61

32

7.125

14.04

32

0.43875

64

5.5425

18.05

64

0.281875

128

4.78125

20.92

128

0.1634375

Step by step solution

01

Define the concept.

Theexecutiontimeoftheprogram=100Numberoftheprocessor+4Thecorrespondingspeeduprelativetosingleprocessor=100Executivetimeoftheprogram

TheratioAtcualspeedupversusidealspeedup=SpeedupNumberoftheprocessorThespeedupwithoutoverhead=100100Numberoftheprocessor

02

Determine the calculation.

Given that, the value of “t” is 100s.

And the execution time is calculated for the processors of 2, 4, 8, 16, 32, 64, and 128.

Processor number

Execution Time

Speed-up

Speed-up without overhead

Ratio

2

1002+4=50+4=54

10054=1.85

10050=2

1.852=0.925

4

1004+4=25+4=29

10029=3.45

10024=4

3.454=0.8625

8

1008+4=12.5+4=16.5

10016.5=6.06

10012.5=8

6.068=0.7575

16

10016+4=6.25+4=10.25

10010.25=9.76

1006.25=16

9.7616=0.61

32

10032+4=3.125+4=7.125

1007.125=14.04

1003.125=32

14.0432=0.43875

64

10064+4=1.5425+4=5.5425

1005.5425=18.05

1001.5625=64

role="math" localid="1655127771056" 18.0564=0.281875

128

100125+4=4.78125

1004.78125=20.92

128

=0.1634375

Unlock Step-by-Step Solutions & Ace Your Exams!

  • Full Textbook Solutions

    Get detailed explanations and key concepts

  • Unlimited Al creation

    Al flashcards, explanations, exams and more...

  • Ads-free access

    To over 500 millions flashcards

  • Money-back guarantee

    We refund you if you fail your exam.

Over 30 million students worldwide already upgrade their learning with Vaia!

One App. One Place for Learning.

All the tools & learning materials you need for study success - in one app.

Get started for free

Most popular questions from this chapter

B .19 [5] <§B.4>The Verilog code on page B-53 is for a D flip-flop. Show the Verilog code for a D latch.

B.31 [10] <§B.6> Instead of thinking of an adder as a device that adds: two numbers and then links the carries together, we can think of the adder as a hardware device that can add three inputs together (ai, bi, ci) and produce two outputs (s, ci + 1). When adding two numbers together, there is little we can do with this observation. When we are adding more than two operands, it is possible to reduce the cost of the carry. The idea is to form two independent sums, called S (sum bits) and C (carry bits). At the end of the process, we need to add C and S together using a normal adder. This technique of delaying carry propagation until the end of a sum of numbers is called carry save addition. The block drawing on the lower right of Figure B.14.1 (see below) shows the organization, with two levels of carry save adders connected by a single normal adder. Calculate the delays to add four 16-bit numbers using full carry-lookahead adders versus carry save with a carry-lookahead adder forming the final sum. (The time unit T in Exercise B.28 is the same)

One logic function that is used for a variety of purposes(including within adders and to compute parity) is exclusive OR. The output of a two-input exclusive OR function is true only if exactly one of the inputs is true. Show the truth table for a two-input exclusive OR function and implement this function using AND gates, OR gates, and inverters

The results of the SPEC CPU2006 bzip2 benchmark running on an AMD Barcelona has an instruction count of 2.389E12, an execution time of 750 s, and a reference time of 9650 s.

1.11.1 [5] <§§1.6, 1.9> Find the CPI if the clock cycle time is 0.333 ns.

1.11.2 [5] <§1.9> Find the SPECratio.

1.11.3 [5] <§§1.6, 1.9> Find the increase in CPU time if the number of instructions of the benchmark is increased by 10% without affecting the CPI.

1.11.4 [5] <§§1.6, 1.9> Find the increase in CPU time if the number of instructions of the benchmark is increased by 10% and the CPI is increased by 5%.

1.11.5 [5] <§§1.6, 1.9> Find the change in the SPECratio for this change.

1.11.6 [10] <§1.6> Suppose that we are developing a new version of the AMD Barcelona processor with a 4 GHz clock rate. We have added some additional instructions to the instruction set in such a way that the number of instructions has been reduced by 15%. Th e execution time is reduced to 700 s and the new SPECratio is 13.7. Find the new CPI.

1.11.7 [10] <§1.6> Th is CPI value is larger than obtained in 1.11.1 as the clock rate was increased from 3 GHz to 4 GHz. Determine whether the increase in the CPI is similar to that of the clock rate. If they are dissimilar, why?

1.11.8 [5] <§1.6> By how much has the CPU time been reduced?

58 Chapter 1 Computer Abstractions and Technology

1.11.9 [10] <§1.6> For a second benchmark, libquantum, assume an execution time of 960 ns, CPI of 1.61, and clock rate of 3 GHz. If the execution time is reduced by an additional 10% without aff ecting to the CPI and with a clock rate of 4 GHz, determine the number of instructions.

1.11.10 [10] <§1.6> Determine the clock rate required to give a further 10% reduction in CPU time while maintaining the number of instructions and with the CPI unchanged.

1.11.11 [10] <§1.6> Determine the clock rate if the CPI is reduced by 15% and the CPU time by 20% while the number of instructions is unchanged.

Show that there are 2nentries in a truth table for a function with n inputs.

See all solutions

Recommended explanations on Computer Science Textbooks

View all explanations

What do you think about this solution?

We value your feedback to improve our textbook solutions.

Study anywhere. Anytime. Across all devices.

Sign-up for free