
We would like to execute the loop below as efficiently as possible. We have two different machines, a MIMD machine and a SIMD machine.

for (i = 0; i < 2000; i++)
    for (j = 0; j < 3000; j++)
        X_array[i][j] = Y_array[j][i] + 200;

6.11.1 [10] For a 4 CPU MIMD machine, show the sequence of MIPS instructions that you would execute on each CPU. What is the speedup for this MIMD machine?

6.11.2 [10] For an 8-wide SIMD machine (i.e., 8 parallel SIMD functional units), write an assembly program using your own SIMD extensions to MIPS to execute the loop. Compare the number of instructions executed on the SIMD machine to the MIMD machine.

Short Answer


(6.11.1)

for i := 0 step 1 until 2000
begin
    for j := 0 step 1 until 3000
        fork ADD;
ADD:    X_array[i][j] := Y_array[j][i] + 200;
    join;
end;

(6.11.2)

for j := 0 step 1 until 3000
begin
    X_array[i][j] := Y_array[j][i] + 200, (0 <= i < 2000);
end;

Step by step solution

01

SIMD and MIMD

SIMD means Single Instruction, Multiple Data. In this type of architecture, all processing units execute the same single instruction in parallel, each on a different data input.

MIMD means Multiple Instruction, Multiple Data. In this type of architecture, each processor executes its own instruction stream on its own data, with the processors cooperating on a common task.

The two architectures have one clear similarity: both can exploit the data-level parallelism of loops such as the one in this problem.

There are several differences between the two architectures. In SIMD, a single instruction is applied to many data elements simultaneously, so only one instruction stream has to be fetched and decoded; in MIMD, each processor fetches and executes its own instruction stream on its own data. SIMD is typically implemented with array or vector functional units, while MIMD is implemented with multiple independent processors. SIMD is well suited to regular, loop-level parallelism, while MIMD can also exploit irregular, task-level parallelism.

02

Loop for MIMD Machine.

for i := 0 step 1 until 2000            # outer loop: i covers the 2000 rows of X_array
begin
    for j := 0 step 1 until 3000        # inner loop: j covers the 3000 columns of X_array
        fork ADD;                       # fork the statement at label ADD onto an available CPU
ADD:    X_array[i][j] := Y_array[j][i] + 200;   # compute one element of X_array
    join;                               # wait for all forked work to complete
end;
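The pseudocode above only shows the fork/join structure; the exercise asks for actual MIPS instructions. Below is one possible sketch of the code run by each of the four CPUs, assuming the 2000 values of i are split statically into four blocks of 500 rows. The register roles ($a0, $a1, $a2), 32-bit integer elements, and row-major storage are assumptions, not part of the problem statement.

# Per-CPU MIPS sketch. Assumptions: $a0 = base of X_array, $a1 = base of Y_array,
# $a2 = this CPU's first row i0 (CPU0: 0, CPU1: 500, CPU2: 1000, CPU3: 1500).
        li   $t4, 2000            # row length of Y_array in words
        li   $t5, 3000            # row length of X_array in words
        addi $t9, $a2, 500        # i_end = i0 + 500
        move $t0, $a2             # i = i0
ILOOP:  li   $t1, 0               # j = 0
JLOOP:  mul  $t2, $t1, $t4        # j * 2000
        add  $t2, $t2, $t0        #   + i
        sll  $t2, $t2, 2          #   * 4 bytes
        add  $t2, $t2, $a1        # address of Y_array[j][i]
        lw   $t3, 0($t2)          # $t3 = Y_array[j][i]
        addi $t3, $t3, 200        # $t3 = Y_array[j][i] + 200
        mul  $t6, $t0, $t5        # i * 3000
        add  $t6, $t6, $t1        #   + j
        sll  $t6, $t6, 2          #   * 4 bytes
        add  $t6, $t6, $a0        # address of X_array[i][j]
        sw   $t3, 0($t6)          # X_array[i][j] = $t3
        addi $t1, $t1, 1          # j++
        slti $t7, $t1, 3000
        bne  $t7, $zero, JLOOP    # repeat while j < 3000
        addi $t0, $t0, 1          # i++
        bne  $t0, $t9, ILOOP      # repeat while i < i_end

With this static partition, each CPU performs one quarter of the 2000 x 3000 element updates, so the ideal speedup over a single CPU is close to 4; in practice it is somewhat lower because the four CPUs share memory bandwidth and must synchronize at the end.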

03

Loop for SIMD Machine.

for j := 0 step 1 until 3000            # loop over the 3000 values of j
begin
    X_array[i][j] := Y_array[j][i] + 200, (0 <= i < 2000);   # one vector statement updates all 2000 values of i
end;
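The exercise lets us define our own SIMD extensions to MIPS, so the sketch below invents three vector instructions (lv, addv, svs) operating on 8-lane vector registers such as $vr0; these instructions, the register roles, 32-bit elements, and row-major storage are all assumptions. For a fixed j, eight consecutive values of i read eight contiguous words from row j of Y_array, while the corresponding stores land in column j of X_array with a stride of one X row (3000 words).

# Invented SIMD extensions to MIPS:
#   lv   $vrT, off($rS)         load 8 consecutive words into vector register $vrT
#   addv $vrT, $vrS, $rR        add the scalar in $rR to all 8 lanes of $vrS
#   svs  $vrT, off($rS), $rR    store the 8 lanes of $vrT starting at off($rS), advancing by $rR bytes per lane
# Assumptions: $a0 = base of X_array, $a1 = base of Y_array.
        li   $t3, 200             # constant added to every element
        li   $t4, 2000            # row length of Y_array in words
        li   $t8, 12000           # one X_array row in bytes (3000 words * 4)
        li   $t9, 96000           # eight X_array rows in bytes
        li   $t1, 0               # j = 0
JLOOP:  mul  $t2, $t1, $t4        # j * 2000
        sll  $t2, $t2, 2          #   * 4 bytes
        add  $t2, $t2, $a1        # address of Y_array[j][0]
        sll  $t6, $t1, 2          # j * 4 bytes
        add  $t6, $t6, $a0        # address of X_array[0][j]
        li   $t0, 0               # i = 0
ILOOP:  lv   $vr0, 0($t2)         # load Y_array[j][i .. i+7] (contiguous)
        addv $vr0, $vr0, $t3      # add 200 to all 8 lanes
        svs  $vr0, 0($t6), $t8    # store to X_array[i .. i+7][j] (stride = one X row)
        addi $t2, $t2, 32         # advance 8 words along row j of Y_array
        add  $t6, $t6, $t9        # advance 8 rows down column j of X_array
        addi $t0, $t0, 8          # i += 8
        slti $t5, $t0, 2000
        bne  $t5, $zero, ILOOP    # repeat while i < 2000
        addi $t1, $t1, 1          # j++
        slti $t5, $t1, 3000
        bne  $t5, $zero, JLOOP    # repeat while j < 3000

Each trip through ILOOP updates 8 elements with about 8 instructions, versus roughly 14 instructions per element in the scalar MIMD sketch above.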

The number of instructions executed on the SIMD machine is therefore much smaller than on the MIMD machine: each SIMD instruction operates on 8 elements at once, so the dynamic instruction count of the inner loop drops by roughly a factor of 8, whereas the MIMD machine still executes the full scalar instruction count, merely divided across its four CPUs.


