
Assume we want to execute the DAXPY loop shown on page 511 in MIPS assembly on the NVIDIA 8800 GTX GPU described in this chapter. In this problem, we will assume that all math operations are performed on single-precision floating-point numbers (we will rename the loop SAXPY). Assume that instructions take the following number of cycles to execute.

Instruction    Cycles
Loads          5
Stores         2
Add.S          3
Mult.S         4
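As a rough reading of the table, the sketch below adds up the cycles one SAXPY element needs if its instructions run strictly one after another. It assumes each element r[i] = b*p[i] + q[i] needs two loads, one Mult.S, one Add.S and one store, with no pipelining or overlap; that instruction mix and the function names are our assumptions, not part of the problem.

```c
/* Cycle counts taken from the table above. */
enum { LOAD_CYCLES = 5, STORE_CYCLES = 2, ADD_S_CYCLES = 3, MULT_S_CYCLES = 4 };

/* Assumed instruction mix for one element r[i] = b*p[i] + q[i]:
   load p[i], load q[i], Mult.S, Add.S, store r[i],
   executed back to back with no overlap between instructions. */
static int saxpy_serial_cycles_per_element(void)
{
    return 2 * LOAD_CYCLES + MULT_S_CYCLES + ADD_S_CYCLES + STORE_CYCLES;
}

/* Cycles for n elements processed one after another on a single core. */
static long saxpy_serial_cycles(long n)
{
    return n * saxpy_serial_cycles_per_element();
}
```

Under these assumptions one element costs 2 × 5 + 4 + 3 + 2 = 19 cycles, which is the serial baseline that spreading elements across the multiprocessor's cores is meant to hide.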

6.13.1 Describe how you would construct warps for the SAXPY loop to exploit the 8 cores provided in a single multiprocessor.

Short Answer


6.13.1

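The first step is to characterize the DAXPY loop: each iteration scales one element and adds it to another, and no iteration depends on any other, so the iterations can be divided among the cores, with every core running the same scale-and-add code on its own range of elements.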

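Loop:

/* SIZE (total number of elements) and N (number of chunks/processors)
   are assumed to be defined elsewhere. */
void saxpy(int saxID, float *p, float *q, float *r, float b)
{
    int saxSize = SIZE / N;       /* elements per chunk */
    int saxS = saxID * saxSize;   /* first index of this chunk */
    int saxE = saxS + saxSize;    /* one past the last index of this chunk */
    for (int i = saxS; i < saxE; i++)
        r[i] = b * p[i] + q[i];
}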
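To check the chunking arithmetic, here is a self-contained variant of the routine above in which SIZE and N are passed as parameters (our change; the original treats them as globals). It also gives the leftover SIZE % N elements to the last chunk, which the original chunk bounds never reach when SIZE is not a multiple of N. The names saxpy_chunk and saxpy_all are hypothetical.

```c
/* Process chunk saxID of n_chunks over vectors of length size:
   r[i] = b * p[i] + q[i] for every i in the chunk.
   The last chunk also takes the size % n_chunks leftover elements. */
void saxpy_chunk(int saxID, int n_chunks, int size,
                 const float *p, const float *q, float *r, float b)
{
    int saxSize = size / n_chunks;          /* base elements per chunk */
    int saxS = saxID * saxSize;             /* first index of this chunk */
    int saxE = (saxID == n_chunks - 1)      /* one past the last index */
                   ? size
                   : saxS + saxSize;
    for (int i = saxS; i < saxE; i++)
        r[i] = b * p[i] + q[i];
}

/* Run every chunk in turn, standing in for N processors running in parallel. */
void saxpy_all(int n_chunks, int size,
               const float *p, const float *q, float *r, float b)
{
    for (int id = 0; id < n_chunks; id++)
        saxpy_chunk(id, n_chunks, size, p, q, r, b);
}
```

For example, with size = 10 and n_chunks = 3 the chunks cover indices [0, 3), [3, 6) and [6, 10), so every element is written exactly once.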

Step by step solution

Step 01: Determine the DAXPY loop

When the MIPS instruction set architecture is extended with vector instructions and vector registers, the vector operations use the same names as the MIPS operations with “V” appended. A vector instruction takes its inputs from vector registers, or from a vector register and a scalar register.

Y=a×X+Y

In the above expression, X and Y are vectors of 64 double-precision floating-point numbers initially resident in memory, and “a” is a scalar double-precision variable. This is called the DAXPY loop, which stands for double precision a × X plus Y.
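A plain scalar C rendering of that definition may help; it is a reference loop, not the MIPS vector code, and the names daxpy, saxpy and VLEN are ours. The single-precision SAXPY used in this problem differs only in using float instead of double.

```c
#include <stddef.h>

#define VLEN 64  /* vector length from the DAXPY description */

/* DAXPY: Y = a*X + Y, element by element, in double precision. */
void daxpy(double a, const double x[VLEN], double y[VLEN])
{
    for (size_t i = 0; i < VLEN; i++)
        y[i] = a * x[i] + y[i];
}

/* SAXPY: the same loop in single precision. */
void saxpy(float a, const float x[VLEN], float y[VLEN])
{
    for (size_t i = 0; i < VLEN; i++)
        y[i] = a * x[i] + y[i];
}
```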

Step 02: Determine the SAXPY loop code

6.13.1

As in the short answer, the first step is to characterize the DAXPY loop: each iteration computes one element r[i] = b*p[i] + q[i] independently of the others. The loop is therefore wrapped in a subroutine that every processor calls with its own ID (saxID), and each processor handles the contiguous chunk of SIZE/N elements starting at saxID*SIZE/N. This is the loop given in the short answer above.
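One way to picture warp construction for this loop is to give each SAXPY element its own thread, group consecutive threads into warps, and let the multiprocessor's 8 cores sweep each warp in passes. The sketch below only computes that mapping. The warp size of 32 and the four passes of 8 lanes per warp instruction are assumptions based on the Tesla-generation design the chapter describes, and map_element and warps_needed are hypothetical helpers.

```c
#define WARP_SIZE 32      /* threads per warp (assumed Tesla-era value) */
#define CORES_PER_SM 8    /* cores in one multiprocessor */

/* Where SAXPY element i lands when thread i handles element i. */
struct warp_slot {
    int warp;   /* which warp the thread belongs to */
    int lane;   /* position of the thread within its warp, 0..31 */
    int core;   /* core that executes this lane, 0..7 */
    int pass;   /* which of the 32/8 = 4 passes over the cores, 0..3 */
};

struct warp_slot map_element(int i)
{
    struct warp_slot s;
    s.warp = i / WARP_SIZE;
    s.lane = i % WARP_SIZE;
    s.core = s.lane % CORES_PER_SM;
    s.pass = s.lane / CORES_PER_SM;
    return s;
}

/* Number of warps needed for n elements (the last warp may be partly idle). */
int warps_needed(int n)
{
    return (n + WARP_SIZE - 1) / WARP_SIZE;
}
```

For instance, element 45 maps to warp 1, lane 13, core 5, pass 1. Keeping consecutive threads on consecutive elements means one warp's loads of p[i] and q[i] touch a contiguous block of memory, and all 8 cores stay busy on every pass as long as a warp is full.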


Most popular questions from this chapter

Figure B.8.8 on page B-55 illustrates the implementation of the register file for the MIPS datapath. Pretend that a new register file is to be built, but that there are only two registers and only one read port, and that each register has only 2 bits of data. Redraw Figure B.8.8 so that every wire in your diagram corresponds to only 1 bit of data (unlike the diagram in Figure B.8.8, in which some wires are 5 bits and some wires are 32 bits). Redraw the registers using D flip-flops. You do not need to show how to implement a D flip-flop or a multiplexor.

Consider the following recursive mergesort algorithm (another classic divide-and-conquer algorithm). Mergesort was first described by John von Neumann in 1945. The basic idea is to divide an unsorted list m into two sublists of about half the size of the original list. Repeat this operation on each sublist, and continue until we have lists of length 1. Then, starting with sublists of length 1, “merge” the two sublists into a single sorted list.

Mergesort(m)
    var list left, right, result
    if length(m) <= 1
        return m
    else
        var middle = length(m) / 2
        for each x in m up to middle
            add x to left
        for each x in m after middle
            add x to right
        left = Mergesort(left)
        right = Mergesort(right)
        result = Merge(left, right)
        return result

The merge step is carried out by the following pseudocode:

Merge(left, right)
    var list result
    while length(left) > 0 and length(right) > 0
        if first(left) <= first(right)
            append first(left) to result
            left = rest(left)
        else
            append first(right) to result
            right = rest(right)
    if length(left) > 0
        append left to result
    if length(right) > 0
        append right to result
    return result
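A C rendering of the two routines above can be used to check the pseudocode. It sorts an int array and copies into a scratch buffer instead of building separate list objects, which is our simplification; merge and merge_sort are hypothetical names. Once one list is exhausted, the two tail loops append everything still left in the other list.

```c
#include <stdlib.h>
#include <string.h>

/* Merge sorted a[0..na) and b[0..nb) into out; mirrors Merge(left, right). */
static void merge(const int *a, size_t na, const int *b, size_t nb, int *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)              /* both lists non-empty */
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];     /* append the rest of left */
    while (j < nb) out[k++] = b[j++];     /* append the rest of right */
}

/* Sort m[0..n) in place; mirrors Mergesort(m).
   Returns 0 on success, -1 if a scratch buffer cannot be allocated. */
int merge_sort(int *m, size_t n)
{
    if (n <= 1)
        return 0;
    size_t middle = n / 2;
    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    memcpy(tmp, m, n * sizeof *m);        /* left = tmp[0..middle), right = tmp[middle..n) */
    if (merge_sort(tmp, middle) != 0 ||
        merge_sort(tmp + middle, n - middle) != 0) {
        free(tmp);
        return -1;
    }
    merge(tmp, middle, tmp + middle, n - middle, m);
    free(tmp);
    return 0;
}
```

Using <= in the comparison keeps equal elements in their original order, so the sort is stable.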

6.5.1 [10] Assume that you have Y cores on a multi-core processor to run MergeSort. Assuming that Y is much smaller than length(m), express the speedup factor you might expect to obtain for values of Y and length(m). Plot these on a graph.

6.5.2 [10] Next, assume that Y is equal to length(m). How would this affect your conclusions in your previous answer? If you were tasked with obtaining the best speedup factor possible (i.e., strong scaling), explain how you might change this code to obtain it.

Implement a switching network that has two data inputs (A and B), two data outputs (C and D), and a control input (S). If S equals 1, the network is in pass-through mode, and C should equal A, and D should equal B. If S equals 0, the network is in crossing mode, and C should equal B, and D should equal A.
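The behaviour described amounts to two 2-to-1 multiplexors sharing the select line S: C = S·A + S'·B and D = S·B + S'·A. A bit-level C model of that logic (switch_net is a hypothetical name) lets the truth table be checked exhaustively; it models the Boolean equations, not a particular gate-level circuit.

```c
/* Switching network: S = 1 is pass-through (C = A, D = B),
   S = 0 is crossing (C = B, D = A). Inputs are single bits (0 or 1).
   C = S·A + S'·B,  D = S·B + S'·A. */
void switch_net(int a, int b, int s, int *c, int *d)
{
    *c = (s & a) | (!s & b);
    *d = (s & b) | (!s & a);
}
```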

Rewrite the code for fact to use fewer instructions.

Consider the following portions of two different programs running at the same time on four processors in a symmetric multi-core processor (SMP). Assume that before this code is run, both x and y are 0.

Core 1: x = 2;

Core 2: y = 2;

Core 3: w = x + y + 1;

Core 4: z = x + y;

6.7.1 [10] What are all the possible resulting values of w, x, y, and z? For each possible outcome, explain how we might arrive at those values. You will need to examine all possible interleavings of instructions.

6.7.2 [5] How could you make the execution more deterministic so that only one set of values is possible?
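The interleavings can also be enumerated mechanically. The sketch below assumes sequential consistency and splits each core's statement into atomic steps: Core 1 and Core 2 each do one store, while Core 3 and Core 4 load x, then load y, then store the result. That step granularity, the load order and all names are our assumptions. Under them, x and y always end at 2, w can be 1, 3 or 5, z can be 0, 2 or 4, and all nine (w, z) combinations occur.

```c
#include <string.h>

/* Shared memory and per-core loaded values for one interleaving. */
struct state {
    int x, y, w, z;
    int r3x, r3y, r4x, r4y;   /* values loaded by Core 3 and Core 4 */
    int pc[4];                /* next step of each core */
};

static const int steps[4] = {1, 1, 3, 3};  /* atomic steps per core */

/* Perform the next step of core c (0-based: Core 1 is c == 0). */
static void step(struct state *s, int c)
{
    int k = s->pc[c]++;
    switch (c) {
    case 0: s->x = 2; break;
    case 1: s->y = 2; break;
    case 2:
        if (k == 0) s->r3x = s->x;
        else if (k == 1) s->r3y = s->y;
        else s->w = s->r3x + s->r3y + 1;
        break;
    case 3:
        if (k == 0) s->r4x = s->x;
        else if (k == 1) s->r4y = s->y;
        else s->z = s->r4x + s->r4y;
        break;
    }
}

/* Try every core that still has steps left; mark the final (w, z). */
static void explore(struct state s, int seen[6][5])
{
    int done = 1;
    for (int c = 0; c < 4; c++) {
        if (s.pc[c] < steps[c]) {
            struct state next = s;   /* branch: let core c go next */
            step(&next, c);
            explore(next, seen);
            done = 0;
        }
    }
    if (done)
        seen[s.w][s.z] = 1;
}

/* Fill seen[w][z] and return how many distinct (w, z) pairs occur. */
int enumerate_outcomes(int seen[6][5])
{
    struct state s = {0};
    memset(seen, 0, sizeof(int[6][5]));
    explore(s, seen);
    int count = 0;
    for (int w = 0; w < 6; w++)
        for (int z = 0; z < 5; z++)
            count += seen[w][z];
    return count;
}
```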
