Chapter 6: 13 (page 571)
Assume we want to execute the DAXPY loop shown on page 511 in MIPS assembly on the NVIDIA 8800 GTX GPU described in this chapter. In this problem, we will assume that all math operations are performed on single-precision floating-point numbers (we will rename the loop SAXPY). Assume that instructions take the following number of cycles to execute.
Loads | Stores | Add.S | Mult.S |
5 | 2 | 3 | 4 |
6.13.1 Describe how you will construct warps for the SAXPY loop to exploit the 8 cores provided in a single multiprocessor.
Short Answer
6.13.1
The first step is to partition the SAXPY loop across the cores: each core is assigned a contiguous chunk of the index range and performs the scale (multiply) and add for every element in its chunk.
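Loop:
/* SIZE: vector length; N: number of chunks (one per core, so 8 here) */
void saxpy(int saxID, float *p, float *q, float *r, float b)
{
    int saxSize = SIZE / N;        /* elements per chunk */
    int saxS = saxID * saxSize;    /* first index of this chunk */
    int saxE = saxS + saxSize;     /* one past the last index of this chunk */
    for (int i = saxS; i < saxE; i++)
        r[i] = b * p[i] + q[i];
}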
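The chunking idea can be checked on an ordinary CPU before worrying about the GPU. The following is a minimal sketch in plain C, not a CUDA kernel and not the textbook's solution: the names saxpy_chunk, saxpy_all and NUM_CORES are hypothetical, the sequential loop over chunk IDs only stands in for the 8 cores running in parallel, and the ceiling division is an added assumption so that vector lengths not divisible by 8 are still covered.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CORES 8  /* assumption: one chunk per core of a multiprocessor */

/* Compute r[i] = b * p[i] + q[i] for the chunk owned by chunk_id. */
static void saxpy_chunk(int chunk_id, size_t size,
                        const float *p, const float *q, float *r, float b)
{
    assert(chunk_id >= 0 && chunk_id < NUM_CORES);

    /* Ceiling division so the last chunks absorb any remainder. */
    size_t chunk = (size + NUM_CORES - 1) / NUM_CORES;
    size_t start = (size_t)chunk_id * chunk;
    size_t end = start + chunk;

    /* Clamp to the vector length; trailing chunks may be short or empty. */
    if (start > size) start = size;
    if (end > size) end = size;

    for (size_t i = start; i < end; i++)
        r[i] = b * p[i] + q[i];
}

/* Sequential stand-in for the 8 cores: process every chunk in turn. */
static void saxpy_all(size_t size, const float *p, const float *q,
                      float *r, float b)
{
    for (int id = 0; id < NUM_CORES; id++)
        saxpy_chunk(id, size, p, q, r, b);
}
```

Calling saxpy_all(n, p, q, r, b) should give the same result as a single loop over all n elements, because the chunks are disjoint and together cover every index exactly once.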