
Question: 6.18 When performing computations on sparse matrices, latency in the memory hierarchy becomes much more of a factor. Sparse matrices lack the spatial locality in the data stream typically found in matrix operations. As a result, new matrix representations have been proposed. One of the earliest sparse matrix representations is the Yale Sparse Matrix Format. It stores an initial sparse m × n matrix M in row form using three one-dimensional arrays. Let R be the number of nonzero entries in M. We construct an array A of length R that contains all nonzero entries of M (in left-to-right, top-to-bottom order). We also construct a second array IA of length m + 1 (i.e., one entry per row, plus one). IA(i) contains the index in A of the first nonzero element of row i. Row i of the original matrix extends from A(IA(i)) to A(IA(i+1)−1). The third array, JA, contains the column index of each element of A, so it also is of length R.

6.18.1 [15] Consider the sparse matrix X below and write C code that would store this matrix in Yale Sparse Matrix Format.

Row 1 [1, 2, 0, 0, 0, 0]

Row 2 [0, 0, 1, 1, 0, 0]

Row 3 [0, 0, 0, 0, 9, 0]

Row 4 [2, 0, 0, 0, 0, 2]

Row 5 [0, 0, 3, 3, 0, 7]

Row 6 [1, 3, 0, 0, 0, 1]

6.18.2 [10] In terms of storage space, assuming that each element in matrix X is single precision floating point, compute the amount of storage used to store the matrix above in Yale Sparse Matrix Format.

6.18.3 [15] Perform matrix multiplication of Matrix X by Matrix Y shown below. [2, 4, 1, 99, 7, 2] Put this computation in a loop, and time its execution. Make sure to increase the number of times this loop is executed to get good resolution in your timing measurement. Compare the runtime of using a naïve representation of the matrix, and the Yale Sparse Matrix Format.

6.18.4 [15] Can you find a more efficient sparse matrix representation (in terms of space and computational overhead)?

Short Answer

Expert verified

6.18.1

The desired C code builds the three Yale arrays A, IA, and JA by scanning X row by row (0-based indices):

#include <stdio.h>

#define M 6
#define N 6

int main(void) {
    float X[M][N] = {
        {1, 2, 0, 0, 0, 0},
        {0, 0, 1, 1, 0, 0},
        {0, 0, 0, 0, 9, 0},
        {2, 0, 0, 0, 0, 2},
        {0, 0, 3, 3, 0, 7},
        {1, 3, 0, 0, 0, 1}
    };
    float A[M * N];    /* nonzero values (at most M*N of them)        */
    int JA[M * N];     /* column index of each entry of A             */
    int IA[M + 1];     /* index in A of the first nonzero of each row */
    int r = 0;         /* running count of nonzeros                   */

    for (int i = 0; i < M; i++) {
        IA[i] = r;
        for (int j = 0; j < N; j++) {
            if (X[i][j] != 0) {
                A[r] = X[i][j];
                JA[r] = j;
                r++;
            }
        }
    }
    IA[M] = r;

    printf("A  = ");
    for (int k = 0; k < r; k++) printf("%g ", A[k]);
    printf("\nJA = ");
    for (int k = 0; k < r; k++) printf("%d ", JA[k]);
    printf("\nIA = ");
    for (int k = 0; k <= M; k++) printf("%d ", IA[k]);
    printf("\n");
    return 0;
}

6.18.2

Matrix X has R = 13 nonzero entries and m = 6 rows. Assuming 4-byte single-precision values and 2-byte unsigned short indices, the representation needs 4R bytes for A, 2R bytes for JA, and 2(m + 1) bytes for IA: 52 + 26 + 14 = 92 bytes of memory.

6.18.3

The product was computed both by brute force on the dense representation and by using the Yale sparse matrix format; both methods produce the same result.

Product X × Y:

[10, 100, 63, 8, 314, 16]

6.18.4

There are a number of more efficient formats (e.g., coordinate lists, compressed sparse column, or blocked variants), but their impact is marginal for matrices as small as the ones used in this problem.

Step by step solution

01

Define the concept.

6.18.2

It is assumed that each single-precision floating-point value consumes four bytes of memory and each index two bytes, as it is stored as a short unsigned integer. Matrix X has R = 13 nonzero entries and m = 6 rows, so the representation occupies 4(13) + 2(13) + 2(7) = 52 + 26 + 14 = 92 bytes of memory.

6.18.3

The matrix X is the sparse matrix given in 6.18.1.

The matrix Y = [2, 4, 1, 99, 7, 2].

And the C code for computing the product of the dense matrix X and the vector Y:

#include <stdio.h>

#define ROWS 6
#define COLS 6

/* Multiply the dense ROWS x COLS matrix X by the length-COLS vector Y.
   The vector length must equal the matrix column count. */
void product(float X[ROWS][COLS], float Y[COLS], float result[ROWS])
{
    for (int i = 0; i < ROWS; i++) {
        result[i] = 0;
        for (int j = 0; j < COLS; j++)
            result[i] += X[i][j] * Y[j];
    }
}

int main(void)
{
    float X[ROWS][COLS] = {
        {1, 2, 0, 0, 0, 0},
        {0, 0, 1, 1, 0, 0},
        {0, 0, 0, 0, 9, 0},
        {2, 0, 0, 0, 0, 2},
        {0, 0, 3, 3, 0, 7},
        {1, 3, 0, 0, 0, 1}
    };
    float Y[COLS] = {2, 4, 1, 99, 7, 2};
    float result[ROWS];

    product(X, Y, result);
    for (int i = 0; i < ROWS; i++)
        printf("%g\t", result[i]);
    printf("\n");
    return 0;
}

6.18.4

There are a number of more efficient formats. Coordinate (COO) lists store each nonzero as a (row, column, value) triple, compressed sparse column (CSC) mirrors the Yale/CSR layout by columns, and blocked formats exploit clusters of nonzeros. For the small matrices used in this specific problem, however, the space and runtime impact of switching formats is marginal.

02

Determine the calculation.

6.18.1

Scanning X left to right, top to bottom yields R = 13 nonzero entries. Using 0-based indices, the three Yale arrays are:

A = {1, 2, 1, 1, 9, 2, 2, 3, 3, 7, 1, 3, 1}

JA = {0, 1, 2, 3, 4, 0, 5, 2, 3, 5, 0, 1, 5}

IA = {0, 2, 4, 5, 7, 10, 13}

For example, row 4 of X (0-based row 3) extends from A(IA(3)) = A(5) to A(IA(4) − 1) = A(6): the entries 2 and 2 in columns 0 and 5.

6.18.2

The required storage equals 4R + 2R + 2(m + 1) bytes, where R is the number of nonzero elements and m is the number of rows: 4R bytes for the single-precision values in A, 2R bytes for the column indices in JA, and 2(m + 1) bytes for the row pointers in IA. With R = 13 and m = 6 this gives 52 + 26 + 14 = 92 bytes.

6.18.3

The C code for computing the product of the two specified matrices X and Y is given in step 01. To time it, place the multiplication inside a loop executed many times (millions of iterations, since a single 6 × 6 product is far too fast to measure on its own), measure the elapsed time with clock(), and divide by the iteration count. Repeat the measurement for the naïve dense representation and for the Yale sparse matrix format, then compare the runtimes.
