Chapter 6: Q6E (page 568)
Question: Matrix multiplication plays an important role in a number of applications. Two matrices can be multiplied only if the number of columns of the first matrix equals the number of rows of the second. Let’s assume we have an m × n matrix A and we want to multiply it by an n × p matrix B. We can express their product as an m × p matrix denoted by AB (or A ⋅ B). If we assign C = AB, and ci,j denotes the entry in C at position (i, j), then ci,j = ai,1 ⋅ b1,j + ai,2 ⋅ b2,j + … + ai,n ⋅ bn,j for each pair of indices i and j with 1 ≤ i ≤ m and 1 ≤ j ≤ p. Now we want to see if we can parallelize the computation of C. Assume that matrices are laid out in memory sequentially as follows: a1,1, a2,1, a3,1, a4,1, … (i.e., in column-major order).
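For concreteness, the following is a minimal sequential sketch of this computation in C. The indexing macro IDX, the function name matmul, and the use of 0-based indices are our own choices for illustration; only the column-major layout comes from the exercise.

    #include <stddef.h>

    /* Element (i, j) of a matrix with r rows, stored column-major
     * (a1,1, a2,1, a3,1, ... in the exercise's 1-based notation). */
    #define IDX(i, j, r) ((size_t)(j) * (size_t)(r) + (size_t)(i))

    /* C = A * B, where A is m x n, B is n x p, and C is m x p. */
    void matmul(const double *A, const double *B, double *C,
                long m, long n, long p)
    {
        for (long j = 0; j < p; j++)            /* each column of C */
            for (long i = 0; i < m; i++) {      /* each row of C    */
                double sum = 0.0;
                for (long k = 0; k < n; k++)    /* dot product      */
                    sum += A[IDX(i, k, m)] * B[IDX(k, j, n)];
                C[IDX(i, j, m)] = sum;
            }
    }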
6.6.1 [10] Assume that we are going to compute C on both a single-core shared-memory machine and a 4-core shared-memory machine. Compute the speedup we would expect to obtain on the 4-core machine, ignoring any memory issues.
6.6.2 [10] Repeat Exercise 6.6.1, assuming that updates to C incur a cache miss due to false sharing when consecutive elements (i.e., elements with consecutive values of index i) are updated.
6.6.3 [10] How would you fix the false sharing issue that can occur?
Short Answer
Matrix multiplication is embarrassingly parallel: every entry ci,j of C can be computed independently of every other entry, so the work divides evenly among the four cores with no synchronization between them. Ignoring memory issues, the expected speedup on the 4-core shared-memory machine is therefore close to 4.
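As a sketch (our own OpenMP code, not from the book), the row index i can be split across 4 threads, so each core performs a quarter of the work. Note that schedule(static, 1) hands consecutive values of i to different cores, which, combined with the column-major layout, is precisely the false-sharing pattern that 6.6.2 asks about.

    /* Compile with -fopenmp. Same column-major layout as above. */
    #include <stddef.h>

    #define IDX(i, j, r) ((size_t)(j) * (size_t)(r) + (size_t)(i))

    void matmul_rows(const double *A, const double *B, double *C,
                     long m, long n, long p)
    {
        /* Consecutive rows go to different cores; within any column of C,
         * neighboring elements are adjacent in memory, so the cores'
         * updates land in the same cache lines. */
        #pragma omp parallel for num_threads(4) schedule(static, 1)
        for (long i = 0; i < m; i++)
            for (long j = 0; j < p; j++) {
                double sum = 0.0;
                for (long k = 0; k < n; k++)
                    sum += A[IDX(i, k, m)] * B[IDX(k, j, n)];
                C[IDX(i, j, m)] = sum;
            }
    }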
With false sharing, the cores repeatedly invalidate one another’s copies of the cache lines holding C, so updates to C that would have been cache hits become cache misses. The speedup factor is therefore reduced by the cost of servicing these extra misses on every update, and it can fall well below 4 if the miss penalty is large relative to the computation.
Partition the work so that no two cores write to the same cache line: with the column-major layout, have each core traverse C down columns rather than across rows, assigning each core its own contiguous block of columns. Each core’s updates then stay within its own region of memory, eliminating the false sharing (apart from, at most, the few cache lines straddling block boundaries, which padding can remove).
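A sketch of this fix under the same assumptions as above: parallelizing over the column index j gives each core a block of consecutive columns of C, and each column is a contiguous run of m doubles, so different cores’ writes can share a cache line only where a block boundary happens to fall mid-line.

    /* Compile with -fopenmp. Same column-major layout as above. */
    #include <stddef.h>

    #define IDX(i, j, r) ((size_t)(j) * (size_t)(r) + (size_t)(i))

    void matmul_cols(const double *A, const double *B, double *C,
                     long m, long n, long p)
    {
        /* Each core gets a contiguous block of columns; its writes to C
         * form one contiguous region, so cores do not ping-pong lines. */
        #pragma omp parallel for num_threads(4) schedule(static)
        for (long j = 0; j < p; j++)
            for (long i = 0; i < m; i++) {
                double sum = 0.0;
                for (long k = 0; k < n; k++)
                    sum += A[IDX(i, k, m)] * B[IDX(k, j, n)];
                C[IDX(i, j, m)] = sum;
            }
    }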