Question: 6.19 In future systems, we expect to see heterogeneous computing platforms constructed out of heterogeneous CPUs. We have begun to see some appear in the embedded processing market in systems that contain both floating point DSPs and microcontroller CPUs in a multichip module package. Assume that you have three classes of CPU:

CPU A: A moderate speed multi-core CPU (with a floating point unit) that can execute multiple instructions per cycle.

CPU B: A fast single-core integer CPU (i.e., no floating point unit) that can execute a single instruction per cycle.

CPU C: A slow vector CPU (with floating point capability) that can execute multiple copies of the same instruction per cycle.

Assume that our processors run at the following frequencies:

CPU A: 1 GHz

CPU B: 3 GHz

CPU C: 250 MHz

CPU A can execute 2 instructions per cycle, CPU B can execute 1 instruction per cycle, and CPU C can execute 8 instructions (though the same instruction) per cycle. Assume all operations can complete execution in a single cycle of latency without any hazards.

All three CPUs have the ability to perform integer arithmetic, though CPU B cannot perform floating point arithmetic. CPU A and B have an instruction set similar to a MIPS processor. CPU C can only perform floating point add and subtract operations, as well as memory loads and stores. Assume all CPUs have access to shared memory and that synchronization has zero cost.

The task at hand is to compare two matrices X and Y that each contain 1024 × 1024 floating point elements. The output should be a count of the number of indices where the value in X was larger than or equal to the value in Y.

6.19.1 [10] Describe how you would partition the problem on the 3 different CPUs to obtain the best performance.

6.19.2 [10] What kind of instruction would you add to the vector CPU C to obtain better performance?

Short Answer

6.19.1

The comparison that has to be partitioned across the CPUs is:

if (x[i][j] >= y[i][j])

count = count + 1;

6.19.2

Vector CPU C has no comparison instructions, so CPU A has to perform the two parallel conditional jumps, which depend on the floating-point registers. Adding a vector compare instruction to CPU C would let the comparison itself run on the vector unit.

Step by step solution

01

Define the concept.

6.19.1

The line “if (x[i][j] >= y[i][j])” checks whether x[i][j] is greater than or equal to y[i][j].

If the condition is true, the line “count = count + 1” is executed, which increments the count by one.

6.19.2

The matrices “x” and “y” each contain 1024 × 1024 floating-point elements.

Vector CPU C is used to issue the two required vector loads.
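For a sense of scale, the figures in the problem statement can be turned into element counts and peak rates. This is only back-of-the-envelope arithmetic from the stated clock rates and issue widths; the number of cores in CPU A is not given, so its figure uses the stated 2 instructions per cycle as-is.

```latex
\begin{align*}
\text{elements per matrix} &= 1024 \times 1024 = 1{,}048{,}576 \\
\text{8-element vector groups} &= 1{,}048{,}576 / 8 = 131{,}072 \\
\text{CPU A peak} &= 2 \times 1\,\text{GHz} = 2 \times 10^{9}\ \text{instructions/s} \\
\text{CPU B peak} &= 1 \times 3\,\text{GHz} = 3 \times 10^{9}\ \text{instructions/s} \\
\text{CPU C peak} &= 8 \times 250\,\text{MHz} = 2 \times 10^{9}\ \text{element operations/s}
\end{align*}
```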

02

Determine the calculation.

6.19.1

The comparison that has to be partitioned across the CPUs is:

if (x[i][j] >= y[i][j]) // check whether the element of x is at least as large as the element of y

count = count + 1; // if the condition holds, increment the count
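For reference, the comparison can be wrapped in a complete loop nest. The sketch below is a single-threaded baseline in C that counts the positions where X is greater than or equal to Y, as the question requires. The function name count_ge and the flat row-major array layout are assumptions made for illustration; they are not part of the textbook solution.

```c
#include <stddef.h>

/* Count the positions where x >= y.
 * Both matrices are assumed to be stored row-major in flat arrays
 * of rows * cols floats (an illustrative layout, not from the source). */
size_t count_ge(const float *x, const float *y, size_t rows, size_t cols)
{
    size_t count = 0;
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            if (x[i * cols + j] >= y[i * cols + j]) {
                count = count + 1;
            }
        }
    }
    return count;
}
```

For the exercise, the function would be called with rows = cols = 1024. Every iteration is independent of the others, which is what makes the work easy to split across processors.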

6.19.2

Eight elements of matrix X are loaded in parallel into one vector register, and eight elements of matrix Y are loaded in parallel into another.

A vector subtraction is then performed.

Next, two vector stores are issued to write the results to memory.

Because vector CPU C has no comparison instructions, CPU A has to perform the two parallel conditional jumps, which depend on the floating-point registers.

Two counters are incremented according to the outcome of those comparisons.

Finally, the two counts are added to give the total for the two matrices. CPU B is not needed.
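The division of labour described in this step can be emulated in ordinary C. The sketch below is not vector or multi-core code; it only mirrors the two roles, with a subtract stage that works in groups of eight (CPU C's part) and a counting stage that keeps two independent counters and adds them at the end (CPU A's part). The names VLEN, vector_subtract and count_nonnegative are hypothetical. It also assumes finite inputs, for which x - y >= 0 holds exactly when x >= y; with infinities or NaN the two tests can differ.

```c
#include <stddef.h>

#define VLEN 8 /* elements per vector operation on CPU C, per the problem statement */

/* Stage 1 (CPU C's role): diff = x - y, VLEN elements at a time.
 * n is assumed to be a multiple of VLEN, as 1024 * 1024 is. */
void vector_subtract(const float *x, const float *y, float *diff, size_t n)
{
    for (size_t base = 0; base < n; base += VLEN) {
        /* On CPU C this inner loop would be two vector loads,
         * one vector subtract, and the store of the result. */
        for (size_t lane = 0; lane < VLEN; lane++) {
            diff[base + lane] = x[base + lane] - y[base + lane];
        }
    }
}

/* Stage 2 (CPU A's role): two independent conditional tests per
 * iteration, each feeding its own counter; the counters are summed
 * at the end. n is assumed to be even. */
size_t count_nonnegative(const float *diff, size_t n)
{
    size_t half = n / 2;
    size_t count0 = 0;
    size_t count1 = 0;
    for (size_t i = 0; i < half; i++) {
        if (diff[i] >= 0.0f) {
            count0++;
        }
        if (diff[half + i] >= 0.0f) {
            count1++;
        }
    }
    return count0 + count1;
}
```

Keeping two separate counters avoids any dependence between the two tests within an iteration, which matches the idea of CPU A issuing two conditional jumps per cycle.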

Most popular questions from this chapter

Question Title: Q2E

Question: You are trying to bake 3 blueberry pound cakes. Cake ingredients are as follows:

1 cup butter, softened

1 cup sugar

4 large eggs

1 teaspoon vanilla extract

1/2 teaspoon salt

1/4 teaspoon nutmeg

1 1/2 cups flour

1 cup blueberries

The recipe for a single cake is as follows:

Step 1: Preheat oven to 325°F (160°C). Grease and flour your cake pan.

Step 2: In a large bowl, beat together butter and sugar with a mixer at medium speed until light and fluffy. Add eggs, vanilla, salt, and nutmeg. Beat until thoroughly blended. Reduce mixer speed to low and add flour, 1/2 cup at a time, beating just until blended.

Step 3: Gently fold in blueberries. Spread evenly in a prepared baking pan. Bake for 60 minutes.

6.2.1 Your job is to cook 3 cakes as efficiently as possible. Assuming that you only have one oven large enough to hold one cake, one large bowl, one cake pan, and one mixer, come up with a schedule to make three cakes as quickly as possible. Identify the bottlenecks in completing this task.

6.2.2 Assume now that you have three bowls, 3 cake pans, and 3 mixers. How much faster is the process now that you have additional resources?

6.2.3 Assume now that you have two friends that will help you cook, and that you have a large oven that can accommodate all three cakes. How will this change the schedule you arrived at in Exercise 6.2.1 above?

6.2.4 Compare the cake-making task to computing 3 iterations of a loop on a parallel computer. Identify data-level parallelism and task-level parallelism in the cake-making loop.

A.7 [5] Using SPIM, write and test a program that reads in three integers and prints out the sum of the largest two of the three. Use the SPIM system calls described on pages A-43 and A-45. You can break ties arbitrarily.

Rewrite the code for fact to use fewer instructions.

A.8 [5] Using SPIM, write and test a program that reads in a positive integer using the SPIM system calls. If the integer is not positive, the program should terminate with the message “Invalid Entry”; otherwise the program should print out the names of the digits of the integers, delimited by exactly one space. For example, if the user entered “728,” the output would be “Seven Two Eight.”

Consider the following three CPU organizations:

CPU SS: A 2-core superscalar microprocessor that provides out-of-order issue capabilities on 2 function units (FUs). Only a single thread can run on each core at a time.

CPU MT: A fine-grained multithreaded processor that allows instructions from 2 threads to be run concurrently (i.e., there are two functional units), though only instructions from a single thread can be issued on any cycle.

CPU SMT: An SMT processor that allows instructions from 2 threads to be run concurrently (i.e., there are two functional units), and instructions from either or both threads can be issued to run on any cycle.

Assume we have two threads X and Y to run on these CPUs that include the following operations:

Thread X:

A1: takes 3 cycles to execute

A2: no dependences

A3: conflicts for a functional unit with A1

A4: depends on the result of A3

Thread Y:

B1: takes 2 cycles to execute

B2: conflicts for a functional unit with B1

B3: depends on the result of B2

B4: no dependences and takes 2 cycles to execute

Assume all instructions take a single cycle to execute unless noted otherwise or they encounter a hazard.

6.9.1 [10] <§6.4> Assume that you have 1 SS CPU. How many cycles will it take to execute these two threads? How many issue slots are wasted due to hazards?

6.9.2 [10] <§6.4> Now assume you have 2 SS CPUs. How many cycles will it take to execute these two threads? How many issue slots are wasted due to hazards?

6.9.3 [10] <§6.4> Assume that you have 1 MT CPU. How many cycles will it take to execute these two threads? How many issue slots are wasted due to hazards?
