
Chapter 6: Appendix B, 32 (page 500)

Perhaps the most likely case of adding many numbers at once in a computer would be when trying to multiply more quickly by using many adders to add many numbers in a single clock cycle. Compared to the multiply algorithm in Chapter 3, a carry save scheme with many adders could multiply more than 10 times faster. This exercise estimates the cost and speed of a combinational multiplier to multiply two positive 16-bit numbers. Assume that you have 16 intermediate terms M15, M14, …, M0, called partial products, that contain the multiplicand ANDed with multiplier bits m15, m14, …, m0. The idea is to use carry save adders to reduce the n operands into 2n/3 in parallel groups of three, and do this repeatedly until you get two large numbers to add together with a traditional adder.

First, show the block organization of the 16-bit carry save adders to add these 16 terms, as shown on the right in Figure B.14.1. Then calculate the delays to add these 16 numbers. Compare this time to the iterative multiplication scheme in Chapter 3, but assume only 16 iterations using a 16-bit adder with full carry lookahead, whose speed was calculated in Exercise B.29.

Short Answer


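The 16 terms are added by six levels of 16-bit carry save adders, followed by a carry-lookahead adder at level 7.

Time consumed by the carry save organization = 36T

Time consumed by the iterative scheme = 296T

Speedup = 8.22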

Step by step solution

01

Determine the faster addition

Addition can be made faster by determining the carries into the higher-order bits sooner. The faster the carries are found, the faster the addition completes. In principle, "infinite" hardware could be used to compute the carries quickly, but this increases the cost of the hardware. A carry-lookahead adder finds the carries quickly and so speeds up the addition.

02

Determine the block organization and speed up.

Given:

A combinational multiplier is used to multiply two positive 16-bit numbers.

Assume that there are 16 intermediate terms M15 to M0, called partial products, that contain the multiplicand ANDed with multiplier bits m15 to m0.

The idea is to use carry save adders to reduce the n operands to 2n/3 in parallel groups of three.

Repeating this reduction leaves two large numbers, which are added together with a traditional adder.
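The reduction described above can be sketched in Python. This is only an illustrative model, not the textbook's circuit: the function names are hypothetical, whole integers stand in for the adder hardware, each partial product is assumed to be shifted to its bit position, and operands are grouped greedily in threes, which may differ from the grouping in the textbook figure.

```python
def carry_save_add(a, b, c):
    """3:2 reduction on whole integers: a + b + c == s + carry."""
    s = a ^ b ^ c                                  # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1     # carries moved one position left
    return s, carry


def partial_products(multiplicand, multiplier, bits=16):
    """M_i = multiplicand ANDed with multiplier bit m_i (assumed shifted by i)."""
    return [(multiplicand << i) if (multiplier >> i) & 1 else 0
            for i in range(bits)]


def csa_multiply(multiplicand, multiplier, bits=16):
    """Reduce the partial products in groups of three until two remain.

    Returns (product, number_of_carry_save_levels).
    """
    terms = partial_products(multiplicand, multiplier, bits)
    levels = 0
    while len(terms) > 2:
        full = len(terms) - len(terms) % 3         # operands that form complete groups
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(*terms[i:i + 3]))
        reduced.extend(terms[full:])               # leftovers pass to the next level
        terms = reduced
        levels += 1
    # The two remaining numbers go to a traditional adder.
    return terms[0] + terms[1], levels
```

With 16 partial products, this greedy grouping needs six carry save levels before the final addition, for example `csa_multiply(1234, 5678)` returns the product together with a level count of 6.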

The block diagram of the 16-bit carry save adders that add the 16 terms is shown below:

From the block organization, the partial products are grouped at each level. In level 1, the groups are processed in parallel by four carry save adders.

Likewise, the sum of the two large numbers is computed in level 7.

In total, 7 levels are involved in computing the sum of the 16 partial products.

Each of levels 1 to 6 takes 2T to perform its carry save addition. In level 7, the carry-lookahead adder takes about 24T.

The total time consumed will be:

2T×6+24T=36T
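The level count and the total delay can be checked with a short Python sketch. The constants come from the figures used in this solution (2T per carry save level, 24T for the final adder); the function names and the rule that each level removes one operand per complete group of three are assumptions made for illustration.

```python
CSA_LEVEL_DELAY_T = 2      # delay of one carry save level, in units of T
FINAL_ADDER_DELAY_T = 24   # delay of the final carry-lookahead adder, in units of T


def csa_levels(operands):
    """Count carry save levels needed to reduce `operands` numbers to two."""
    levels = 0
    while operands > 2:
        operands -= operands // 3   # each group of three becomes two
        levels += 1
    return levels


def csa_tree_delay(operands):
    """Total delay in units of T: carry save levels plus the final adder."""
    return csa_levels(operands) * CSA_LEVEL_DELAY_T + FINAL_ADDER_DELAY_T
```

For 16 operands the count goes 16, 11, 8, 6, 4, 3, 2, which is six carry save levels, so `csa_tree_delay(16)` evaluates to 36.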

If the multiplication is instead carried out iteratively with a traditional carry-lookahead adder, the time consumed at each step would be:

M0 : 16 bits = 8T

M1 : 17 bits = 12T

M2 : 18 bits = 12T

M3 : 19 bits = 12T

M4 : 20 bits = 12T

M5 : 21 bits = 16T

M6 : 22 bits = 16T

M7 : 23 bits = 16T

M8 : 24 bits = 16T

M9 : 25 bits = 20T

M10 : 26 bits = 20T

M11 : 27 bits = 20T

M12 : 28 bits = 20T

M13 : 29 bits = 24T

M14 : 30 bits = 24T

M15 : 31 bits = 24T

M16 : 32 bits = 24T

Above is the calculation for time consumed for each partial product on the traditional look ahead adder.

Total time consumed will be:

Total = 8T + 12T×4 + 16T×4 + 20T×4 + 24T×4 = 8T + 48T + 64T + 80T + 96T = 296T

Speedup is the ratio of the time consumed by the traditional carry-lookahead adder scheme to the time consumed by the carry save adder organization.

Speedup = 296T / 36T ≈ 8.22
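The totals and the speedup can be reproduced with a few lines of Python. This is a minimal check of the arithmetic only: the per-step delays are copied from the sum in this solution, and the variable and function names are hypothetical.

```python
# Per-step delays of the iterative scheme, in units of T, as grouped in the sum:
# one step at 8T, then four steps each at 12T, 16T, 20T and 24T.
ITERATIVE_DELAYS_T = [8] + [12] * 4 + [16] * 4 + [20] * 4 + [24] * 4

# Carry save organization: six levels at 2T plus a 24T final adder.
CSA_TOTAL_T = 2 * 6 + 24


def iterative_total(delays=ITERATIVE_DELAYS_T):
    """Total delay of the iterative scheme in units of T."""
    return sum(delays)


def speedup(slow_time, fast_time):
    """Ratio of the slower scheme's time to the faster scheme's time."""
    return slow_time / fast_time


if __name__ == "__main__":
    total = iterative_total()
    print(total, CSA_TOTAL_T, round(speedup(total, CSA_TOTAL_T), 2))
```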


Most popular questions from this chapter

Question: 6.17 Benchmarking is a field of study that involves identifying representative workloads to run on specific computing platforms in order to be able to objectively compare the performance of one system to another. In this exercise we will compare two classes of benchmarks: the Whetstone CPU benchmark and the PARSEC Benchmark suite. Select one program from PARSEC. All programs should be freely available on the Internet. Consider running multiple copies of Whetstone versus running the PARSEC Benchmark on any of the systems described in Section 6.11.

6.17.1 [60] What is inherently different between these two classes of workload when run on these multi-core systems?

6.17.2 [60] In terms of the Roofline Model, how dependent will the results you obtain when running these benchmarks be on the amount of sharing and synchronization present in the workload used?

First, write down a list of the daily activities that you typically do on a weekday. For instance, you might get out of bed, take a shower, get dressed, eat breakfast, dry your hair, and brush your teeth. Make sure to break down your list so you have a minimum of 10 activities.

6.1.1 Now consider which of these activities is already exploiting some form of parallelism (e.g., brushing multiple teeth at the same time, versus one at a time, carrying one book at a time to school, versus loading them all into your backpack and then carry them “in parallel”). For each of your activities, discuss if they are already working in parallel, but if not, why they are not.

6.1.2 Next, consider which of the activities could be carried out concurrently (e.g., eating breakfast and listening to the news ). For each of your activities, describe which other activity could be paired with this activity.

6.1.3 For 6.1.2, what could we change about current systems (e.g., showers, clothes, TVs, cars) so that we could perform more tasks in parallel?

6.1.4 Estimate how much shorter time it would take to carry out these activities if you tried to carry out as many tasks in parallel as possible.

Implement the four functions described in Exercise B.11 using a PLA

Assume that X consists of 3 bits : x2, x1, x0. Write four logic functions that are true if and only if

  • X contains only one 0
  • X contains an even number of 0s
  • X when interpreted as an unsigned binary number is less than 4
  • X when interpreted as a signed (two’s complement) number is negative.

Assume we want to execute the DAXPY loop shown on page 511 in MIPS assembly on the NVIDIA 8800 GTX GPU described in this chapter. In this problem, we will assume that all math operations are performed on single-precision floating point numbers (we will rename the loop SAXPY). Assume that instructions take the following number of cycles to execute.

Loads   Stores   Add.S   Mult.S
5       2        3       4

6.13.1 Describe how you will construct warps for the SAXPY loop to exploit the 8 cores provided in a single multiprocessor.
