Chapter 6: Appendix B, 36 (page 500)

Figure B.8.8 on page B-55 illustrates the implementation of the register file for the MIPS datapath. Pretend that a new register file is to be built, but that there are only two registers and only one read port, and that each register has only 2 bits of data. Redraw Figure B.8.8 so that every wire in your diagram corresponds to only 1 bit of data (unlike the diagram in Figure B.8.8, in which some wires are 5 bits and some wires are 32 bits). Redraw the registers using D flip-flops. You do not need to show how to implement a D flip-flop or a multiplexor.

Short Answer

Figure B.8.8, redrawn for a 2-bit register file using D flip-flops, is as follows:

Step by step solution

01

Determine register file

The register file for the MIPS datapath has two read ports and one write port, with five inputs and two outputs. A register file is a set of registers that can be read and written, and the array of registers can be built using D flip-flops. Reading a register causes no state change. Writing a register requires three inputs: the register number, the data to write, and the clock signal.

02

Redraw Figure B.8.8 using D flip-flops

The new register file has two registers and only one read port, and each register holds only 2 bits of data.

Refer to Figure B.8.8 for the implementation of the original register file.

In the redrawn diagram, each register is built from D flip-flops and every wire carries a single bit of data.
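The structure the problem asks for can also be checked with a small behavioral model. The following is a minimal sketch, not the textbook's figure: it assumes the usual arrangement of a 1-to-2 decoder on the write register number gated by the Write signal, one 2-to-1 multiplexor per data bit on the read port, and four D flip-flops in total. The names `DFlipFlop`, `mux2`, and `TinyRegisterFile` are hypothetical, the flip-flops are assumed to start at 0, and a clock edge is modeled as a method call.

```python
class DFlipFlop:
    """One bit of state. On a clock edge, Q takes the value of D if enabled."""

    def __init__(self):
        self.q = 0  # assumed initial state

    def clock_edge(self, d, enable):
        if enable:
            self.q = d


def mux2(select, in0, in1):
    """2-to-1 multiplexor on single-bit wires."""
    return in1 if select else in0


class TinyRegisterFile:
    """Two 2-bit registers, one read port, one write port."""

    def __init__(self):
        # ff[r][b] holds bit b of register r (four flip-flops in total).
        self.ff = [[DFlipFlop() for _ in range(2)] for _ in range(2)]

    def read(self, read_reg):
        # One mux per data bit; both share the 1-bit read register number.
        bit0 = mux2(read_reg, self.ff[0][0].q, self.ff[1][0].q)
        bit1 = mux2(read_reg, self.ff[0][1].q, self.ff[1][1].q)
        return bit1, bit0

    def clock_edge(self, write, write_reg, data1, data0):
        # 1-to-2 decoder on the write register number, gated by Write.
        enable = [write & (1 - write_reg), write & write_reg]
        for r in range(2):
            self.ff[r][0].clock_edge(data0, enable[r])
            self.ff[r][1].clock_edge(data1, enable[r])


rf = TinyRegisterFile()
rf.clock_edge(write=1, write_reg=1, data1=1, data0=0)  # register 1 <- 10
print(rf.read(1))  # (1, 0)
print(rf.read(0))  # (0, 0)
```

Every argument and return element here is a single bit, which mirrors the requirement that each wire in the redrawn diagram carry only 1 bit: the register numbers shrink from 5 bits to 1, and the data paths shrink from 32 wires to 2.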

Most popular questions from this chapter

AMD has recently announced that they will be integrating a graphics processing unit with their x86 cores in a single package, though with different clocks for each of the cores. This is an example of a heterogeneous multiprocessor system which we expect to see produced commercially in the near future. One of the key design points will be to allow for fast data communication between the CPU and the GPU. Presently communications must be performed between discrete CPU and GPU chips, but this is changing in AMD's Fusion architecture. Presently the plan is to use multiple (at least 16) PCI Express channels to facilitate intercommunication. Intel is also jumping into this arena with their Larrabee chip. Intel is considering using their QuickPath interconnect technology.

6.15.1 Compare the bandwidth and latency associated with these two interconnect technologies.

B.28 [10] <§B.6> Now calculate the relative performance of adders. Assume that hardware corresponding to any equation containing only OR or AND terms, such as the equations for pi and gi on page B-40, takes one time unit T. Equations that consist of the OR of several AND terms, such as the equations for c1, c2, c3, and c4 on page B-40, would thus take two time units, 2T. The reason is it would take T to produce the AND terms and then an additional T to produce the result of the OR. Calculate the numbers and performance ratio for 4-bit adders for both ripple carry and carry lookahead. If the terms in equations are further defined by other equations, then add the appropriate delays for those intermediate equations, and continue recursively until the actual input bits of the adder are used in an equation. Include a drawing of each adder labeled with the calculated delays and the path of the worst-case delay highlighted.

Consider the following three CPU organizations:

CPU SS: A 2-core superscalar microprocessor that provides out-of-order issue capabilities on 2 function units (FUs). Only a single thread can run on each core at a time.

CPU MT: A fine-grained multithreaded processor that allows instructions from 2 threads to be run concurrently (i.e., there are two functional units), though only instructions from a single thread can be issued on any cycle.

CPU SMT: An SMT processor that allows instructions from 2 threads to be run concurrently (i.e., there are two functional units), and instructions from either or both threads can be issued to run on any cycle.

Assume we have two threads X and Y to run on these CPUs that include the following operations:

Thread X                                        Thread Y
A1 – takes 3 cycles to execute                  B1 – takes 2 cycles to execute
A2 – no dependences                             B2 – conflicts for a functional unit with B1
A3 – conflicts for a functional unit with A1    B3 – depends on the result of B2
A4 – depends on the result of A3                B4 – no dependences and takes 2 cycles to execute

Assume all instructions take a single cycle to execute unless noted otherwise or they encounter a hazard.

6.9.1 [10] <§6.4> Assume that you have 1 SS CPU. How many cycles will it take to execute these two threads? How many issue slots are wasted due to hazards?

6.9.2 [10] <§6.4> Now assume you have 2 SS CPUs. How many cycles will it take to execute these two threads? How many issue slots are wasted due to hazards?

6.9.3 [10] <§6.4> Assume that you have 1 MT CPU. How many cycles will it take to execute these two threads? How many issue slots are wasted due to hazards?

Consider the following recursive mergesort algorithm (another classic divide and conquer algorithm). Mergesort was first described by John von Neumann in 1945. The basic idea is to divide an unsorted list x of m elements into two sublists of about half the size of the original list. Repeat this operation on each sublist, and continue until we have lists of length 1. Then starting with sublists of length 1, “merge” the two sublists into a single sorted list.

Mergesort(m)
    var list left, right, result
    if length(m) <= 1
        return m
    else
        var middle = length(m) / 2
        for each x in m up to middle
            add x to left
        for each x in m after middle
            add x to right
        left = Mergesort(left)
        right = Mergesort(right)
        result = Merge(left, right)
        return result

The merge step is carried out by the following pseudocode:

Merge(left, right)
    var list result
    while length(left) > 0 and length(right) > 0
        if first(left) <= first(right)
            append first(left) to result
            left = rest(left)
        else
            append first(right) to result
            right = rest(right)
    if length(left) > 0
        append left to result
    if length(right) > 0
        append right to result
    return result
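For readers who want to run the algorithm before reasoning about its parallel speedup, the pseudocode above can be sketched in Python as follows. This is one possible sequential translation, not the textbook's code: the function names `mergesort` and `merge` are chosen here, index counters stand in for `first` and `rest`, and once one sublist is exhausted the sketch appends whatever remains of the other.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    result = []
    i = j = 0
    # Take the smaller head element until one list is exhausted.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # At most one of these slices is non-empty.
    result.extend(left[i:])
    result.extend(right[j:])
    return result


def mergesort(m):
    """Recursively sort a list by splitting it in half and merging."""
    if len(m) <= 1:
        return list(m)
    middle = len(m) // 2
    left = mergesort(m[:middle])
    right = mergesort(m[middle:])
    return merge(left, right)


print(mergesort([5, 2, 4, 7, 1, 3, 2, 6]))  # [1, 2, 2, 3, 4, 5, 6, 7]
```

The two recursive calls operate on disjoint halves, which is the part of the work that could be handed to separate cores; the final `merge` at each level is sequential in this sketch.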

6.5.1 [10] Assume that you have Y cores on a multi-core processor to run MergeSort. Assuming that Y is much smaller than length(m), express the speedup factor you might expect to obtain for values of Y and length(m). Plot these on a graph.

6.5.2 [10] Next, assume that Y is equal to length(m). How would this affect your conclusions in your previous answer? If you were tasked with obtaining the best speedup factor possible (i.e., strong scaling), explain how you might change this code to obtain it.

Assume we want to execute the DAXPY loop shown on page 511 in MIPS assembly on the NVIDIA 8800 GTX GPU described in this chapter. In this problem, we will assume that all math operations are performed on single-precision floating point numbers (we will rename the loop SAXPY). Assume that instructions take the following number of cycles to execute.

Loads    Stores    Add.S    Mult.S
5        2         3        4

6.13.1 Describe how you will construct warps for the SAXPY loop to exploit the 8 cores provided in a single multiprocessor.
