For a direct-mapped cache design with a 32-bit address, the following bits of the address are used to access the cache.

| Field | Tag | Index | Offset |
|---|---|---|---|
| Address bits | 31-10 | 9-5 | 4-0 |

5.3.1 What is the cache block size (in words)?

5.3.2 How many entries does the cache have?

5.3.3 What is the ratio between total bits required for such a cache implementation over the data storage bits?

Starting from power on, the following byte-addressed cache references are recorded.

Address: 0, 4, 16, 132, 232, 160, 1024, 30, 140, 3100, 180, 2180

5.3.4 How many blocks are replaced?

5.3.5 What is the hit ratio?

5.3.6 List the final state of the cache, with each valid entry represented as a record of <index, tag, data>

Short Answer

5.3.1

8 words (32 bytes)

5.3.2

32 entries

5.3.3

Ratio = 279/256 ≈ 1.09

5.3.4

Number of blocks replaced = 4

5.3.5

Hit ratio = 4/12 = 1/3 ≈ 0.33

5.3.6

The final state of the cache, with index and tag in binary:

<00000, 0011, mem[3072]>, <00100, 0010, mem[2176]>, <00101, 0000, mem[160]>, <00111, 0000, mem[224]>

Step by step solution

01

Determine the formulae.

The formula for calculating the block size:

$\text{Cache block size} = 2^{\text{offset bits}}\ \text{bytes}$ ……(1)

The formula for calculating the number of entries in the cache:

$\text{Entries} = 2^{\text{index bits}}$ ……(2)

The formula for finding the hit ratio:

$\text{Hit ratio} = \dfrac{\text{number of hits}}{\text{total number of references}}$ ……(3)

02

Determine the cache block size in words

5.3.1

$\text{Cache block size} = 2^{\text{offset bits}} = 2^{5} = 32\ \text{bytes}$

The cache is byte-addressed with 4-byte words, so each block holds 32 / 4 = 8 words.

So, the cache block size is 8 words.

03

Determine the entries in the cache

5.3.2

$\text{Entries} = 2^{\text{index bits}} = 2^{5} = 32\ \text{entries}$
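The two calculations above can be checked with a short Python sketch. The field boundaries come from the problem statement; the 4-byte word size and the helper name `width` are assumptions made for illustration.

```python
# Field boundaries from the problem statement, as inclusive (high, low) bit positions.
TAG_BITS = (31, 10)
INDEX_BITS = (9, 5)
OFFSET_BITS = (4, 0)
WORD_BYTES = 4  # assumption: 32-bit (4-byte) words


def width(field):
    """Number of bits in an inclusive (high, low) bit range."""
    high, low = field
    return high - low + 1


block_bytes = 2 ** width(OFFSET_BITS)   # bytes per cache block
block_words = block_bytes // WORD_BYTES  # words per cache block
entries = 2 ** width(INDEX_BITS)         # number of cache lines

print(block_bytes, block_words, entries)  # 32 8 32
```

The same helper gives a tag width of 22 bits for address bits 31-10.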

04

Determine the ratio of the total bits required for such a cache implementation to the total bits required for data storage

5.3.3

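Each of the 32 entries stores 8 × 32 = 256 data bits, a 22-bit tag (address bits 31-10), and 1 valid bit.

$\text{Ratio} = \dfrac{\text{total bits for cache}}{\text{data storage bits}} = \dfrac{32 \times (256 + 22 + 1)}{32 \times 256} = \dfrac{279}{256} \approx 1.09$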
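As a rough check of the storage overhead, the following Python sketch counts bits per entry for the field layout in the problem statement. It assumes one valid bit per entry and no other status bits (such as a dirty bit); the constant names are illustrative.

```python
from fractions import Fraction

ENTRIES = 32                   # 2**5 index values
DATA_BITS_PER_ENTRY = 8 * 32   # 8 words of 32 bits each
TAG_BITS = 22                  # address bits 31-10
VALID_BITS = 1                 # assumption: one valid bit, no dirty bit

total_bits = ENTRIES * (DATA_BITS_PER_ENTRY + TAG_BITS + VALID_BITS)
data_bits = ENTRIES * DATA_BITS_PER_ENTRY

ratio = Fraction(total_bits, data_bits)
print(ratio, float(ratio))  # 279/256 1.08984375
```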

05

Determine the number of blocks replaced

5.3.4

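$\text{Block address} = \lfloor \text{byte address} / 32 \rfloor$ (32-byte blocks)

$\text{Line ID (index)} = \text{block address} \bmod 32$ (32 entries)

$\text{Tag} = \lfloor \text{block address} / 32 \rfloor$

A reference hits when the indexed entry is valid and holds the same tag. A miss replaces a block only when the indexed entry already holds a valid block with a different tag.

| Address | Line ID | Tag | Hit/Miss | Replace |
|---|---|---|---|---|
| 0 | 0 | 0 | M | N |
| 4 | 0 | 0 | H | N |
| 16 | 0 | 0 | H | N |
| 132 | 4 | 0 | M | N |
| 232 | 7 | 0 | M | N |
| 160 | 5 | 0 | M | N |
| 1024 | 0 | 1 | M | Y |
| 30 | 0 | 0 | M | Y |
| 140 | 4 | 0 | H | N |
| 3100 | 0 | 3 | M | Y |
| 180 | 5 | 0 | H | N |
| 2180 | 4 | 2 | M | Y |

Number of blocks replaced = 4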
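The reference trace can be replayed with a small direct-mapped cache simulation. This is a minimal Python sketch under the field layout given in the problem statement (5 offset bits, 5 index bits) and starting from an empty cache; the function name `simulate` and the tuple layout are illustrative choices, not part of the exercise.

```python
OFFSET_BITS = 5  # address bits 4-0
INDEX_BITS = 5   # address bits 9-5
ADDRESSES = [0, 4, 16, 132, 232, 160, 1024, 30, 140, 3100, 180, 2180]


def simulate(addresses):
    """Return one (address, index, tag, hit, replaced) tuple per reference."""
    cache = {}  # index -> tag of the block currently stored in that line
    trace = []
    for addr in addresses:
        block = addr >> OFFSET_BITS
        index = block & ((1 << INDEX_BITS) - 1)
        tag = block >> INDEX_BITS
        hit = cache.get(index) == tag
        # A replacement happens only when a miss evicts a valid block.
        replaced = (not hit) and index in cache
        cache[index] = tag
        trace.append((addr, index, tag, hit, replaced))
    return trace


trace = simulate(ADDRESSES)
hits = sum(row[3] for row in trace)
replacements = sum(row[4] for row in trace)

for addr, index, tag, hit, replaced in trace:
    print(f"{addr:>5}  index={index:<2} tag={tag}  "
          f"{'H' if hit else 'M'}  {'Y' if replaced else 'N'}")
print(hits, replacements)  # 4 4
```

Changing `OFFSET_BITS` and `INDEX_BITS` replays the same trace for other direct-mapped geometries.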

06

Determine the hit ratio.

5.3.5

Total number of references = 12

Number of hits = 4

$\text{Hit ratio} = \dfrac{\text{number of hits}}{\text{total number of references}} = \dfrac{4}{12} = \dfrac{1}{3}$

So, the hit ratio is 1/3 ≈ 0.33.

07

Determine the final state of cache

Final state: <index, tag, data>

In the final state, the index and tag are represented in binary (only the low four bits of the 22-bit tag are shown), and mem[a] denotes the 32-byte block that starts at byte address a.

<00000, 0011, mem[3072]>, <00100, 0010, mem[2176]>, <00101, 0000, mem[160]>, <00111, 0000, mem[224]>
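The final contents can also be derived programmatically. In a direct-mapped cache every reference, hit or miss, leaves its own block in the indexed line, so the last reference to each index determines what that line holds. The Python sketch below relies on that observation; the function name `final_state` and the 4-bit tag formatting are illustrative assumptions.

```python
OFFSET_BITS = 5  # address bits 4-0
INDEX_BITS = 5   # address bits 9-5
ADDRESSES = [0, 4, 16, 132, 232, 160, 1024, 30, 140, 3100, 180, 2180]


def final_state(addresses):
    """Return <index, tag, data> records for every valid line, sorted by index."""
    cache = {}  # index -> (tag, byte address where the cached block starts)
    for addr in addresses:
        block = addr >> OFFSET_BITS
        index = block % (1 << INDEX_BITS)
        tag = block >> INDEX_BITS
        cache[index] = (tag, block << OFFSET_BITS)
    return [
        f"<{index:05b}, {tag:04b}, mem[{start}]>"
        for index, (tag, start) in sorted(cache.items())
    ]


records = final_state(ADDRESSES)
for record in records:
    print(record)
```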

Most popular questions from this chapter

In this exercise, we will look at the different ways capacity affects overall performance. In general, cache access time is proportional to capacity. Assume that main memory accesses take 70 ns and that memory accesses are 36% of all instructions. The following table shows data for L1 caches attached to each of two processors, P1 and P2.

| Processor | L1 Size | L1 Miss Rate | L1 Hit Time |
|---|---|---|---|
| P1 | 2 KiB | 8.0% | 0.66 ns |
| P2 | 4 KiB | 6.0% | 0.90 ns |

(5.6.1) Assuming that the L1 hit time determines the cycle times for P1 and P2, what are their respective clock rates?

(5.6.2) What is the Average Memory Access Time for P1 and P2?

(5.6.3) Assuming a base CPI of 1.0 without any memory stalls, what is the total CPI for P1 and P2? Which processor is faster?

For the next three problems, we will consider the addition of an L2 cache to P1 to presumably make up for its limited L1 cache capacity. Use the L1 cache capacities and hit times from the previous table when solving these problems. The L2 miss rate indicated is its local miss rate.

| L2 Size | L2 Miss Rate | L2 Hit Time |
|---|---|---|
| 1 MiB | 95% | 5.62 ns |

(5.6.4) What is the AMAT for P1 with the addition of an L2 cache? Is the AMAT better or worse with the L2 cache?

(5.6.5) Assuming a base CPI of 1.0 without any memory stalls, what is the total CPI for P1 with the addition of an L2 cache?

(5.6.6) Which processor is faster, now that P1 has an L2 cache? If P1 is faster, what miss rate would P2 need in its L1 cache to match P1’s performance? If P2 is faster, what miss rate would P1 need in its L1 cache to match P2’s performance?

Cache coherence concerns the views of multiple processors on a given cache block. The following data shows two processors and their read/write operations on two different words of a cache block X (initially X[0] = X[1] = 0). Assume the size of integers is 32 bits.

| P1 | P2 |
|---|---|
| X[0]++; X[1] = 3; | X[0] = 5; X[1] += 2; |

5.17.1 List the possible values of the given cache block for a correct cache coherence protocol implementation. List at least one more possible value of the block if the protocol doesn’t ensure cache coherency.

5.17.2 For a snooping protocol, list a valid operation sequence on each processor/cache to finish the above read/write operations.

5.17.3 What are the best-case and worst-case numbers of cache misses needed to execute the listed read/write instructions?

Memory consistency concerns the views of multiple data items. The following data shows two processors and their read/write operations on different cache blocks (A and B initially 0).

| P1 | P2 |
|---|---|
| A = 1; B = 2; A += 2; B++; | C = B; D = A; |

5.17.4 List the possible values of C and D for an implementation that ensures both consistency assumptions on page 470.

5.17.5 List at least one more possible pair of values for C and D if such assumptions are not maintained.

5.17.6 For various combinations of write policies and write allocation policies, which combinations make the protocol implementation simpler?

Media applications that play audio or video files are part of a class of workloads called “streaming” workloads; i.e., they bring in large amounts of data but do not reuse much of it. Consider a video streaming workload that accesses a 512 KiB working set sequentially with the following address stream:

0, 2, 4, 6, 8, 10, 12, 14, 16, …

5.5.1 Assume a 64 KiB direct-mapped cache with a 32-byte block. What is the miss rate for the address stream above? How is this miss rate sensitive to the size of the cache or the working set? How would you categorize the misses this workload is experiencing, based on the 3C model?

5.5.2 Re-compute the miss rate when the cache block size is 16 bytes, 64 bytes, and 128 bytes. What kind of locality is this workload exploiting?

5.5.3 “Prefetching” is a technique that leverages predictable address patterns to speculatively bring in additional cache blocks when a particular cache block is accessed. One example of prefetching is a stream buffer that prefetches sequentially adjacent cache blocks into a separate buffer when a particular cache block is brought in. If the data is found in the prefetch buffer, it is considered as a hit and moved into the cache and the next cache block is prefetched. Assume a two-entry stream buffer and assume that the cache latency is such that a cache block can be loaded before the computation on the previous cache block is completed. What is the miss rate for the address stream above?

Cache block size (B) can affect both miss rate and miss latency. Assuming a 1-CPI machine with an average of 1.35 references (both instruction and data) per instruction, help find the optimal block size given the following miss rates for various block sizes.

| Block size (B) | 8 | 16 | 32 | 64 | 128 |
|---|---|---|---|---|---|
| Miss rate | 4% | 3% | 2% | 1.5% | 1% |

5.5.4 What is the optimal block size for a miss latency of 20×B cycles?

5.5.5 What is the optimal block size for a miss latency of 24+B cycles?

5.5.6 For constant miss latency, what is the optimal block size?

Question: As described in Section 5.7, virtual memory uses a page table to track the mapping of virtual addresses to physical addresses. This exercise shows how this table must be updated as addresses are accessed. The following data constitutes a stream of virtual addresses as seen on a system. Assume 4 KiB pages, a 4-entry fully associative TLB, and true LRU replacement. If pages must be brought in from disk, increment the next largest page number.

4669, 2227, 13916, 34587, 48870, 12608, 49225

TLB

| Valid | Tag | Physical Page Number |
|---|---|---|
| 1 | 11 | 12 |
| 1 | 7 | 4 |
| 1 | 3 | 6 |
| 0 | 4 | 9 |

Page table

| Valid | Physical Page or in Disk |
|---|---|
| 1 | 5 |
| 0 | Disk |
| 0 | Disk |
| 1 | 6 |
| 1 | 9 |
| 1 | 11 |
| 0 | Disk |
| 1 | 4 |
| 0 | Disk |
| 0 | Disk |
| 1 | 3 |
| 1 | 12 |

(5.11.1) Given the address stream shown, and the initial TLB and page table states provided above, show the final state of the system. Also list for each reference if it is a hit in the TLB, a hit in the page table, or a page fault.

(5.11.2) Repeat 5.11.1, but this time use 16 KiB pages instead of 4 KiB pages. What would be some of the advantages of having a larger page size? What are some of the disadvantages?

(5.11.3) Show the final contents of the TLB if it is 2-way set associative. Also show the contents of the TLB if it is direct mapped. Discuss the importance of having a TLB to high performance. How would virtual memory accesses be handled if there were no TLB?

There are several parameters that impact the overall size of the page table. Listed below are key page parameters.

| Virtual Address Size | Page Size | Page Table Entry Size |
|---|---|---|
| 32 bits | 8 KiB | 4 bytes |

(5.11.4) Given the parameters shown above, calculate the total page table size for a system running 5 applications that utilize half of the memory available.

(5.11.5) Given the parameters shown above, calculate the total page table size for a system running 5 applications that utilize half of the memory available, given a two level page table approach with 256 entries. Assume each entry of the main page table is 6 bytes. Calculate the minimum amount of memory required.

(5.11.6) A cache designer wants to increase the size of a 4 KiB virtually indexed, physically tagged cache. Given the page size shown above, is it possible to make a 16 KiB direct-mapped cache, assuming 2 words per block? How would the designer increase the data size of the cache?

Question: In this exercise, we will examine space/time optimizations for page tables. The following list provides parameters of a virtual memory system.

| Virtual Address (bits) | Physical DRAM Installed | Page Size | PTE Size (bytes) |
|---|---|---|---|
| 43 | 16 GiB | 4 KiB | 4 |

(5.12.1) For a single-level page table, how many page table entries (PTEs) are needed? How much physical memory is needed for storing the page table?

(5.12.2) Using a multilevel page table can reduce the physical memory consumption of page tables, by keeping active PTEs in physical memory. How many levels of page tables will be needed in this case? And how many memory references are needed for address translation if missing in TLB?

(5.12.3) An inverted page table can be used to further optimize space and time. How many PTEs are needed to store the page table? Assuming a hash table implementation, what are the common case and worst case numbers of memory references needed for servicing a TLB miss?

The following table shows the contents of a 4-entry TLB.

| Entry-ID | Valid | VA Page | Modified | Protection | PA Page |
|---|---|---|---|---|---|
| 1 | 1 | 140 | 1 | RW | 30 |
| 2 | 0 | 40 | 0 | RX | 34 |
| 3 | 1 | 200 | 1 | RO | 32 |
| 4 | 1 | 280 | 0 | RW | 31 |

(5.12.4) Under what scenarios would entry 2’s valid bit be set to zero?

(5.12.5) What happens when an instruction writes to VA page 30? When would software managed TLB be faster than hardware managed TLB?

(5.12.6) What happens when an instruction writes to VA page 200?
