
In this exercise, we will examine how replacement policies impact miss rate. Assume a 2-way set associative cache with 4 blocks. To solve the problems in this exercise, you may find it helpful to draw a table like the one below, as demonstrated for the address sequence “0, 1, 2, 3, 4”.

| Address of Memory Block Accessed | Hit or Miss | Evicted Block | Set 0 (way 0) | Set 0 (way 1) | Set 1 (way 0) | Set 1 (way 1) |
|---|---|---|---|---|---|---|
| 0 | Miss | | Mem[0] | | | |
| 1 | Miss | | Mem[0] | | Mem[1] | |
| 2 | Miss | | Mem[0] | Mem[2] | Mem[1] | |
| 3 | Miss | | Mem[0] | Mem[2] | Mem[1] | Mem[3] |
| 4 | Miss | 0 | Mem[4] | Mem[2] | Mem[1] | Mem[3] |

(The last four columns show the contents of the cache blocks after each reference.)

Consider the following address sequence: 0, 2, 4, 8, 10, 12, 14, 16, 0

(5.13.1) Assuming an LRU replacement policy, how many hits does this address sequence exhibit?

(5.13.2) Assuming an MRU (most recently used) replacement policy, how many hits does this address sequence exhibit?

(5.13.3) Simulate a random replacement policy by flipping a coin. For example, “heads” means to evict the first block in a set, and “tails” means to evict the second block in a set. How many hits does this address sequence exhibit?

(5.13.4) Which address should be evicted at each replacement to maximize the number of hits? How many hits does this address sequence exhibit if you follow this “optimal” policy?

(5.13.5) Describe why it is difficult to implement a cache replacement policy that is optimal for all address sequences.

(5.13.6) Assume you could make a decision upon each memory reference whether or not you want the requested address to be cached. What impact could this have on the miss rate?

Short Answer


(5.13.1)

With LRU replacement, there are 0 hits with the address sequence 0, 2, 4, 8, 10, 12, 14, 16, 0.

(5.13.2)

With MRU replacement, there is 1 hit with the address sequence 0, 2, 4, 8, 10, 12, 14, 16, 0.

(5.13.3)

With a random replacement policy, there are either 0 hits or 1 hit with the address sequence 0, 2, 4, 8, 10, 12, 14, 16, 0.

(5.13.4)

The maximum possible number of hits is 1, since only address 0 is referenced twice. Any policy that evicts a block other than Mem[0] at every replacement is optimal and achieves that 1 hit.

(5.13.5)

It is difficult to implement a cache replacement policy that is optimal for all address sequences because the cache doesn’t know the future address sequences.

(5.13.6)

The miss rate could decrease if blocks with little temporal locality are kept out of the cache, so that they never evict blocks that will be reused. A wrong decision, however, could increase the miss rate instead.

Step by step solution

01

Identify the number of hits using LRU replacement

(5.13.1)

In LRU replacement, the least recently used memory block is replaced. The address sequence is 0, 2, 4, 8, 10, 12, 14, 16, 0.

All of these addresses map to set 0 (every block address is even, so address mod 2 = 0). The table for the given address sequence is given below:

| Address of Memory Block Accessed | Hit or Miss | Evicted Block | Set 0 (way 0) | Set 0 (way 1) |
|---|---|---|---|---|
| 0 | Miss | | Mem[0] | |
| 2 | Miss | | Mem[0] | Mem[2] |
| 4 | Miss | 0 | Mem[4] | Mem[2] |
| 8 | Miss | 2 | Mem[4] | Mem[8] |
| 10 | Miss | 4 | Mem[10] | Mem[8] |
| 12 | Miss | 8 | Mem[10] | Mem[12] |
| 14 | Miss | 10 | Mem[14] | Mem[12] |
| 16 | Miss | 12 | Mem[14] | Mem[16] |
| 0 | Miss | 14 | Mem[0] | Mem[16] |

(Set 1 is never used, so its columns are omitted.)

Thus, there is no hit.
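The LRU walk-through above can be double-checked with a short simulation. This is an illustrative sketch, not part of the original solution; the function name and data layout are my own. Each set is kept as a list ordered from least to most recently used:

```python
# Sketch: 2-way set-associative cache with 4 blocks (2 sets), LRU replacement.
# Set index = block address mod 2, as in the exercise.

def simulate_lru(addresses, num_sets=2, ways=2):
    # Each set is a list; position 0 holds the least recently used block.
    sets = [[] for _ in range(num_sets)]
    hits = 0
    for addr in addresses:
        s = sets[addr % num_sets]
        if addr in s:
            hits += 1
            s.remove(addr)      # move the block to the MRU position
            s.append(addr)
        else:
            if len(s) == ways:  # set is full: evict the LRU block
                s.pop(0)
            s.append(addr)
    return hits

print(simulate_lru([0, 2, 4, 8, 10, 12, 14, 16, 0]))  # → 0
```

Running the introductory sequence 0, 1, 2, 3, 4 through the same function likewise reports 0 hits, matching the first table.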

02

Identify the number of hits using MRU replacement

(5.13.2)

In MRU replacement, the most recently used memory block is replaced. The address sequence is 0, 2, 4, 8, 10, 12, 14, 16, 0.

All of these addresses map to set 0 (every block address is even, so address mod 2 = 0). The table for the given address sequence is given below:

| Address of Memory Block Accessed | Hit or Miss | Evicted Block | Set 0 (way 0) | Set 0 (way 1) |
|---|---|---|---|---|
| 0 | Miss | | Mem[0] | |
| 2 | Miss | | Mem[0] | Mem[2] |
| 4 | Miss | 2 | Mem[0] | Mem[4] |
| 8 | Miss | 4 | Mem[0] | Mem[8] |
| 10 | Miss | 8 | Mem[0] | Mem[10] |
| 12 | Miss | 10 | Mem[0] | Mem[12] |
| 14 | Miss | 12 | Mem[0] | Mem[14] |
| 16 | Miss | 14 | Mem[0] | Mem[16] |
| 0 | Hit | | Mem[0] | Mem[16] |

(Set 1 is never used, so its columns are omitted.)

Thus, the number of hits is 1.
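The MRU trace can be verified the same way. Again, this is an illustrative sketch of my own; the only change from an LRU simulator is which end of the recency-ordered list gets evicted:

```python
# Sketch: same 2-set, 2-way cache, but with MRU (most recently used) eviction.

def simulate_mru(addresses, num_sets=2, ways=2):
    # Each set is a list; the LAST element is the most recently used block.
    sets = [[] for _ in range(num_sets)]
    hits = 0
    for addr in addresses:
        s = sets[addr % num_sets]
        if addr in s:
            hits += 1
            s.remove(addr)
            s.append(addr)      # refresh recency
        else:
            if len(s) == ways:  # set is full: evict the MRU block
                s.pop()         # last element = most recently used
            s.append(addr)
    return hits

print(simulate_mru([0, 2, 4, 8, 10, 12, 14, 16, 0]))  # → 1
```

MRU happens to protect Mem[0] here because, after the first two misses fill the set, Mem[0] is never the most recently used block at eviction time.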

03

Identify the number of hits using a random replacement policy

(5.13.3)

The random replacement policy is simulated by flipping a coin to choose the victim way. There is only one repeated memory address in the given sequence, so the maximum number of hits is 1 (for the second reference to address 0). Thus, the number of hits could be 0 or 1.
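The coin flip can be sketched with a seeded random number generator (an illustrative construction of mine, not part of the original solution). Whether Mem[0] survives to its second reference depends on the flips, so the hit count varies between runs but never exceeds 1:

```python
import random

# Sketch: 2-set, 2-way cache with random (coin-flip) victim selection.
def simulate_random(addresses, num_sets=2, ways=2, seed=None):
    rng = random.Random(seed)
    sets = [[] for _ in range(num_sets)]
    hits = 0
    for addr in addresses:
        s = sets[addr % num_sets]
        if addr in s:
            hits += 1
        elif len(s) == ways:
            s[rng.randrange(ways)] = addr  # coin flip picks the victim way
        else:
            s.append(addr)
    return hits

seq = [0, 2, 4, 8, 10, 12, 14, 16, 0]
# Hit counts observed across 1000 different coin-flip sequences:
print(sorted({simulate_random(seq, seed=s) for s in range(1000)}))
```

For a hit to occur, Mem[0] must win all six coin flips between its two references, a 1-in-64 chance, so 0 hits is by far the common outcome.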

04

Identify the number of hits using optimal policy

(5.13.4)

Under any replacement policy, the maximum number of hits is 1, because only address 0 is referenced a second time. Therefore, any policy that keeps Mem[0] resident until its final reference (evicting the other block in the set at every replacement) is optimal.

05

Identify the difficulty to implement an optimal replacement policy

(5.13.5)

An optimal replacement policy minimizes misses by evicting, at each replacement, the block whose next use lies furthest in the future (Belady's algorithm). But future address references are not known to the cache, so at best it can predict them. This is why a replacement policy that is optimal for all address sequences is impractical to implement.
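When the full trace is known in advance, as in a paper exercise, the optimal policy can be simulated offline. A minimal sketch of Belady's policy (my own illustrative code, assuming the same 2-set, 2-way cache):

```python
# Sketch of Belady's optimal (OPT/MIN) policy: on a miss in a full set,
# evict the resident block whose next use lies furthest in the future.

def simulate_opt(addresses, num_sets=2, ways=2):
    sets = [[] for _ in range(num_sets)]
    hits = 0
    for i, addr in enumerate(addresses):
        s = sets[addr % num_sets]
        if addr in s:
            hits += 1
            continue
        if len(s) == ways:
            future = addresses[i + 1:]
            # Distance to next use; blocks never used again sort last.
            def next_use(block):
                return future.index(block) if block in future else len(future)
            s.remove(max(s, key=next_use))
        s.append(addr)
    return hits

print(simulate_opt([0, 2, 4, 8, 10, 12, 14, 16, 0]))  # → 1
```

On this sequence OPT keeps Mem[0] resident at every replacement and reaches the 1-hit maximum; a real cache cannot do this because `future` is exactly what it does not have.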

06

Identify the impact of making a decision about caching a requested address or not on the miss rate

(5.13.6)

If the cache can decide per reference whether to cache the requested address, the miss rate can decrease: a block with little temporal locality can bypass the cache entirely, so it never evicts a block that will be reused. On the other hand, a wrong bypass decision keeps out a block that would have been reused, which increases the miss rate.
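The effect of such bypassing can be sketched by adding a caching predicate to an LRU simulator (an illustrative construction of mine; the predicate names are hypothetical). On this sequence, bypassing every block except address 0 turns LRU's 0 hits into the optimal 1 hit:

```python
# Sketch: LRU cache where should_cache(addr) decides, on each miss,
# whether the requested block is allocated in the cache at all.

def simulate_lru_bypass(addresses, should_cache, num_sets=2, ways=2):
    sets = [[] for _ in range(num_sets)]
    hits = 0
    for addr in addresses:
        s = sets[addr % num_sets]
        if addr in s:
            hits += 1
            s.remove(addr)
            s.append(addr)          # refresh recency on a hit
        elif should_cache(addr):
            if len(s) == ways:
                s.pop(0)            # evict the LRU block
            s.append(addr)
        # else: the reference bypasses the cache entirely

    return hits

seq = [0, 2, 4, 8, 10, 12, 14, 16, 0]
print(simulate_lru_bypass(seq, lambda a: True))    # cache everything → 0 hits
print(simulate_lru_bypass(seq, lambda a: a == 0))  # bypass all but 0 → 1 hit
```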


Most popular questions from this chapter

Question: For a high-performance system such as a B-tree index for a database, the page size is determined mainly by the data size and disk performance. Assume that on average a B-tree index page is 70% full with fixed-size entries. The utility of a page is its B-tree depth, calculated as log2(entries per page). The following table shows that for 16-byte entries and a 10-year-old disk with a 10 ms latency and 10 MB/s transfer rate, the optimal page size is 16 KiB.

| Page Size (KiB) | Page Utility or B-Tree Depth (Number of Disk Accesses Saved) | Index Page Access Cost (ms) | Utility/Cost |
|---|---|---|---|
| 2 | 6.49 | 10.2 | 0.64 |
| 4 | 7.49 | 10.4 | 0.72 |
| 8 | 8.49 | 10.8 | 0.79 |
| 16 | 9.49 | 11.6 | 0.82 |
| 32 | 10.49 | 13.2 | 0.79 |
| 64 | 11.49 | 16.4 | 0.70 |
| 128 | 12.49 | 22.8 | 0.55 |
| 256 | 13.49 | 35.6 | 0.38 |
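The table's entries can be reproduced from the stated parameters. A short Python sketch, assuming (as the numbers imply) that utility = log2(page bytes × 0.70 / 16) and that access cost = 10 ms latency + transfer time at 10 MB/s, i.e. about 0.1 ms per KiB:

```python
from math import log2

latency_ms = 10.0          # old-disk latency from the question
kib_per_ms = 10            # 10 MB/s transfer, i.e. ~10 KiB per ms
entry_bytes, fill = 16, 0.70

for kib in [2, 4, 8, 16, 32, 64, 128, 256]:
    entries = kib * 1024 * fill / entry_bytes     # entries per 70%-full page
    utility = log2(entries)                       # B-tree depth saved
    cost = latency_ms + kib / kib_per_ms          # page access cost in ms
    print(f"{kib:>3} KiB  utility {utility:5.2f}  "
          f"cost {cost:5.1f} ms  ratio {utility / cost:.2f}")
```

The ratio peaks at 16 KiB (0.82), matching the table's optimum.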

(5.10.1) What is the best page size if entries now become 128 bytes?

(5.10.2) Based on 5.10.1, what is the best page size if pages are half full?

(5.10.3) Based on 5.10.2, what is the best page size if using a modern disk with a 3 ms latency and 100 MB/s transfer rate? Explain why future servers are likely to have larger pages.

Keeping “frequently used” (or “hot”) pages in DRAM can save disk accesses, but how do we determine the exact meaning of “frequently used” for a given system? Data engineers use the cost ratio between DRAM and disk access to quantify the reuse time threshold for hot pages. The cost of a disk access is $Disk/accesses_per_sec, while the cost to keep a page in DRAM is $DRAM_MiB/page_size. The typical DRAM and disk costs and typical database page sizes at several time points are listed below:

| Year | DRAM Cost ($/MiB) | Page Size (KiB) | Disk Cost ($/disk) | Disk Access Rate (access/sec) |
|---|---|---|---|---|
| 1987 | 5000 | 1 | 15,000 | 15 |
| 1997 | 15 | 8 | 2000 | 64 |
| 2007 | 0.05 | 64 | 80 | 83 |
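The cost-ratio idea described above is the classic break-even ("five-minute rule") calculation. A sketch of one plausible arrangement of the formula, assuming the break-even reuse time T = (disk $ per access/sec) / (DRAM $ per page):

```python
# Sketch: break-even reuse time per technology generation.
# A page is worth keeping in DRAM if it is re-accessed at least once per T s.

rows = [  # year, DRAM $/MiB, page size KiB, disk $/disk, accesses/sec
    (1987, 5000, 1, 15_000, 15),
    (1997, 15, 8, 2_000, 64),
    (2007, 0.05, 64, 80, 83),
]

for year, dram_per_mib, page_kib, disk_cost, rate in rows:
    dram_per_page = dram_per_mib * page_kib / 1024  # $ to hold one page
    disk_per_access_rate = disk_cost / rate         # $ per (access/sec)
    threshold_s = disk_per_access_rate / dram_per_page
    print(f"{year}: break-even reuse time ≈ {threshold_s:.0f} s")
```

All three generations land in the 200-300 second range, close to the original five-minute rule.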

(5.10.4) What are the reuse time thresholds for these three technology generations?

(5.10.5) What are the reuse time thresholds if we keep using the same 4K page size? What’s the trend here?

(5.10.6) What other factors can be changed to keep using the same page size (thus avoiding software rewrite)? Discuss their likeliness with current technology and cost trends.

5.2 Caches are important to providing a high-performance memory hierarchy to processors. Below is a list of 32-bit memory address references, given as word addresses.
3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253


5.2.1 [10] <§5.3> For each of these references, identify the binary address, the tag,
and the index given a direct-mapped cache with 16 one-word blocks. Also list if each reference is a hit or a miss, assuming the cache is initially empty.


5.2.2 [10] <§5.3> For each of these references, identify the binary address, the tag,
and the index given a direct-mapped cache with two-word blocks and a total size of 8 blocks. Also list if each reference is a hit or a miss, assuming the cache is initially empty.


5.2.3 [20] < §§5.3, 5.4> You are asked to optimize a cache design for the given
references. There are three direct-mapped cache designs possible, all with a total of 8 words of data: C1 has 1-word blocks, C2 has 2-word blocks, and C3 has 4-word blocks. In terms of miss rate, which cache design is the best? If the miss stall time is 25 cycles, and C1 has an access time of 2 cycles, C2 takes 3 cycles, and C3 takes 5 cycles, which is the best cache design?


There are many different design parameters that are important to a cache’s overall performance. Below are listed parameters for different direct-mapped cache designs.


Cache Data Size: 32 KiB


Cache Block Size: 2 words


Cache Access Time: 1 cycle


5.2.4 [15] < §5.3> Calculate the total number of bits required for the cache listed
above, assuming a 32-bit address. Given that total size, find the total size of the closest direct-mapped cache with 16-word blocks of equal size or greater. Explain why the second cache, despite its larger data size, might provide slower performance than the first cache.


5.2.5 [20] <§§5.3, 5.4> Generate a series of read requests that have a lower miss rate on a 2 KiB 2-way set associative cache than the cache listed above. Identify one possible solution that would make the cache listed have an equal or lower miss rate than the 2KiB cache. Discuss the advantages and disadvantages of such a solution.


5.2.6 [15] <§5.3> The formula shown in Section 5.3 shows the typical method to
index a direct-mapped cache, specifically (Block address) modulo (Number of blocks in the cache). Assuming a 32-bit address and 1024 blocks in the cache, consider a different indexing function, specifically (Block address [31:27] XOR Block address [26:22]). Is it possible to use this to index a direct-mapped cache? If so, explain why and discuss any changes that might need to be made to the cache. If it is not possible, explain why.

In this exercise, we will look at the different ways capacity affects overall performance. In general, cache access time is proportional to capacity. Assume that main memory accesses take 70 ns and that memory accesses are 36% of all instructions. The following table shows data for L1 caches attached to each of two processors, P1 and P2.

| | L1 Size | L1 Miss Rate | L1 Hit Time |
|---|---|---|---|
| P1 | 2 KiB | 8.0% | 0.66 ns |
| P2 | 4 KiB | 6.0% | 0.90 ns |

(5.6.1) Assuming that the L1 hit time determines the cycle times for P1 and P2, what are their respective clock rates?

(5.6.2) What is the Average Memory Access Time for P1 and P2?

(5.6.3) Assuming a base CPI of 1.0 without any memory stalls, what is the total CPI for P1 and P2? Which processor is faster?
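For 5.6.2, the standard formula is AMAT = hit time + miss rate × miss penalty. A small sketch plugging in the table values (the variable names are my own):

```python
# Sketch: Average Memory Access Time for P1 and P2, single-level cache.
main_mem_ns = 70.0  # miss penalty: main memory access time

for name, miss_rate, hit_ns in [("P1", 0.080, 0.66),
                                ("P2", 0.060, 0.90)]:
    amat = hit_ns + miss_rate * main_mem_ns
    print(f"{name}: AMAT = {amat:.2f} ns")  # P1 → 6.26 ns, P2 → 5.10 ns
```

Despite its slower hit time, P2's lower miss rate gives it the better AMAT.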

For the next three problems, we will consider the addition of an L2 cache to P1 to presumably make up for its limited L1 cache capacity. Use the L1 cache capacities and hit times from the previous table when solving these problems. The L2 miss rate indicated is its local miss rate.

| L2 Size | L2 Miss Rate | L2 Hit Time |
|---|---|---|
| 1 MiB | 95% | 5.62 ns |

(5.6.4) What is the AMAT for P1 with the addition of an L2 cache? Is the AMAT better or worse with the L2 cache?

(5.6.5) Assuming a base CPI of 1.0 without any memory stalls, what is the total CPI for P1 with the addition of an L2 cache?

(5.6.6) Which processor is faster, now that P1 has an L2 cache? If P1 is faster, what miss rate would P2 need in its L1 cache to match P1’s performance? If P2 is faster, what miss rate would P1 need in its L1 cache to match P2’s performance?

Cache coherence concerns the views of multiple processors on a given cache block. The following data shows two processors and their read/write operations on two different words of a cache block X (initially X[0] = X[1] = 0). Assume the size of integers is 32 bits.

| P1 | P2 |
|---|---|
| X[0]++; X[1]=3; | X[0]=5; X[1]+=2; |

5.17.1 List the possible values of the given cache block for a correct cache coherence protocol implementation. List at least one more possible value of the block if the protocol doesn’t ensure cache coherency.

5.17.2 For a snooping protocol, list a valid operation sequence on each processor/cache to finish the above read/write operations.

5.17.3 What are the best-case and worst-case numbers of cache misses

needed to execute the listed read/write instructions?

Memory consistency concerns the views of multiple data items. The following data shows two processors and their read/write operations on different cache blocks (A and B initially 0).

| P1 | P2 |
|---|---|
| A=1; B=2; A+=2; B++; | C=B; D=A; |

5.17.4 List the possible values of C and D for an implementation that ensures both consistency assumptions on page 470.

5.17.5 List at least one more possible pair of values for C and D if such assumptions are not maintained.

5.17.6 For various combinations of write policies and write allocation policies, which combinations make the protocol implementation simpler?

For a direct-mapped cache design with a 32-bit address, the following bits of the address are used to access the cache.

| Tag | Index | Offset |
|---|---|---|
| 31-10 | 9-5 | 4-0 |
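Given these bit ranges, extracting the three fields from an address can be sketched as follows (an illustrative helper of mine, useful for working through 5.3.4-5.3.6 by hand):

```python
def split_address(addr):
    # Bit fields from the table: offset = bits 4-0, index = bits 9-5, tag = 31-10.
    offset = addr & 0x1F        # 5 offset bits -> 32-byte blocks (8 words)
    index = (addr >> 5) & 0x1F  # 5 index bits  -> 32 cache entries
    tag = addr >> 10            # remaining 22 bits
    return tag, index, offset

print(split_address(3100))  # → (3, 0, 28)
```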

5.3.1 What is the cache block size (in words)?

5.3.2 How many entries does the cache have?

5.3.3 What is the ratio between total bits required for such a cache implementation over the data storage bits?

Starting from power on, the following byte-addressed cache references are recorded.

Address: 0, 4, 16, 132, 232, 160, 1024, 30, 140, 3100, 180, 2180

5.3.4 How many blocks are replaced?

5.3.5 What is the hit ratio?

5.3.6 List the final state of the cache, with each valid entry represented as a record of <index, tag, data>
