
5.2 Caches are important to providing a high-performance memory hierarchy to processors. Below is a list of 32-bit memory address references, given as word addresses.
3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253


5.2.1 [10] <§5.3> For each of these references, identify the binary address, the tag,
and the index given a direct-mapped cache with 16 one-word blocks. Also list if each reference is a hit or a miss, assuming the cache is initially empty.


5.2.2 [10] <§5.3> For each of these references, identify the binary address, the tag,
and the index given a direct-mapped cache with two-word blocks and a total size of 8 blocks. Also list if each reference is a hit or a miss, assuming the cache is initially empty.


5.2.3 [20] < §§5.3, 5.4> You are asked to optimize a cache design for the given
references. There are three direct-mapped cache designs possible, all with a total of 8 words of data: C1 has 1-word blocks, C2 has 2-word blocks, and C3 has 4-word blocks. In terms of miss rate, which cache design is the best? If the miss stall time is 25 cycles, and C1 has an access time of 2 cycles, C2 takes 3 cycles, and C3 takes 5 cycles, which is the best cache design?


There are many different design parameters that are important to a cache’s overall performance. Below are listed parameters for different direct-mapped cache designs.


Cache Data Size: 32 KiB


Cache Block Size: 2 words


Cache Access Time: 1 cycle


5.2.4 [15] < §5.3> Calculate the total number of bits required for the cache listed
above, assuming a 32-bit address. Given that total size, find the total size of the closest direct-mapped cache with 16-word blocks of equal size or greater. Explain why the second cache, despite its larger data size, might provide slower performance than the first cache.


5.2.5 [20] <§§5.3, 5.4> Generate a series of read requests that have a lower miss rate on a 2 KiB 2-way set associative cache than the cache listed above. Identify one possible solution that would make the cache listed have an equal or lower miss rate than the 2 KiB cache. Discuss the advantages and disadvantages of such a solution.


5.2.6 [15] <§5.3> The formula shown in Section 5.3 shows the typical method to
index a direct-mapped cache, specifically (Block address) modulo (Number of blocks in the cache). Assuming a 32-bit address and 1024 blocks in the cache, consider a different indexing function, specifically (Block address [31:27] XOR Block address [26:22]). Is it possible to use this to index a direct-mapped cache? If so, explain why and discuss any changes that might need to be made to the cache. If it is not possible, explain why.

Short Answer


5.2.1

| Word address | Binary address | Tag | Index | Hit/Miss |
|--------------|----------------|-----|-------|----------|
| 3            | 0000 0011      | 0   | 3     | M        |
| 180          | 1011 0100      | 11  | 4     | M        |
| 43           | 0010 1011      | 2   | 11    | M        |
| 2            | 0000 0010      | 0   | 2     | M        |
| 191          | 1011 1111      | 11  | 15    | M        |
| 88           | 0101 1000      | 5   | 8     | M        |
| 190          | 1011 1110      | 11  | 14    | M        |
| 14           | 0000 1110      | 0   | 14    | M        |
| 181          | 1011 0101      | 11  | 5     | M        |
| 44           | 0010 1100      | 2   | 12    | M        |
| 186          | 1011 1010      | 11  | 10    | M        |
| 253          | 1111 1101      | 15  | 13    | M        |

5.2.2

| Word address | Binary address | Tag | Index | Hit/Miss |
|--------------|----------------|-----|-------|----------|
| 3            | 0000 0011      | 0   | 1     | M        |
| 180          | 1011 0100      | 11  | 2     | M        |
| 43           | 0010 1011      | 2   | 5     | M        |
| 2            | 0000 0010      | 0   | 1     | H        |
| 191          | 1011 1111      | 11  | 7     | M        |
| 88           | 0101 1000      | 5   | 4     | M        |
| 190          | 1011 1110      | 11  | 7     | H        |
| 14           | 0000 1110      | 0   | 7     | M        |
| 181          | 1011 0101      | 11  | 2     | H        |
| 44           | 0010 1100      | 2   | 6     | M        |
| 186          | 1011 1010      | 11  | 5     | M        |
| 253          | 1111 1101      | 15  | 6     | M        |

5.2.3

| Word address | Binary address | Tag | C1 index | C1 hit/miss | C2 index | C2 hit/miss | C3 index | C3 hit/miss |
|--------------|----------------|-----|----------|-------------|----------|-------------|----------|-------------|
| 3            | 0000 0011      | 0   | 3        | M           | 1        | M           | 0        | M           |
| 180          | 1011 0100      | 22  | 4        | M           | 2        | M           | 1        | M           |
| 43           | 0010 1011      | 5   | 3        | M           | 1        | M           | 0        | M           |
| 2            | 0000 0010      | 0   | 2        | M           | 1        | M           | 0        | M           |
| 191          | 1011 1111      | 23  | 7        | M           | 3        | M           | 1        | M           |
| 88           | 0101 1000      | 11  | 0        | M           | 0        | M           | 0        | M           |
| 190          | 1011 1110      | 23  | 6        | M           | 3        | H           | 1        | H           |
| 14           | 0000 1110      | 1   | 6        | M           | 3        | M           | 1        | M           |
| 181          | 1011 0101      | 22  | 5        | M           | 2        | H           | 1        | M           |
| 44           | 0010 1100      | 5   | 4        | M           | 2        | M           | 1        | M           |
| 186          | 1011 1010      | 23  | 2        | M           | 1        | M           | 0        | M           |
| 253          | 1111 1101      | 31  | 5        | M           | 2        | M           | 1        | M           |

Cache 2 provides the best performance.

5.2.4

Total cache size = 41984 bytes

5.2.5

The sequence 0, 32768, 0, 32768, 0, 32768, … would miss on every access.

5.2.6

Yes, it is possible.

Step by step solution

01

Determine the cache basics:

A cache is a small, fast memory that temporarily holds values during execution. A cache structure in which each memory location maps to exactly one cache location is known as a direct-mapped cache. If requested data is not present in the cache, the access is called a cache miss; if it is present, it is called a hit.

02

Determine the mapping details for each reference.

5.2.1

The given list of 32-bit memory address references, as word addresses, is:

3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253

The binary address is the word address converted to binary. The tag and index determine where each address maps in the cache. Whether a reference is a hit or a miss depends on whether the requested data is already present.

| Word address | Binary address | Tag | Index | Hit/Miss |
|--------------|----------------|-----|-------|----------|
| 3            | 0000 0011      | 0   | 3     | M        |
| 180          | 1011 0100      | 11  | 4     | M        |
| 43           | 0010 1011      | 2   | 11    | M        |
| 2            | 0000 0010      | 0   | 2     | M        |
| 191          | 1011 1111      | 11  | 15    | M        |
| 88           | 0101 1000      | 5   | 8     | M        |
| 190          | 1011 1110      | 11  | 14    | M        |
| 14           | 0000 1110      | 0   | 14    | M        |
| 181          | 1011 0101      | 11  | 5     | M        |
| 44           | 0010 1100      | 2   | 12    | M        |
| 186          | 1011 1010      | 11  | 10    | M        |
| 253          | 1111 1101      | 15  | 13    | M        |
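As a sanity check, the mapping above can be reproduced with a short script (a sketch, not part of the textbook solution): for a direct-mapped cache with 16 one-word blocks, the index is the word address modulo 16 and the tag is the remaining high-order bits.

```python
# Sketch: tag/index decomposition and hit/miss tracking for a
# direct-mapped cache with 16 one-word blocks (4 index bits).
refs = [3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253]
NUM_BLOCKS = 16

cache = {}   # index -> tag currently resident
rows = []
for addr in refs:
    index = addr % NUM_BLOCKS    # (block address) modulo (number of blocks)
    tag = addr // NUM_BLOCKS     # remaining high-order address bits
    hit = cache.get(index) == tag
    cache[index] = tag           # fill the block on a miss (tag unchanged on a hit)
    rows.append((addr, f"{addr:08b}", tag, index, "H" if hit else "M"))

for row in rows:
    print(row)
```

Every reference misses, in agreement with the table.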

03

The direct-mapped cache with two-word blocks

5.2.2

The list with binary address, tag, index, and hit/miss for the 8-block cache with two-word blocks is as follows:

| Word address | Binary address | Tag | Index | Hit/Miss |
|--------------|----------------|-----|-------|----------|
| 3            | 0000 0011      | 0   | 1     | M        |
| 180          | 1011 0100      | 11  | 2     | M        |
| 43           | 0010 1011      | 2   | 5     | M        |
| 2            | 0000 0010      | 0   | 1     | H        |
| 191          | 1011 1111      | 11  | 7     | M        |
| 88           | 0101 1000      | 5   | 4     | M        |
| 190          | 1011 1110      | 11  | 7     | H        |
| 14           | 0000 1110      | 0   | 7     | M        |
| 181          | 1011 0101      | 11  | 2     | H        |
| 44           | 0010 1100      | 2   | 6     | M        |
| 186          | 1011 1010      | 11  | 5     | M        |
| 253          | 1111 1101      | 15  | 6     | M        |
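The same check can be applied to the two-word-block configuration (again a sketch, not from the textbook): the low bit of the word address is the block offset, the next three bits are the index, and the rest is the tag.

```python
# Sketch: direct-mapped cache with two-word blocks and 8 blocks total.
refs = [3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253]
WORDS_PER_BLOCK = 2
NUM_BLOCKS = 8

cache = {}   # index -> tag currently resident
rows = []
for addr in refs:
    block = addr // WORDS_PER_BLOCK   # strip the 1-bit word offset
    index = block % NUM_BLOCKS        # 3 index bits
    tag = block // NUM_BLOCKS         # remaining high-order bits
    hit = cache.get(index) == tag
    cache[index] = tag
    rows.append((addr, tag, index, "H" if hit else "M"))

print(rows)
```

The three hits (addresses 2, 190, and 181) match the table.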

04

Determine the best cache design in terms of miss rate and total cycles

5.2.3

Given that there are three direct-mapped cache designs possible, all with a total of 8 words of data: C1 has 1-word blocks, C2 has 2-word blocks, and C3 has 4-word blocks.

The miss stall time is 25 cycles, and C1 has an access time of 2 cycles, C2 takes 3 cycles, and C3 takes 5 cycles.

| Word address | Binary address | Tag | C1 index | C1 hit/miss | C2 index | C2 hit/miss | C3 index | C3 hit/miss |
|--------------|----------------|-----|----------|-------------|----------|-------------|----------|-------------|
| 3            | 0000 0011      | 0   | 3        | M           | 1        | M           | 0        | M           |
| 180          | 1011 0100      | 22  | 4        | M           | 2        | M           | 1        | M           |
| 43           | 0010 1011      | 5   | 3        | M           | 1        | M           | 0        | M           |
| 2            | 0000 0010      | 0   | 2        | M           | 1        | M           | 0        | M           |
| 191          | 1011 1111      | 23  | 7        | M           | 3        | M           | 1        | M           |
| 88           | 0101 1000      | 11  | 0        | M           | 0        | M           | 0        | M           |
| 190          | 1011 1110      | 23  | 6        | M           | 3        | H           | 1        | H           |
| 14           | 0000 1110      | 1   | 6        | M           | 3        | M           | 1        | M           |
| 181          | 1011 0101      | 22  | 5        | M           | 2        | H           | 1        | M           |
| 44           | 0010 1100      | 5   | 4        | M           | 2        | M           | 1        | M           |
| 186          | 1011 1010      | 23  | 2        | M           | 1        | M           | 0        | M           |
| 253          | 1111 1101      | 31  | 5        | M           | 2        | M           | 1        | M           |

Cache 1 miss rate = 12/12 = 100%

Cache 1 total cycles = 12 × 25 + 12 × 2 = 324

Cache 2 miss rate = 10/12 ≈ 83%

Cache 2 total cycles = 10 × 25 + 12 × 3 = 286

Cache 3 miss rate = 11/12 ≈ 92%

Cache 3 total cycles = 11 × 25 + 12 × 5 = 335

From the above results, Cache 2 provides the best performance based on both the miss rate and the total cycles.
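The miss counts and cycle totals can be verified by parameterizing the simulation over block size and block count (a sketch; the 25-cycle miss stall and per-design access times come from the problem statement).

```python
def misses(refs, words_per_block, num_blocks):
    """Count misses in a direct-mapped cache for a word-address trace."""
    cache = {}
    count = 0
    for addr in refs:
        block = addr // words_per_block
        index, tag = block % num_blocks, block // num_blocks
        if cache.get(index) != tag:
            count += 1
            cache[index] = tag
    return count

refs = [3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253]
MISS_STALL = 25
# name -> (words per block, number of blocks, access cycles); 8 words of data each
designs = {"C1": (1, 8, 2), "C2": (2, 4, 3), "C3": (4, 2, 5)}

results = {}
for name, (wpb, blocks, access) in designs.items():
    m = misses(refs, wpb, blocks)
    results[name] = (m, m * MISS_STALL + len(refs) * access)

print(results)  # C2 has both the lowest miss count and the fewest total cycles
```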

05

The closest direct-mapped cache with 16-word blocks of equal size or greater.

5.2.4

First determine the number of cache blocks in the first cache configuration. We do this by dividing 32 KiB by 4 (the number of bytes per word) and by 2 (the number of words per block).

As a result, we have 4096 blocks and a 12-bit index field. There is also a 1-bit word offset and a 2-bit byte offset. The tag field is therefore 32 − 12 − 1 − 2 = 17 bits wide. The tag bits, along with one valid bit per block, require 18 × 4096 = 73728 bits, or 9216 bytes. The total cache size is thus 9216 + 32768 = 41984 bytes.

The total cache size can be computed as follows:

Total size = data size + (valid bit size + tag size) × blocks

Total size = 41984 bytes

Data size = blocks × block size × word size

Word size = 4 bytes

Tag size = 32 − log2(blocks) − log2(block size) − log2(word size)

Valid bit size = 1

Increasing the number of words per block from 2 to 16 reduces the tag size from 17 to 14 bits.

To determine the number of blocks, we solve the inequality:

41984 ≤ 64 × blocks + 15 × blocks

Solving this inequality gives 531 blocks, and rounding up to the next power of two gives a 1024-block cache.

The larger block size may lead to a longer hit time and a larger miss penalty than the original cache. The smaller number of blocks may cause a higher conflict miss rate than the original cache.
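The bit accounting for the original 32 KiB cache can be checked numerically (a sketch of the arithmetic above, not additional solution content):

```python
import math

DATA_BYTES = 32 * 1024            # 32 KiB of data
WORD_BYTES = 4
WORDS_PER_BLOCK = 2

blocks = DATA_BYTES // (WORD_BYTES * WORDS_PER_BLOCK)        # 4096 blocks
index_bits = int(math.log2(blocks))                          # 12
offset_bits = int(math.log2(WORDS_PER_BLOCK * WORD_BYTES))   # 1 word + 2 byte = 3
tag_bits = 32 - index_bits - offset_bits                     # 17
overhead_bytes = (tag_bits + 1) * blocks // 8                # +1 valid bit per block
total_bytes = DATA_BYTES + overhead_bytes

print(blocks, tag_bits, total_bytes)
```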

06

Identify one possible solution

5.2.5

Associative caches are intended to limit conflict misses. As a result, a series of read requests with the same index field but distinct tag fields will produce a large number of misses on a direct-mapped cache.

For the cache described above, the sequence 0, 32768, 0, 32768, 0, 32768, … would miss on every access, while a 2-way set associative cache with LRU replacement, even one with a significantly smaller overall capacity, would hit on every access after the first two.

The disadvantage of this setup is that, with its smaller capacity, it can produce a larger number of misses on other access patterns. The advantage is that a 2-way set associative cache needs only a small capacity to deal with such conflict misses.
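The contrast can be simulated with a small LRU set-associative model (a sketch; the 2-word block size assumed for the 2 KiB cache is my assumption, since the problem does not fix it).

```python
from collections import OrderedDict

def run(refs, num_sets, ways, block_bytes):
    """Count misses for an LRU set-associative cache over byte addresses."""
    sets = [OrderedDict() for _ in range(num_sets)]  # tag -> None, ordered by recency
    miss = 0
    for addr in refs:
        block = addr // block_bytes
        s, tag = block % num_sets, block // num_sets
        if tag in sets[s]:
            sets[s].move_to_end(tag)          # refresh LRU position on a hit
        else:
            miss += 1
            if len(sets[s]) == ways:
                sets[s].popitem(last=False)   # evict the least recently used tag
            sets[s][tag] = None
    return miss

refs = [0, 32768] * 10                 # byte addresses alternating 0 and 32768
print(run(refs, 4096, 1, 8))           # 32 KiB direct-mapped, 2-word blocks: misses every time
print(run(refs, 128, 2, 8))            # 2 KiB 2-way (assumed 2-word blocks): 2 compulsory misses
```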

07

Is it possible to use this to index a direct-mapped cache

5.2.6

Yes, it is possible to index the cache using this function. However, because the XOR discards information about the address bits that form the index, the cache must store additional tag bits (covering all of the bits that feed the index function) to identify the cached block unambiguously.
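A sketch of the proposed index function (a hypothetical helper, not from the text) makes the information loss concrete: distinct block addresses can XOR to the same index, so the stored tag must cover every bit that feeds the function.

```python
def xor_index(addr):
    """Proposed index: block address bits [31:27] XOR bits [26:22] (5-bit result)."""
    hi = (addr >> 27) & 0x1F
    lo = (addr >> 22) & 0x1F
    return hi ^ lo

# Two different block addresses whose two 5-bit fields are equal both
# collapse to index 0, so bits [31:22] must be kept in the tag.
a = 0b00001_00001 << 22
b = 0b00010_00010 << 22
print(xor_index(a), xor_index(b))
```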

