
Recall that we have two write policies and write allocation policies, and their combinations can be implemented either in L1 or L2 cache. Assume the following choices for L1 and L2 caches:

L1 cache: write-through, non-write-allocate

L2 cache: write-back, write-allocate

5.4.1 Buffers are employed between different levels of the memory hierarchy to reduce access latency. For this configuration, list the possible buffers needed between the L1 and L2 caches, as well as between the L2 cache and memory.

5.4.2 Describe the procedure for handling an L1 write miss, considering the components involved and the possibility of replacing a dirty block.

5.4.3 For a multilevel exclusive cache configuration (a block can reside in only one of the L1 and L2 caches), describe the procedure for handling an L1 write miss, considering the components involved and the possibility of replacing a dirty block.

Consider the following program and cache behaviors.

Data reads per 1000 instructions: 250

Data writes per 1000 instructions: 100

Instruction cache miss rate: 0.30%

Data cache miss rate: 2%

Block size (bytes): 64

5.4.4 For a write-through, write-allocate cache, what are the minimum read and write bandwidths (measured in bytes per cycle) needed to achieve a CPI of 2?

5.4.5 For a write-back, write-allocate cache, assuming 30% of replaced data cache blocks are dirty, what are the minimal read and write bandwidths needed for a CPI of 2?

5.4.6 What are the minimal bandwidths needed to achieve the performance of CPI=1.5?

Short Answer


5.4.1

Between the L1 and L2 caches: a write buffer.

Between the L2 cache and memory: a write buffer.

5.4.2

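L1 (write-through, non-write-allocate) does not allocate the block, so the write is sent to L2. If the write also misses in L2, the block must be brought into the L2 cache (write allocate), and a dirty L2 victim must be written back to memory.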

5.4.3

On an L1 write miss, the block is written into L2 but not into L1. If a subsequent read misses on the same block, the block in L2 must be written back to memory, transferred to L1, and invalidated in L2.

5.4.4

The total read bandwidth requirement is 0.32 bytes/cycle.

The data write bandwidth requirement is 0.2 bytes/cycle.

5.4.5

The data read bandwidth is 0.22 bytes/cycle, the same as in 5.4.4.

The data write bandwidth is 0.067 bytes/cycle.

5.4.6

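For the write-through cache, the total read bandwidth is 0.43 bytes/cycle and the data write bandwidth is 0.27 bytes/cycle.

For the write-back cache, the total read bandwidth is also 0.43 bytes/cycle and the data write bandwidth is 0.090 bytes/cycle.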

Step by step solution

01

Determine the formulae

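The formulas used are:

$\text{IPC} = \dfrac{1}{\text{CPI}} \quad (1)$

$\text{Fraction of cycles with a data read} = \dfrac{\text{data reads per instruction}}{\text{CPI}} \quad (2)$

$\text{Fraction of cycles with a data write} = \dfrac{\text{data writes per instruction}}{\text{CPI}} \quad (3)$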
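Equations (1) to (3) are simple enough to check mechanically. The following is a minimal Python sketch. It assumes data reads and writes are counted per 1000 instructions, using 250 reads and 100 writes as the step-by-step solution does. `per_cycle_rates` is a hypothetical helper name and is not part of the original solution.

```python
def per_cycle_rates(cpi: float, reads_per_1000: float, writes_per_1000: float):
    """Return (IPC, data reads per cycle, data writes per cycle) from equations (1)-(3)."""
    ipc = 1.0 / cpi                                      # equation (1)
    reads_per_cycle = (reads_per_1000 / 1000.0) / cpi    # equation (2)
    writes_per_cycle = (writes_per_1000 / 1000.0) / cpi  # equation (3)
    return ipc, reads_per_cycle, writes_per_cycle


ipc, reads, writes = per_cycle_rates(cpi=2.0, reads_per_1000=250, writes_per_1000=100)
print(f"IPC={ipc:.3f} reads/cycle={reads:.3f} writes/cycle={writes:.3f}")
# IPC=0.500 reads/cycle=0.125 writes/cycle=0.050
```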

02

Describe write policy and write allocation policy

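The write policy determines what the cache does when the CPU issues a write request.

There are two write policies:

a. Write-through: every write updates both the cache and the next lower level of the hierarchy.

b. Write-back: a write updates only the cache, and the modified (dirty) block is written to the next lower level when it is replaced.

Write allocation policy

The write allocation policy determines what happens on a write miss, independently of the write policy. A write-allocate cache fetches the missing block into the cache and then performs the write there. A non-write-allocate cache sends the write directly to the next lower level without placing the block in the cache.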
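Each combination of the two policies can be summarized as a small decision function that describes one cache level. The following is an illustrative Python sketch. `WritePolicy` and `actions_on_write` are hypothetical names. The sketch deliberately leaves out choosing and evicting a victim block (and writing it back if it is dirty).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WritePolicy:
    write_back: bool      # False means write-through
    write_allocate: bool  # False means non-write-allocate


def actions_on_write(policy: WritePolicy, hit: bool) -> List[str]:
    """List what one cache level does for a single write (victim eviction omitted)."""
    if not hit and not policy.write_allocate:
        # Non-write-allocate: the write bypasses this cache entirely.
        return ["forward write to next level"]
    actions = []
    if not hit:
        # Write allocate: bring the block in before writing it.
        actions.append("fetch block from next level")
    actions.append("update block in this cache")
    if policy.write_back:
        actions.append("set dirty bit")
    else:
        actions.append("forward write to next level")
    return actions


l1 = WritePolicy(write_back=False, write_allocate=False)  # L1 in this exercise
l2 = WritePolicy(write_back=True, write_allocate=True)    # L2 in this exercise
print(actions_on_write(l1, hit=False))
print(actions_on_write(l2, hit=False))
```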

03

List the possible buffers needed between L1 and L2 caches and between L2 cache and memory.

5.4.1

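The write miss penalty in the L1 cache is low, because L1 is non-write-allocate and does not fetch the block. The write miss penalty in the L2 cache is high, because L2 must fetch the block from memory. Since L1 is write-through, every write is forwarded to L2, so a write buffer between the L1 and L2 caches can hide the L2 write latency from the processor.

A write buffer between the L2 cache and memory is useful when L2 replaces a dirty block. The dirty block is placed in the buffer, the new block is read from memory first, and the dirty block is written to memory afterward.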
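The benefit of the L2-to-memory write buffer comes from the ordering: the miss is serviced before the dirty victim is written. The following is a simplified Python model. It assumes a FIFO buffer and a memory that only logs its operations. All class and function names are hypothetical. A real write buffer must also be searched on reads, so that a read of a block still waiting in the buffer returns the newest data.

```python
from collections import deque


class LoggingMemory:
    """Main memory that records the order of block reads and writes."""

    def __init__(self):
        self.blocks = {}
        self.log = []

    def read_block(self, addr):
        self.log.append(("read", addr))
        return self.blocks.get(addr, 0)

    def write_block(self, addr, value):
        self.log.append(("write", addr))
        self.blocks[addr] = value


class WriteBuffer:
    """FIFO of dirty blocks waiting to be written to memory."""

    def __init__(self, memory, capacity=4):
        self.memory = memory
        self.capacity = capacity
        self.pending = deque()

    def enqueue(self, addr, value):
        if len(self.pending) == self.capacity:
            self.drain_one()  # buffer full: the cache must wait for one write
        self.pending.append((addr, value))

    def drain_one(self):
        addr, value = self.pending.popleft()
        self.memory.write_block(addr, value)


def replace_dirty_block(buffer, memory, victim_addr, victim_value, new_addr):
    """Evict a dirty L2 victim and fetch the missing block, reading first."""
    buffer.enqueue(victim_addr, victim_value)  # park the dirty victim
    new_value = memory.read_block(new_addr)    # service the miss immediately
    buffer.drain_one()                         # write the victim back afterwards
    return new_value


mem = LoggingMemory()
buf = WriteBuffer(mem)
replace_dirty_block(buf, mem, victim_addr=0x40, victim_value=7, new_addr=0x80)
print(mem.log)  # [('read', 128), ('write', 64)]
```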

04

Describe the procedure for handling an L1 write miss and the possibility of replacing a dirty block.

5.4.2

Because L1 is write-through and non-write-allocate, an L1 write miss does not allocate a block in L1, and L1 never holds dirty blocks. No dirty-block check is therefore needed in L1.

Thus, the write is sent directly to the L2 cache.

If the write also misses in L2, check whether the L2 block chosen for replacement is dirty.

If the victim block is dirty, it is first written back to main memory; it can wait in the write buffer while the new block is fetched. A clean victim is simply discarded. In either case, the missing block is then fetched into L2 (write allocate), updated with the write data, and marked dirty.

Otherwise, if the write hits in L2, the L2 block is simply updated and its dirty bit is set.
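The procedure for 5.4.2 can be traced with a small Python model. This is a sketch under simplifying assumptions: blocks are identified by block number, L2 is direct-mapped with a hypothetical number of sets, no data values are stored, and L1 is filled only by read misses, which are not modeled. `WriteMissModel` is a hypothetical name.

```python
class WriteMissModel:
    """L1: write-through, non-write-allocate. L2: write-back, write-allocate, direct-mapped."""

    def __init__(self, l2_sets):
        self.l1 = set()   # block numbers present in L1 (filled only by reads, not modeled)
        self.l2 = {}      # L2 set index -> [block number, dirty flag]
        self.l2_sets = l2_sets
        self.events = []

    def write(self, block):
        if block in self.l1:
            self.events.append(f"L1 hit: update block {block} in L1")
        else:
            self.events.append(f"L1 miss: block {block} not allocated in L1")
        self._write_l2(block)  # write-through: every write also goes to L2

    def _write_l2(self, block):
        index = block % self.l2_sets
        line = self.l2.get(index)
        if line is not None and line[0] == block:
            line[1] = True
            self.events.append(f"L2 hit: update block {block}, set dirty bit")
            return
        if line is not None and line[1]:
            self.events.append(f"L2 miss: write back dirty victim {line[0]} to memory")
        else:
            self.events.append("L2 miss: no dirty victim to write back")
        self.events.append(f"fetch block {block} from memory into L2")
        self.l2[index] = [block, True]  # write allocate, then mark the block dirty


model = WriteMissModel(l2_sets=4)
for blk in (1, 5, 5):  # blocks 1 and 5 map to the same L2 set
    model.write(blk)
print("\n".join(model.events))
```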

05

Describe the procedure of handling an L1 write-miss for a multilevel exclusive cache.

5.4.3

On an L1 write miss, the block is written into L2 but not into L1. If a subsequent read misses on the same block, the block in L2 must be written back to memory, transferred to L1, and invalidated in L2.
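The following Python sketch traces the exclusive case. It does not model capacities, victim handling, or L1 write hits, and `ExclusiveModel` is a hypothetical name. One reason for the write-back before the transfer is that a write-through L1 keeps no dirty bit. Once the L2 copy is invalidated, the modified data would otherwise never reach memory.

```python
class ExclusiveModel:
    """Exclusive L1/L2: a block is in at most one of the two caches.

    Capacities, victims, and L1 write hits are not modeled."""

    def __init__(self):
        self.l1 = set()   # block numbers in L1
        self.l2 = {}      # block number -> dirty flag
        self.events = []

    def write_miss(self, block):
        assert block not in self.l1, "this sketch only handles L1 write misses"
        if block not in self.l2:
            self.events.append(f"L2 miss: fetch block {block} from memory into L2")
        self.l2[block] = True  # L1 is non-write-allocate, so the write lands in L2
        self.events.append(f"write block {block} in L2 and mark it dirty")

    def read(self, block):
        if block in self.l1:
            self.events.append(f"L1 hit on block {block}")
            return
        if block in self.l2:
            dirty = self.l2.pop(block)  # invalidate in L2 to keep the caches exclusive
            if dirty:
                self.events.append(f"write back dirty block {block} to memory")
            self.events.append(f"transfer block {block} from L2 to L1")
        else:
            self.events.append(f"fetch block {block} from memory into L1")
        self.l1.add(block)


model = ExclusiveModel()
model.write_miss(3)
model.read(3)
print("\n".join(model.events))
```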

06

Determine the minimum read and write bandwidths needed to achieve a CPI of 2

5.4.4

For a write-allocate cache, every miss, whether a read or a write, first reads the whole block from memory into the cache. On a write miss, the block is then updated in the cache. For a write-through cache, every write also sends one word (4 bytes) to memory. The read bandwidth must also include instruction fetches that miss in the instruction cache. Here, bandwidth means memory bandwidth.

Given CPI = 2:

$\text{IPC} = \dfrac{1}{2} = 0.5$ instructions per cycle

Fraction of cycles with a data read $= \dfrac{0.25}{2} = 12.5\%$

Fraction of cycles with a data write $= \dfrac{0.10}{2} = 5\%$

Thus, the instruction bandwidth is $(0.0030 \times 64) \times 0.5 = 0.096$ bytes/cycle.

The data read bandwidth is $0.02 \times (0.125 + 0.05) \times 64 = 0.224 \approx 0.22$ bytes/cycle.

The total read bandwidth requirement is $0.096 + 0.224 = 0.32$ bytes/cycle.

The data write bandwidth requirement is $0.05 \times 4 = 0.2$ bytes/cycle.
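The same arithmetic can be written as a Python sketch parameterized by CPI. The defaults are the values the solution uses: 250 reads and 100 writes per 1000 instructions, miss rates of 0.30% and 2%, 64-byte blocks, and 4-byte words. `write_through_bandwidth` is a hypothetical helper name. Keeping 12.5% unrounded, rather than rounding it to 0.13, gives 0.224 bytes/cycle for data reads and 0.32 bytes/cycle in total.

```python
def write_through_bandwidth(cpi, reads_per_instr=0.25, writes_per_instr=0.10,
                            icache_miss=0.003, dcache_miss=0.02,
                            block_bytes=64, word_bytes=4):
    """Memory bandwidth (bytes/cycle) for a write-through, write-allocate cache."""
    ipc = 1.0 / cpi
    reads_per_cycle = reads_per_instr / cpi
    writes_per_cycle = writes_per_instr / cpi
    inst_bw = icache_miss * block_bytes * ipc
    # Write allocate: read misses and write misses both fetch a whole block.
    data_read_bw = dcache_miss * (reads_per_cycle + writes_per_cycle) * block_bytes
    # Write-through: every store sends one word to memory.
    write_bw = writes_per_cycle * word_bytes
    return inst_bw, data_read_bw, inst_bw + data_read_bw, write_bw


inst, data_read, total_read, write = write_through_bandwidth(cpi=2.0)
print(f"{inst:.3f} {data_read:.3f} {total_read:.3f} {write:.3f}")
# 0.096 0.224 0.320 0.200
```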

07

Determine the minimal read and write bandwidths needed for a CPI of 2

5.4.5

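The instruction bandwidth and the data read bandwidth are the same as in 5.4.4:

The instruction bandwidth is $(0.0030 \times 64) \times 0.5 = 0.096$ bytes/cycle.

The data read bandwidth is $0.02 \times (0.125 + 0.05) \times 64 = 0.224 \approx 0.22$ bytes/cycle.

For a write-back cache, only replaced dirty blocks are written to memory, one whole block at a time, so the data write bandwidth is

$0.02 \times 0.30 \times (0.125 + 0.05) \times 64 = 0.0672 \approx 0.067$ bytes/cycle.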
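The following Python sketch covers the write-back case under the same parameters as the solution. It assumes every data miss replaces a block, that 30% of replaced blocks are dirty, and that each dirty victim is written back as a full 64-byte block. `write_back_bandwidth` is a hypothetical name.

```python
def write_back_bandwidth(cpi, dirty_fraction=0.30, reads_per_instr=0.25,
                         writes_per_instr=0.10, icache_miss=0.003,
                         dcache_miss=0.02, block_bytes=64):
    """Return (total read, write) memory bandwidth in bytes/cycle for a write-back, write-allocate cache."""
    ipc = 1.0 / cpi
    data_misses_per_cycle = dcache_miss * (reads_per_instr + writes_per_instr) / cpi
    inst_bw = icache_miss * block_bytes * ipc
    data_read_bw = data_misses_per_cycle * block_bytes  # every miss fetches a block
    # Only a dirty victim is written back, one whole block per such miss.
    write_bw = data_misses_per_cycle * dirty_fraction * block_bytes
    return inst_bw + data_read_bw, write_bw


read_bw, write_bw = write_back_bandwidth(cpi=2.0)
print(f"read {read_bw:.3f}, write {write_bw:.4f}")
# read 0.320, write 0.0672
```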

08

Determine the minimal bandwidths needed to achieve the performance of CPI=1.5

5.4.6

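Given CPI = 1.5:

Instruction throughput $= \dfrac{1}{1.5} \approx 0.667$ instructions per cycle

Data read frequency $= \dfrac{0.25}{1.5} \approx 0.167$

Write frequency $= \dfrac{0.10}{1.5} \approx 0.067$

The instruction bandwidth is $(0.0030 \times 64) \times 0.667 \approx 0.13$ bytes/cycle.

For the write-through cache:

The data read bandwidth is $0.02 \times (0.167 + 0.067) \times 64 \approx 0.30$ bytes/cycle.

The total read bandwidth is $0.13 + 0.30 = 0.43$ bytes/cycle.

The data write bandwidth is $0.067 \times 4 \approx 0.27$ bytes/cycle.

For the write-back cache:

The read bandwidth is the same as for the write-through cache, 0.43 bytes/cycle.

The data write bandwidth is $0.02 \times 0.30 \times (0.167 + 0.067) \times 64 \approx 0.090$ bytes/cycle.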
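Every bandwidth term is proportional to 1/CPI, so lowering CPI from 2 to 1.5 raises each requirement by a factor of 2/1.5 ≈ 1.33. The following Python sketch evaluates both write policies at both CPIs, using the parameters the solution uses. `memory_bandwidth` is a hypothetical name. At CPI = 1.5, it gives about 0.43 bytes/cycle of read bandwidth for either cache, 0.27 bytes/cycle of write bandwidth for write-through, and 0.090 bytes/cycle for write-back.

```python
def memory_bandwidth(cpi, write_back, dirty_fraction=0.30, reads_per_instr=0.25,
                     writes_per_instr=0.10, icache_miss=0.003, dcache_miss=0.02,
                     block_bytes=64, word_bytes=4):
    """Return (read, write) memory bandwidth in bytes/cycle for a write-allocate cache."""
    reads_per_cycle = reads_per_instr / cpi
    writes_per_cycle = writes_per_instr / cpi
    data_misses_per_cycle = dcache_miss * (reads_per_cycle + writes_per_cycle)
    # Instruction fetch misses plus data misses (read or write), one block each.
    read_bw = icache_miss * block_bytes / cpi + data_misses_per_cycle * block_bytes
    if write_back:
        write_bw = data_misses_per_cycle * dirty_fraction * block_bytes
    else:
        write_bw = writes_per_cycle * word_bytes
    return read_bw, write_bw


for cpi in (2.0, 1.5):
    for write_back in (False, True):
        read_bw, write_bw = memory_bandwidth(cpi, write_back)
        name = "write-back" if write_back else "write-through"
        print(f"CPI={cpi}: {name:13s} read {read_bw:.3f} B/cycle, write {write_bw:.3f} B/cycle")
```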


Most popular questions from this chapter

In this exercise, we will look at the different ways capacity affects overall performance. In general, cache access time is proportional to capacity. Assume that main memory accesses take 70 ns and that memory accesses are 36% of all instructions. The following table shows data for L1 caches attached to each of two processors, P1 and P2.

Processor | L1 Size | L1 Miss Rate | L1 Hit Time
P1 | 2 KiB | 8.0% | 0.66 ns
P2 | 4 KiB | 6.0% | 0.90 ns

(5.6.1) Assuming that the L1 hit time determines the cycle times for P1 and P2, what are their respective clock rates?

(5.6.2) What is the Average Memory Access Time for P1 and P2?

(5.6.3) Assuming a base CPI of 1.0 without any memory stalls, what is the total CPI for P1 and P2? Which processor is faster?

For the next three problems, we will consider the addition of an L2 cache to P1 to presumably make up for its limited L1 cache capacity. Use the L1 cache capacities and hit times from the previous table when solving these problems. The L2 miss rate indicated is its local miss rate.

L2 Size | L2 Miss Rate | L2 Hit Time
1 MiB | 95% | 5.62 ns

(5.6.4) What is the AMAT for P1 with the addition of an L2 cache? Is the AMAT better or worse with the L2 cache?

(5.6.5) Assuming a base CPI of 1.0 without any memory stalls, what is the total CPI for P1 with the addition of an L2 cache?

(5.6.6) Which processor is faster, now that P1 has an L2 cache? If P1 is faster, what miss rate would P2 need in its L1 cache to match P1’s performance? If P2 is faster, what miss rate would P1 need in its L1 cache to match P2’s performance?

This exercise examines the impact of different cache designs, specifically comparing associative caches to the direct-mapped caches from Section 5.4. For these exercises, refer to the address stream shown in Exercise 5.2.

(5.7.1) Using the sequence of references from Exercise 5.2, show the final cache contents for a three-way set associative cache with two-word blocks and a total size of 24 words. Use LRU replacement. For each reference, identify the index bits, the tag bits, the block offset bits, and whether it is a hit or a miss.

(5.7.2) Using the references from Exercise 5.2, show the final cache contents for a fully associative cache with one-word blocks and a total size of 8 words. Use LRU replacement. For each reference, identify the index bits, the tag bits, and whether it is a hit or a miss.

(5.7.3) Using the references from Exercise 5.2, what is the miss rate for a fully associative cache with two-word blocks and a total size of 8 words, using LRU replacement? What is the miss rate using MRU (most recently used) replacement? Finally what is the best possible miss rate for this cache, given any replacement policy?

Multilevel caching is an important technique to overcome the limited amount of space that a first level cache can provide while still maintaining its speed. Consider a processor with the following parameters:

Base CPI, no memory stalls: 1.5

Processor speed: 2 GHz

Main memory access time: 100 ns

First-level cache miss rate per instruction: 7%

Second-level cache, direct-mapped, speed: 12 cycles

Global miss rate with second-level cache, direct-mapped: 3.5%

Second-level cache, eight-way set associative, speed: 28 cycles

Global miss rate with second-level cache, eight-way set associative: 1.5%

(5.7.4) Calculate the CPI for the processor in the table using: 1) only a first level cache, 2) a second level direct-mapped cache, and 3) a second level eight-way set associative cache. How do these numbers change if main memory access time is doubled? If it is cut in half?

(5.7.5) It is possible to have an even greater cache hierarchy than two levels. Given the processor above with a second level, direct-mapped cache, a designer wants to add a third level cache that takes 50 cycles to access and will reduce the global miss rate to 1.3%. Would this provide better performance? In general, what are the advantages and disadvantages of adding a third level cache?

(5.7.6) In older processors such as the Intel Pentium or Alpha 21264, the second level of cache was external (located on a different chip) from the main processor and the first-level cache. While this allowed for large second-level caches, the latency to access the cache was much higher, and the bandwidth was typically lower because the second-level cache ran at a lower frequency. Assume a 512 KiB off-chip second-level cache has a global miss rate of 4%. If each additional 512 KiB of cache lowered global miss rates by 0.7%, and the cache had a total access time of 50 cycles, how big would the cache have to be to match the performance of the second-level direct-mapped cache listed above? Of the eight-way set associative cache?

Question: This Exercise examines the single error correcting, double error detecting (SEC/DED) Hamming code.

(5.9.1) What is the minimum number of parity bits required to protect a 128-bit word using the SEC/DED code?

(5.9.2) Section 5.5 states that modern server memory modules (DIMMs) employ SEC/DED ECC to protect each 64 bits with 8 parity bits. Compute the cost/performance ratio of this code to the code from 5.9.1. In this case, cost is the relative number of parity bits needed while performance is the relative number of errors that can be corrected. Which is better?

(5.9.3) Consider a SEC code that protects 8-bit words with 4 parity bits. If we read the value 0x375, is there an error? If so, correct the error.

Question: For a high-performance system such as a B-tree index for a database, the page size is determined mainly by the data size and disk performance. Assume that on average a B-tree index page is 70% full with fixed-size entries. The utility of a page is its B-tree depth, calculated as log2(entries). The following table shows that for 16-byte entries and a 10-year-old disk with a 10 ms latency and a 10 MB/s transfer rate, the optimal page size is 16K.

Page Size (KiB) | Page Utility or B-Tree Depth (Number of Disk Accesses Saved) | Index Page Access Cost (ms) | Utility/Cost
2 | 6.49 | 10.2 | 0.64
4 | 7.49 | 10.4 | 0.72
8 | 8.49 | 10.8 | 0.79
16 | 9.49 | 11.6 | 0.82
32 | 10.49 | 13.2 | 0.79
64 | 11.49 | 16.4 | 0.70
128 | 12.49 | 22.8 | 0.55
256 | 13.49 | 35.6 | 0.38
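The B-tree page table can be regenerated from two formulas. The following Python sketch rests on two assumptions. The first is that utility = log2(number of entries per page); this matches every row, for example log2(2048/16 × 0.7) ≈ 6.49. The second is that cost = latency + page size / transfer rate, with 1 MB treated as 1000 KiB, which reproduces the cost column. `btree_page_row` is a hypothetical name. The same function can be rerun with other values of `entry_bytes`, `fill`, `latency_ms`, or `rate_mb_s`.

```python
import math


def btree_page_row(page_kib, entry_bytes=16, fill=0.70, latency_ms=10.0, rate_mb_s=10.0):
    """Return (utility, cost_ms, utility/cost) for one B-tree page size."""
    entries = page_kib * 1024 / entry_bytes * fill
    utility = math.log2(entries)  # B-tree depth = disk accesses saved
    # Latency plus transfer time: KiB / (MB/s) is read as ms (1 MB taken as 1000 KiB),
    # which is what reproduces the table's cost column.
    cost_ms = latency_ms + page_kib / rate_mb_s
    return utility, cost_ms, utility / cost_ms


sizes = [2, 4, 8, 16, 32, 64, 128, 256]
for size in sizes:
    utility, cost, ratio = btree_page_row(size)
    print(f"{size:4d} KiB  utility {utility:5.2f}  cost {cost:5.1f} ms  utility/cost {ratio:.2f}")
best = max(sizes, key=lambda s: btree_page_row(s)[2])
print("best page size:", best, "KiB")
```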

(5.10.1) What is the best page size if entries now become 128 bytes?

(5.10.2) Based on 5.10.1, what is the best page size if pages are half full?

(5.10.3) Based on 5.10.2, what is the best page size if using a modern disk with a 3 ms latency and 100 MB/s transfer rate? Explain why future servers are likely to have larger pages.

Keeping “frequently used” (or “hot”) pages in DRAM can save disk accesses, but how do we determine the exact meaning of “frequently used” for a given system? Data engineers use the cost ratio between DRAM and disk access to quantify the reuse time threshold for hot pages. The cost of a disk access is (disk cost in USD)/accesses_per_sec, while the cost to keep a page in DRAM is (DRAM cost in USD per MiB)/page_size. The typical DRAM and disk costs and typical database page sizes at several time points are listed below:

Year | DRAM Cost (USD/MiB) | Page Size (KiB) | Disk Cost (USD/disk) | Disk Access Rate (accesses/sec)
1987 | 5000 | 1 | 15,000 | 15
1997 | 15 | 8 | 2000 | 64
2007 | 0.05 | 64 | 80 | 83

(5.10.4) What are the reuse time thresholds for these three technology generations?

(5.10.5) What are the reuse time thresholds if we keep using the same 4K page size? What’s the trend here?

(5.10.6) What other factors can be changed to keep using the same page size (thus avoiding software rewrite)? Discuss their likeliness with current technology and cost trends.
