
Recall that we have two write policies and write allocation policies, and their combinations can be implemented either in L1 or L2 cache. Assume the following choices for L1 and L2 caches:

L1: Write through, non-write allocate
L2: Write back, write allocate

5.4.1 Buffers are employed between different levels of the memory hierarchy to reduce access latency. For the given configuration, list the possible buffers needed between the L1 and L2 caches, as well as between the L2 cache and memory.

5.4.2 Describe the procedure of handling an L1 write miss, considering the components involved and the possibility of replacing a dirty block.

5.4.3 For a multilevel exclusive cache configuration (a block can reside in only one of the L1 and L2 caches), describe the procedure of handling an L1 write miss, considering the components involved and the possibility of replacing a dirty block.

Consider the following program and cache behaviors.

Data Reads per 1000 Instructions: 250
Data Writes per 1000 Instructions: 100
Instruction Cache Miss Rate: 0.30%
Data Cache Miss Rate: 2%
Block Size (bytes): 64

5.4.4 For a write-through, write-allocate cache, what are the minimum read and write bandwidths (measured by byte per cycle) needed to achieve a CPI of 2?

5.4.5 For a write-back, write-allocate cache, assuming 30% of replaced data cache blocks are dirty, what are the minimal read and write bandwidths needed for a CPI of 2?

5.4.6 What are the minimal bandwidths needed to achieve the performance of CPI=1.5?

Short Answer


5.4.1

A write buffer is needed between the L1 and L2 caches.

A write buffer is also needed between the L2 cache and memory.

5.4.2

Because L1 is write-through and non-write-allocate, an L1 write miss is forwarded to L2. If it also misses in L2, the block must be brought into the L2 cache (write-allocate); if the block it replaces is dirty, that block is first written back to memory.

5.4.3

After an L1 write miss, the block resides in L2 but not in L1. On a subsequent read miss to the same block, the block is transferred to L1 and invalidated in L2 to preserve exclusivity; if the L2 copy is dirty, it must be written back to memory at that point.

5.4.4

The total read bandwidth requirement ≈ 0.33 bytes/cycle

The data write bandwidth requirement = 0.2 bytes/cycle

5.4.5

The data read bandwidth ≈ 0.23 bytes/cycle

The data write bandwidth ≈ 0.067 bytes/cycle

5.4.6

For the write-through cache

The total read bandwidth ≈ 0.43 byte/cycle and the data write bandwidth ≈ 0.27 byte/cycle

For the write-back cache

The data write bandwidth ≈ 0.091 byte/cycle

Step by step solution

01

Determine the formulae

The formula for determining the IPC

IPC = 1 / CPI ...... (1)

Fraction of cycles requiring a data read = data reads per instruction × IPC ...... (2)

Fraction of cycles requiring a data write = data writes per instruction × IPC ...... (3)
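As a quick sanity check, the three formulas can be written as small Python helpers (a hedged sketch; the function names are illustrative, not from the text):

```python
def ipc(cpi):
    # Equation (1): instructions per cycle is the reciprocal of CPI.
    return 1.0 / cpi

def read_fraction(reads_per_instruction, cpi):
    # Equation (2): fraction of cycles that perform a data read.
    return reads_per_instruction * ipc(cpi)

def write_fraction(writes_per_instruction, cpi):
    # Equation (3): fraction of cycles that perform a data write.
    return writes_per_instruction * ipc(cpi)
```

With the table's rates (250 reads and 100 writes per 1000 instructions) and CPI = 2, these give 12.5% and 5%, the values used in 5.4.4.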

02

Describe write policy and write allocation policy

The write policy determines what the cache does when the CPU issues a write request.

There are two cache write policies:

a. write-through policy

b. write-back policy

Write allocation policy

A write-allocate cache allocates a new line in the cache for a write that misses, so the write completes in the cache. It is most commonly paired with a write-back policy, while non-write-allocate is usually paired with write-through.

03

List the possible buffers needed between L1 and L2 caches and between L2 cache and memory.

5.4.1

The write miss penalty in the L1 cache is low, whereas the write miss penalty in the L2 cache is high. A write buffer between the L1 and L2 caches can hide the L2 cache's write latency.

A write buffer between the L2 cache and memory is beneficial when replacing a dirty block, because the new block can be read from memory before the dirty block is written back.

04

Describe the procedure of handling an L1 write-miss and the possibility of replacing a dirty block.

5.4.2

Because the L1 cache is write-through and non-write-allocate, there is no dirty block to check in L1; an L1 write miss is simply forwarded to the L2 cache.

If the access hits in the L2 cache, the block is updated in L2 and its dirty bit is set.

If the access misses in the L2 cache, a block must be allocated in L2 (write-allocate). If the evicted block is dirty, it is first written back to main memory; then the requested block is fetched from memory into L2 and the write is performed there.
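The procedure above can be sketched as a small Python function (illustrative only; the function name and action strings are assumptions, not part of the original solution):

```python
def handle_l1_write_miss(l2_hit, victim_dirty):
    """Sketch of the L1 write-miss flow for a write-through,
    non-write-allocate L1 over a write-back, write-allocate L2.
    Returns the ordered list of actions taken."""
    actions = ["forward write to L2"]  # L1 does not allocate on a write miss
    if l2_hit:
        actions += ["update block in L2", "set L2 dirty bit"]
    else:
        if victim_dirty:
            # Dirty victim must reach memory before it is overwritten.
            actions.append("write back dirty victim to memory")
        actions += ["fetch block from memory into L2",
                    "update block in L2", "set L2 dirty bit"]
    return actions
```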

05

Describe the procedure of handling an L1 write-miss for a multilevel exclusive cache.

5.4.3

After an L1 write miss, the block resides in L2 but not in L1. On a subsequent read miss to the same block, the block is transferred to L1 and invalidated in L2 to preserve exclusivity; if the L2 copy is dirty, it must be written back to memory at that point.
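The exclusive-hierarchy procedure can be sketched similarly (an illustrative Python sketch; the function and action strings are assumptions, not from the original):

```python
def exclusive_l1_write_miss(subsequent_read_miss=False, l2_block_dirty=False):
    # Exclusive hierarchy: the L1 is write-through/non-write-allocate,
    # so the write goes to L2 and the block ends up in L2 only.
    actions = ["perform write in L2; block resides in L2 only"]
    if subsequent_read_miss:
        # A later read miss on the same block moves it from L2 to L1.
        actions.append("read miss in L1 finds block in L2")
        if l2_block_dirty:
            # Exclusivity invalidates the L2 copy, so a dirty block
            # must be written back to memory at this point.
            actions.append("write back dirty L2 block to memory")
        actions += ["transfer block to L1", "invalidate block in L2"]
    return actions
```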

06

Determine the minimum read and write bandwidths needed to achieve a CPI of 2

5.4.4

For a write-allocate cache, a write miss first reads the block from memory into the cache and then performs the write in the cache. For write-through, every write also sends one word to memory. The read bandwidth must include instruction fetches. Bandwidth here refers to memory bandwidth.

Given CPI = 2

When CPI = 2,

IPC (instructions per cycle) = 1/2 = 0.5

Fraction of cycles requiring a data read = (250/1000) × 0.5 = 12.5%

Fraction of cycles requiring a data write = (100/1000) × 0.5 = 5%

Thus, the instruction read bandwidth = 0.0030 × 64 × 0.5 = 0.096 bytes/cycle

The data read bandwidth = 0.02 × (0.13 + 0.05) × 64 ≈ 0.23 bytes/cycle

The total read bandwidth requirement ≈ 0.096 + 0.23 ≈ 0.33 bytes/cycle

The data write bandwidth requirement = 0.05 × 4 = 0.2 bytes/cycle (one 4-byte word per write)
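For reference, the 5.4.4 arithmetic can be reproduced in a short Python sketch (variable names are illustrative; with unrounded intermediates the figures differ slightly from the text, which rounds 0.125 to 0.13 midway):

```python
# Rates from the table: 250 reads and 100 writes per 1000 instructions,
# 0.30% I-cache and 2% D-cache miss rates, 64-byte blocks, 4-byte words.
cpi = 2.0
ipc = 1.0 / cpi                            # 0.5 instructions/cycle
reads_per_cycle = 0.250 * ipc              # 0.125 data reads/cycle
writes_per_cycle = 0.100 * ipc             # 0.05 data writes/cycle

instr_bw = 0.0030 * 64 * ipc                                      # ~0.096 bytes/cycle
data_read_bw = 0.02 * (reads_per_cycle + writes_per_cycle) * 64   # ~0.22 bytes/cycle
total_read_bw = instr_bw + data_read_bw                           # ~0.32 bytes/cycle
write_bw = writes_per_cycle * 4            # 0.2 bytes/cycle (one word per write)
```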

07

Determine the minimal read and write bandwidths needed for a CPI of 2

5.4.5

The instruction bandwidth and the data read bandwidth are the same as in 5.4.4:

The instruction bandwidth = 0.0030 × 64 × 0.5 = 0.096 bytes/cycle

The data read bandwidth = 0.02 × (0.13 + 0.05) × 64 ≈ 0.23 bytes/cycle

Now, the data write bandwidth = 0.02 × 0.30 × (0.13 + 0.05) × 64 ≈ 0.067 bytes/cycle, since only the 30% of replaced data blocks that are dirty are written back.
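The write-back write-bandwidth arithmetic can likewise be checked with a short Python sketch (illustrative variable names; unrounded intermediates):

```python
# Write-back, write-allocate: only dirty replaced blocks are written back.
cpi = 2.0
ipc = 1.0 / cpi                                # 0.5 instructions/cycle
accesses_per_cycle = (0.250 + 0.100) * ipc     # 0.175 data accesses/cycle
write_bw = 0.02 * 0.30 * accesses_per_cycle * 64   # ~0.067 bytes/cycle
```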

08

Determine the minimal bandwidths needed to achieve the performance of CPI=1.5

5.4.6

Given CPI = 1.5

Instruction throughput (IPC) = 1/1.5 ≈ 0.67 instructions per cycle

Data read frequency = 0.25/1.5 ≈ 0.17 reads per cycle

Data write frequency = 0.10/1.5 ≈ 0.067 writes per cycle

The instruction bandwidth = 0.0030 × 64 × 0.67 ≈ 0.13 bytes/cycle

For the write-through cache

The data read bandwidth = 0.02 × (0.17 + 0.067) × 64 ≈ 0.30 bytes/cycle

The total read bandwidth ≈ 0.13 + 0.30 ≈ 0.43 byte/cycle

The data write bandwidth = 0.067 × 4 ≈ 0.27 bytes/cycle

For the write-back cache

The read bandwidths are unchanged; the data write bandwidth = 0.02 × 0.30 × (0.17 + 0.067) × 64 ≈ 0.091 byte/cycle
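These figures can be verified with a short Python sketch (illustrative names; using unrounded intermediates, so the last digit can differ from rounded hand calculations):

```python
# Same rates as before, now at CPI = 1.5.
ipc = 1.0 / 1.5                                  # ~0.67 instructions/cycle
reads = 0.250 * ipc                              # ~0.17 data reads/cycle
writes = 0.100 * ipc                             # ~0.067 data writes/cycle

instr_bw = 0.0030 * 64 * ipc                     # ~0.13 bytes/cycle
data_read_bw = 0.02 * (reads + writes) * 64      # ~0.30 bytes/cycle
total_read_bw = instr_bw + data_read_bw          # ~0.43 bytes/cycle
wt_write_bw = writes * 4                         # ~0.27 bytes/cycle (write-through)
wb_write_bw = 0.02 * 0.30 * (reads + writes) * 64  # ~0.09 bytes/cycle (write-back)
```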

