
In this exercise, we will look at the different ways capacity affects overall performance. In general, cache access time is proportional to capacity. Assume that main memory accesses take 70 ns and that memory accesses are 36% of all instructions. The following table shows data for L1 caches attached to each of two processors, P1 and P2.

        L1 Size   L1 Miss Rate   L1 Hit Time
P1      2 KiB     8.0%           0.66 ns
P2      4 KiB     6.0%           0.90 ns

(5.6.1) Assuming that the L1 hit time determines the cycle times for P1 and P2, what are their respective clock rates?

(5.6.2) What is the Average Memory Access Time (AMAT) for P1 and P2?

(5.6.3) Assuming a base CPI of 1.0 without any memory stalls, what is the total CPI for P1 and P2? Which processor is faster?

For the next three problems, we will consider the addition of an L2 cache to P1 to presumably make up for its limited L1 cache capacity. Use the L1 cache capacities and hit times from the previous table when solving these problems. The L2 miss rate indicated is its local miss rate.

L2 Size   L2 Miss Rate   L2 Hit Time
1 MiB     95%            5.62 ns

(5.6.4) What is the AMAT for P1 with the addition of an L2 cache? Is the AMAT better or worse with the L2 cache?

(5.6.5) Assuming a base CPI of 1.0 without any memory stalls, what is the total CPI for P1 with the addition of an L2 cache?

(5.6.6) Which processor is faster, now that P1 has an L2 cache? If P1 is faster, what miss rate would P2 need in its L1 cache to match P1’s performance? If P2 is faster, what miss rate would P1 need in its L1 cache to match P2’s performance?

Short Answer


(5.6.1)

The clock rate for P1 is 1.515 GHz and P2 is 1.111 GHz.

(5.6.2)

The average memory access time for P1 is 6.26 ns and for P2 is 5.10 ns.

(5.6.3)

The total CPI is approximately 12.54 for P1 and 7.35 for P2. Because execution time per instruction is CPI times cycle time (about 8.3 ns for P1 versus 6.6 ns for P2), P2 is faster.

(5.6.4)

The average memory access time for processor P1 with both L1 and L2 caches is approximately 6.43 ns. This is worse than the 6.26 ns achieved with the L1 cache alone.

(5.6.5)

The total CPI for P1 with the L2 cache added is 8.317.

(5.6.6)

Comparing average memory access times, processor P2 is faster. To match P2, P1 would need an L1 miss rate of approximately 0.0634 (about 6.3%).

Step by step solution

01

Formula for calculating clock rate

(5.6.1)

The clock rate of a processor is the number of clock cycles it completes per second. If the L1 hit time determines the cycle time, the clock rate is the reciprocal of the hit time:

Clock rate = 1 / Cycle time = 1 / L1 hit time

02

Calculation of clock rate for processors P1 and P2

For processor P1, the hit time is 0.66 ns. Thus, the clock rate is:

Clock rate (P1) = 1 / 0.66 ns ≈ 1.515 GHz

For processor P2, the hit time is 0.90 ns. Thus, the clock rate is:

Clock rate (P2) = 1 / 0.90 ns ≈ 1.111 GHz

03

Formula to calculate average memory access time

(5.6.2)

Average memory access time (AMAT) is a metric used to measure the performance of the memory system. It is the hit time plus the fraction of accesses that miss multiplied by the miss penalty:

AMAT = L1 hit time + L1 miss rate × Miss penalty

Here the miss penalty is the 70 ns main-memory access time.


04

Calculation of average memory access time for processors P1 and P2

The average memory access time for P1 is calculated below:

AMAT (P1) = 0.66 + 0.08 × 70 = 0.66 + 5.6 = 6.26 ns

The average memory access time for P2 is calculated below:

AMAT (P2) = 0.90 + 0.06 × 70 = 0.90 + 4.2 = 5.10 ns

05

Formula to calculate CPI

(5.6.3)

CPI is the average number of clock cycles per instruction. The total CPI is the base CPI plus the memory-stall cycles per instruction:

Total CPI = Base CPI + Memory accesses per instruction × Miss rate × Miss penalty (in cycles)

With one instruction fetch plus 0.36 data accesses, there are 1.36 memory accesses per instruction, and the miss penalty is the 70 ns memory access time divided by the cycle time.

06

Calculation of CPI for processors P1 and P2

For processor P1, the total CPI is:

CPI (P1) = 1 + 1.36 × 0.08 × (70 / 0.66) = 1 + 1.36 × 0.08 × 106.06 ≈ 12.54

For processor P2, the total CPI is:

CPI (P2) = 1 + 1.36 × 0.06 × (70 / 0.90) = 1 + 1.36 × 0.06 × 77.78 ≈ 7.35

Because each P2 cycle is longer, the comparison must use CPI × cycle time: P1 needs about 12.54 × 0.66 ≈ 8.3 ns per instruction and P2 about 7.35 × 0.90 ≈ 6.6 ns, so P2 is faster.
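For readers who want to check the arithmetic, the short C sketch below recomputes the clock rates, AMATs, and CPIs from the table values. It is only an illustrative check: the variable names and the 1.36 accesses-per-instruction figure (one instruction fetch plus 0.36 data accesses) reflect our reading of the problem statement rather than code from the textbook.

#include <stdio.h>

/* Illustrative check of 5.6.1-5.6.3; names and structure are ours. */
int main(void) {
    const double mem_ns   = 70.0;              /* main-memory access time        */
    const double accesses = 1.36;              /* 1 fetch + 0.36 data per instr. */
    const double hit_ns[2] = {0.66, 0.90};     /* L1 hit time: P1, P2            */
    const double miss[2]   = {0.08, 0.06};     /* L1 miss rate: P1, P2           */

    for (int p = 0; p < 2; p++) {
        double clock_ghz = 1.0 / hit_ns[p];                                 /* 5.6.1 */
        double amat      = hit_ns[p] + miss[p] * mem_ns;                    /* 5.6.2 */
        double cpi       = 1.0 + accesses * miss[p] * (mem_ns / hit_ns[p]); /* 5.6.3 */
        printf("P%d: %.3f GHz, AMAT %.2f ns, CPI %.2f, %.1f ns/instruction\n",
               p + 1, clock_ghz, amat, cpi, cpi * hit_ns[p]);
    }
    return 0;
}

Compiled and run, it prints about 1.515 GHz, 6.26 ns, and CPI 12.54 for P1, and 1.111 GHz, 5.10 ns, and CPI 7.35 for P2, in line with the figures above.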

07

Calculation of the average access time for a processor with both L1 and L2 cache

(5.6.4)

For a processor with both an L1 and an L2 cache, the average memory access time is given as:

AMAT = L1 hit time + L1 miss rate × (L2 hit time + L2 local miss rate × Main-memory access time)

Using the values given in the question, the average memory access time for processor P1 is:

AMAT (P1 with L2) = 0.66 + 0.08 × (5.62 + 0.95 × 70) = 0.66 + 0.08 × 72.12 ≈ 6.43 ns

The average memory access time of processor P1 with both L1 and L2 caches (about 6.43 ns) is greater than with the L1 cache alone (6.26 ns), so the AMAT is worse: with a 95% local miss rate, the L2 adds its 5.62 ns hit time to almost every L1 miss while rarely avoiding the trip to main memory.

08

Calculation of CPI for processor P1 with L1 and L2 cache

(5.6.5)

To calculate the CPI for P1 with both L1 and L2 caches, the memory-stall cycles per instruction implied by the two-level hierarchy are again added to the base CPI of 1.0.

Using the AMAT calculated in the previous step, the total CPI for P1 with the L2 cache comes out to 8.317.

09

Calculation of miss rate for a slower processor to match the performance of a faster processor

(5.6.6)

The AMAT for processor P1 with both L1 and L2 caches is about 6.43 ns, while the AMAT for processor P2 is 5.10 ns, so by this comparison P2 is faster. For P1 to match P2's AMAT with its L1 cache alone, its L1 miss rate m would have to satisfy:

5.10 = 0.66 + m × 70

m = (5.10 - 0.66) / 70 ≈ 0.0634 (about a 6.3% miss rate)
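The L2 results can be checked the same way. The sketch below (again in C, with variable names that are ours) recomputes P1's AMAT with the L2 cache for 5.6.4 and the L1 miss rate P1 would need for its AMAT to match P2's for 5.6.6, following the AMAT-based comparison used above.

#include <stdio.h>

/* Illustrative check of 5.6.4 and 5.6.6; names are ours, not the text's. */
int main(void) {
    const double mem_ns  = 70.0;                 /* main-memory access time       */
    const double l1_hit  = 0.66, l1_miss = 0.08; /* P1's L1 cache                 */
    const double l2_hit  = 5.62, l2_miss = 0.95; /* L2 hit time, local miss rate  */
    const double p2_amat = 0.90 + 0.06 * mem_ns; /* P2's L1-only AMAT             */

    /* 5.6.4: AMAT for P1 once the L2 cache is added. */
    double p1_amat_l2 = l1_hit + l1_miss * (l2_hit + l2_miss * mem_ns);

    /* 5.6.6: L1 miss rate P1 needs for its L1-only AMAT to equal P2's. */
    double needed_miss = (p2_amat - l1_hit) / mem_ns;

    printf("P1 AMAT with L2: %.2f ns (P2 AMAT: %.2f ns)\n", p1_amat_l2, p2_amat);
    printf("L1 miss rate P1 needs to match P2: %.4f\n", needed_miss);
    return 0;
}

It prints an AMAT of about 6.43 ns for P1 with the L2 cache and a required L1 miss rate of about 0.063.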


Most popular questions from this chapter

Question: As described in Section 5.7, virtual memory uses a page table to track the mapping of virtual addresses to physical addresses. This exercise shows how this table must be updated as addresses are accessed. The following data constitutes a stream of virtual addresses as seen on a system. Assume 4 KiB pages, a 4-entry fully associative TLB, and true LRU replacement. If pages must be brought in from disk, increment the next largest page number.

4669, 2227, 13916, 34587, 48870, 12608, 49225

TLB

Valid   Tag   Physical Page Number
1       11    12
1       7     4
1       3     6
0       4     9

Page table

Valid   Physical Page or in Disk
1       5
0       Disk
0       Disk
1       6
1       9
1       11
0       Disk
1       4
0       Disk
0       Disk
1       3
1       12

(5.11.1) Given the address stream shown, and the initial TLB and page table states provided above, show the final state of the system. Also list for each reference if it is a hit in the TLB, a hit in the page table, or a page fault.

(5.11.2) Repeat 5.11.1, but this time use 16 KiB pages instead of 4 KiB pages. What would be some of the advantages of having a larger page size? What are some of the disadvantages?

(5.11.3) Show the final contents of the TLB if it is 2-way set associative. Also show the contents of the TLB if it is direct mapped. Discuss the importance of having a TLB to high performance. How would virtual memory accesses be handled if there were no TLB?

There are several parameters that impact the overall size of the page table. Listed below are key page parameters.

Virtual Address Size   Page Size   Page Table Entry Size
32 bits                8 KiB       4 bytes

(5.11.4) Given the parameters shown above, calculate the total page table size for a system running 5 applications that utilize half of the memory available.

(5.11.5) Given the parameters shown above, calculate the total page table size for a system running 5 applications that utilize half of the memory available, given a two level page table approach with 256 entries. Assume each entry of the main page table is 6 bytes. Calculate the minimum amount of memory required.

(5.11.6) A cache designer wants to increase the size of a 4 KiB virtually indexed, physically tagged cache. Given the page size shown above, is it possible to make a 16 KiB direct-mapped cache, assuming 2 words per block? How would the designer increase the data size of the cache?

In this exercise we look at the memory locality properties of matrix computation. The following code is written in C, where elements within the same row are stored contiguously. Assume each word is a 32-bit integer.

for (I = 0; I < 8; I++)
    for (J = 0; J < 8000; J++)
        A[I][J] = B[I][0] + A[J][I];

5.1.1 [5] How many 32-bit integers can be stored in a 16-byte cache block?

5.1.2 [5] References to which variables exhibit temporal locality?

5.1.3 [5] References to which variables exhibit spatial locality?

Locality is affected by both the reference order and data layout. The same computation can also be written below in Matlab, which differs from C by storing matrix elements within the same column contiguously in memory.

for I = 1:8
    for J = 1:8000
        A(I,J) = B(I,0) + A(J,I);
    end
end

5.1.4. [10] How many 16-byte cache blocks are needed to store all 32-bit matrix elements being referenced?

5.1.5 [5] References to which variables exhibit temporal locality?

5.1.6 [5] References to which variables exhibit spatial locality?

For a direct-mapped cache design with a 32-bit address, the following bits of the address are used to access the cache.

Tag     Index   Offset
31-10   9-5     4-0

5.3.1 What is the cache block size (in words)?

5.3.2 How many entries does the cache have?

5.3.3 What is the ratio between total bits required for such a cache implementation over the data storage bits?

Starting from power on, the following byte-addressed cache references are recorded.

Address: 0, 4, 16, 132, 232, 160, 1024, 30, 140, 3100, 180, 2180

5.3.4 How many blocks are replaced?

5.3.5 What is the hit ratio?

5.3.6 List the final state of the cache, with each valid entry represented as a record of <index, tag, data>

Question: This exercise examines the single error-correcting, double error-detecting (SEC/DED) Hamming code.

(5.9.1) What is the minimum number of parity bits required to protect a 128-bit word using the SEC/DED code?

(5.9.2) Section 5.5 states that modern server memory modules (DIMMs) employ SEC/DED ECC to protect each 64 bits with 8 parity bits. Compute the cost/performance ratio of this code to the code from 5.9.1. In this case, cost is the relative number of parity bits needed while performance is the relative number of errors that can be corrected. Which is better?

(5.9.3) Consider a SEC code that protects 8-bit words with 4 parity bits. If we read the value 0x375, is there an error? If so, correct the error.

Recall that we have two write policies and two write-allocation policies, and their combinations can be implemented in either the L1 or the L2 cache. Assume the following choices for the L1 and L2 caches:

L1                                    L2
Write through, non-write allocate     Write back, write allocate

5.4.1 Buffers are employed between different levels of memory hierarchy to reduce access latency. For this given configuration, list the possible buffers needed between L1 and L2 caches, as well as L2 cache and memory.

5.4.2 Describe the procedure of handling an L1 write-miss, considering the component involved and the possibility of replacing a dirty block.

5.4.3 For a multilevel exclusive cache configuration (a block can only reside in one of the L1 and L2 caches), describe the procedure of handling an L1 write-miss, considering the components involved and the possibility of replacing a dirty block.

Consider the following program and cache behaviors.

Data reads per 1000 instructions: 250
Data writes per 1000 instructions: 100
Instruction cache miss rate: 0.30%
Data cache miss rate: 2%
Block size: 64 bytes

5.4.4 For a write-through, write-allocate cache, what are the minimum read and write bandwidths (measured by byte per cycle) needed to achieve a CPI of 2?

5.4.5 For a write-back, write-allocate cache, assuming 30% of replaced data cache blocks are dirty, what are the minimal read and write bandwidths needed for a CPI of 2?

5.4.6 What are the minimal bandwidths needed to achieve the performance of CPI=1.5?
