Chapter 5: Q5E (page 486)

Media applications that play audio or video files are part of a class of workloads called “streaming” workloads; i.e., they bring in large amounts of data but do not reuse much of it. Consider a video streaming workload that accesses a 512 KiB working set sequentially with the following address stream:
0, 2, 4, 6, 8, 10, 12, 14, 16, …
5.5.1 Assume a 64 KiB direct-mapped cache with a 32-byte block. What is the miss rate for the address stream above? How is this miss rate sensitive to the size of the cache or the working set? How would you categorize the misses this workload is experiencing, based on the 3C model?
5.5.2 Re-compute the miss rate when the cache block size is 16 bytes, 64 bytes, and 128 bytes. What kind of locality is this workload exploiting?
5.5.3 “Prefetching” is a technique that leverages predictable address patterns to speculatively bring in additional cache blocks when a particular cache block is accessed. One example of prefetching is a stream buffer that prefetches sequentially adjacent cache blocks into a separate buffer when a particular cache block is brought in. If the data is found in the prefetch buffer, it is considered as a hit and moved into the cache and the next cache block is prefetched. Assume a two-entry stream buffer and assume that the cache latency is such that a cache block can be loaded before the computation on the previous cache block is completed. What is the miss rate for the address stream above?
Cache block size (B) can affect both miss rate and miss latency. Assuming a 1-CPI machine with an average of 1.35 references (both instruction and data) per instruction, help find the optimal block size given the following miss rates for various block sizes.
$B = 8$: 4%
$B = 16$: 3%
$B = 32$: 2%
$B = 64$: 1.5%
$B = 128$: 1%
5.5.4 What is the optimal block size for a miss latency of 20×B cycles?
5.5.5 What is the optimal block size for a miss latency of 24+B cycles?
5.5.6 For constant miss latency, what is the optimal block size?

Short Answer

5.5.1

Miss rate $= \frac{1}{16}$

5.5.2

If the cache block size is 16 bytes, the miss rate for the given address stream is $\frac{1}{8}$.

If the cache block size is 64 bytes, the miss rate for the given address stream is $\frac{1}{32}$.

If the cache block size is 128 bytes, the miss rate for the given address stream is $\frac{1}{64}$.

The workload is exploiting spatial locality.

5.5.3

The miss rate is $0.00038\% \approx 0\%$

5.5.4

$B = 8$ is the optimal block size

5.5.5

$B = 32$ is the optimal block size

5.5.6

B = 128 is optimal

Step by step solution

Determine the formulae

Write the formula for the number of blocks in the working set

$\text{Number of blocks} = \frac{\text{working set size}}{\text{cache block size}}$ …….(1)

Write the formula for the number of accesses that fall in each block

$\text{Accesses per block} = \frac{\text{cache block size}}{\text{address stride}}$ ……..(2)

Write the formula for calculating the miss rate

$\text{Miss rate} = \frac{\text{number of misses}}{\text{total accesses}}$ ……..(3)

Determine the miss rate for the given address stream

5.5.1

Working set accessed by the video streaming workload = 512 KiB

Given address stream = $0, 2, 4, 6, 8, 10, 12, 14, 16, \ldots$

Stride between consecutive addresses = 2 bytes

Size of direct-mapped cache = 64 KiB

Cache block size = 32 bytes

First, we calculate the number of blocks in the working set. Each block is brought in exactly once, so this is also the number of misses.

$\begin{matrix} \text{Number of blocks} = \frac{\text{working set size}}{\text{cache block size}} \\ = \frac{512 \text{ KiB}}{32 \text{ bytes}} \\ = 16 \text{ K} \end{matrix}$

Now we calculate the number of accesses that fall in each block

$\begin{matrix} \text{Accesses per block} = \frac{\text{cache block size}}{\text{stride}} \\ = \frac{32 \text{ bytes}}{2 \text{ bytes}} \\ = 16 \end{matrix}$

The first access to a block misses and the remaining 15 accesses hit in that block.

Therefore, a miss occurs on every $16^{th}$ access of the address stream

So, the miss rate for the given address stream is $\frac{1}{16}$

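The miss rate is not sensitive to the size of the cache or of the working set, because no block is ever reused: each block is brought in once, accessed 16 times, and never touched again. The miss rate does depend on the block size and on the access pattern (the stride).

Categorize the misses

A miss occurs on every $16^{th}$ access of the address stream, so in every 16 accesses of the stream there are 15 hits and 1 miss.

Therefore, the number of misses is predictable for a given workload, and every miss is the first reference to a block that has never been in the cache.

Based on the 3C model, the misses this workload is experiencing are cold-start (compulsory) misses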
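The insensitivity to cache size can be checked with a small simulation. The following is a minimal sketch, assuming byte addresses and a plain direct-mapped cache with no prefetching; the function name `miss_rate` and the extra cache sizes (16 KiB and 256 KiB) are illustrative choices, not part of the exercise.

```python
def miss_rate(cache_bytes, block_bytes, working_set_bytes, stride=2):
    """Simulate a direct-mapped cache on the byte-address stream 0, stride, 2*stride, ..."""
    num_lines = cache_bytes // block_bytes
    tags = [None] * num_lines            # one stored tag per cache line
    misses = accesses = 0
    for addr in range(0, working_set_bytes, stride):
        block = addr // block_bytes      # block address
        index = block % num_lines        # direct-mapped line index
        tag = block // num_lines         # remaining upper bits
        if tags[index] != tag:           # miss: fetch the block into this line
            tags[index] = tag
            misses += 1
        accesses += 1
    return misses / accesses

KIB = 1024
# Hypothetical cache sizes around the 64 KiB given in the exercise.
for cache_kib in (16, 64, 256):
    print(cache_kib, "KiB cache ->", miss_rate(cache_kib * KIB, 32, 512 * KIB))
```

Under these assumptions every cache size reports the same miss rate, which is consistent with all misses being compulsory.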

Re-compute the miss rate if the cache block size is 16 bytes

Given data

Working set accessed by the video streaming workload = 512 KiB

Given address stream = $0, 2, 4, 6, 8, 10, 12, 14, 16, \ldots$

Stride between consecutive addresses = 2 bytes

Size of direct-mapped cache = 64 KiB

Case 1:

If cache block size = 16 bytes

$\begin{matrix} \text{Number of blocks} = \frac{\text{working set size}}{\text{cache block size}} \\ = \frac{512 \text{ KiB}}{16 \text{ bytes}} \\ = 32 \text{ K} \end{matrix}$

$\begin{matrix} \text{Accesses per block} = \frac{\text{cache block size}}{\text{stride}} \\ = \frac{16 \text{ bytes}}{2 \text{ bytes}} \\ = 8 \end{matrix}$

The first access to each block misses and the remaining 7 accesses hit.

Therefore, a miss occurs on every $8^{th}$ access of the address stream

So, the miss rate for the given address stream is $\frac{1}{8}$

Re-compute the miss rate if the cache block size is 64 bytes

If cache block size = 64 bytes

$\begin{matrix} \text{Number of blocks} = \frac{\text{working set size}}{\text{cache block size}} \\ = \frac{512 \text{ KiB}}{64 \text{ bytes}} \\ = 8 \text{ K} \end{matrix}$

$\begin{matrix} \text{Accesses per block} = \frac{\text{cache block size}}{\text{stride}} \\ = \frac{64 \text{ bytes}}{2 \text{ bytes}} \\ = 32 \end{matrix}$

The first access to each block misses and the remaining 31 accesses hit.

Therefore, a miss occurs on every $32^{nd}$ access of the address stream

So, the miss rate for the given address stream is $\frac{1}{32}$

Re-compute the miss rate if the cache block size is 128 bytes

If cache block size = 128 bytes

$\begin{matrix} \text{Number of blocks} = \frac{\text{working set size}}{\text{cache block size}} \\ = \frac{512 \text{ KiB}}{128 \text{ bytes}} \\ = 4 \text{ K} \end{matrix}$

$\begin{matrix} \text{Accesses per block} = \frac{\text{cache block size}}{\text{stride}} \\ = \frac{128 \text{ bytes}}{2 \text{ bytes}} \\ = 64 \end{matrix}$

The first access to each block misses and the remaining 63 accesses hit.

Therefore, a miss occurs on every $64^{th}$ access of the address stream

So, the miss rate for the given address stream is $\frac{1}{64}$

Kind of locality this workload is exploiting

Spatial locality means that once a location has been referenced, nearby locations are likely to be referenced soon. Here every access is followed by an access to the adjacent location, and no location is referenced twice, so there is no temporal locality to exploit.

So, the workload is exploiting spatial locality
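The block-size results follow one pattern: one miss per block, followed by hits on the rest of that block. A short sketch of that closed form, assuming byte addresses and that no block is revisited (the helper name `streaming_miss_rate` is made up for illustration):

```python
from fractions import Fraction

STRIDE_BYTES = 2  # spacing of the address stream 0, 2, 4, ...

def streaming_miss_rate(block_bytes, stride_bytes=STRIDE_BYTES):
    # One compulsory miss per block, then (block_bytes / stride_bytes - 1) hits.
    return Fraction(stride_bytes, block_bytes)

for block in (16, 32, 64, 128):
    print(f"{block:3d}-byte block: miss rate = {streaming_miss_rate(block)}")
```

Doubling the block size halves the miss rate, which is what exploiting spatial locality looks like for a purely sequential stream.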

Using prefetching to reduce the miss rate to zero

5.5.3

Given data

Working set accessed by the video streaming workload = 512 KiB

Given address stream = $0, 2, 4, 6, 8, 10, 12, 14, 16, \ldots$

Stride between consecutive addresses = 2 bytes

Size of direct-mapped cache = 64 KiB

With the two-entry stream buffer and the stated cache latency, each cache block can be loaded before the computation on the previous cache block is completed.

Prefetching predicts future accesses: while the current block is being processed, the next sequential blocks are loaded into the stream buffer. Every block after the first is therefore found in the buffer and counts as a hit, so only the very first access misses.

Number of misses = 1

Total accesses $= \frac{512 \text{ KiB}}{2 \text{ bytes}}$

$\begin{matrix} = \frac{512 \times 1024}{2} \\ = 262144 \end{matrix}$

Now, calculate the miss rate

$\begin{matrix} \text{Miss rate} = \frac{\text{number of misses}}{\text{total accesses}} \\ = \frac{1}{262144} \\ \approx 0.00000381469 \end{matrix}$

Therefore, the miss rate is $0.00038\% \approx 0\%$
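The single-miss count can be reproduced with a simplified model of the stream buffer. This is a sketch under stated assumptions, not a hardware-accurate design: prefetched blocks always arrive in time (as the exercise states), cache capacity is ignored because no block is reused, and the function name `stream_buffer_misses` is hypothetical.

```python
from collections import deque

def stream_buffer_misses(working_set_bytes, block_bytes=32, stride=2, entries=2):
    cached = set()                    # blocks already moved into the cache
    buffer = deque(maxlen=entries)    # block addresses held in the stream buffer
    misses = accesses = 0
    for addr in range(0, working_set_bytes, stride):
        block = addr // block_bytes
        accesses += 1
        if block in cached:
            continue                  # ordinary cache hit
        if block in buffer:
            # Found in the stream buffer: counted as a hit, moved to the cache,
            # and the next sequential block is prefetched.
            buffer.remove(block)
            cached.add(block)
            buffer.append(max(buffer, default=block) + 1)
        else:
            # True miss: fetch the block and restart the stream after it.
            misses += 1
            cached.add(block)
            buffer.clear()
            buffer.extend(block + i for i in range(1, entries + 1))
    return misses, accesses

misses, accesses = stream_buffer_misses(512 * 1024)
print(misses, accesses, misses / accesses)
```

In this model only the access to address 0 misses; every later block is already waiting in the buffer when it is first touched.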

Determine the optimal block size for a miss latency of 20×B cycles

5.5.4

To determine the optimal block size, we compare the miss contribution to AMAT (average memory access time), which is miss rate × miss latency, for each B. The hit time is the same for every block size, so it does not affect the comparison.

AMAT for $B = 8$: $0.040 \times (20 \times 8) = 6.40$

AMAT for $B = 16$: $0.030 \times (20 \times 16) = 9.60$

AMAT for $B = 32$: $0.020 \times (20 \times 32) = 12.80$

AMAT for $B = 64$: $0.015 \times (20 \times 64) = 19.20$

AMAT for $B = 128$: $0.010 \times (20 \times 128) = 25.60$

$B = 8$ gives the smallest value, so it is the optimal block size for a miss latency of 20×B cycles
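The comparison for 20×B can be scripted directly from the miss-rate table in the question. This is a minimal sketch; the names `MISS_RATE` and `stall_per_reference` are illustrative, and the hit time is left out because it is identical for every block size.

```python
# Miss rates from the exercise, keyed by block size in bytes.
MISS_RATE = {8: 0.04, 16: 0.03, 32: 0.02, 64: 0.015, 128: 0.01}

def stall_per_reference(latency):
    """Miss rate x miss latency for each block size; latency is a function of B."""
    return {b: m * latency(b) for b, m in MISS_RATE.items()}

stalls = stall_per_reference(lambda b: 20 * b)
best = min(stalls, key=stalls.get)

for b, s in stalls.items():
    print(f"B = {b:3d}: {s:.2f} stall cycles per reference")
print("optimal block size:", best)
```

Because the latency grows in proportion to B while the miss rate falls more slowly, the smallest block wins under this latency model.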

Determine the optimal block size for a miss latency of 24+B cycles

5.5.5

Again, we calculate the miss contribution to AMAT (miss rate × miss latency) to determine the optimal block size for a miss latency of 24+B cycles

AMAT for $B = 8$: $0.040 \times (24 + 8) = 1.28$

AMAT for $B = 16$: $0.030 \times (24 + 16) = 1.20$

AMAT for $B = 32$: $0.020 \times (24 + 32) = 1.12$

AMAT for $B = 64$: $0.015 \times (24 + 64) = 1.32$

AMAT for $B = 128$: $0.010 \times (24 + 128) = 1.52$

$B = 32$ gives the smallest value, so it is the optimal block size for a miss latency of 24+B cycles.
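The same ranking can be expressed as an effective CPI, using the 1-CPI base and the 1.35 references per instruction given in the question. The sketch below assumes every miss stalls the processor for the full miss latency with no overlap; the constant and function names are illustrative.

```python
BASE_CPI = 1.0          # from the exercise: a 1-CPI machine
REFS_PER_INSTR = 1.35   # from the exercise: instruction + data references
MISS_RATE = {8: 0.04, 16: 0.03, 32: 0.02, 64: 0.015, 128: 0.01}

def effective_cpi(block_bytes):
    miss_latency = 24 + block_bytes   # latency model for 5.5.5, in cycles
    return BASE_CPI + REFS_PER_INSTR * MISS_RATE[block_bytes] * miss_latency

cpi = {b: effective_cpi(b) for b in MISS_RATE}
best = min(cpi, key=cpi.get)

for b, c in cpi.items():
    print(f"B = {b:3d}: CPI = {c:.3f}")
print("optimal block size:", best)
```

Scaling by 1.35 and adding the base CPI does not change the ordering, so the block size that minimizes miss rate × miss latency also minimizes CPI.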

Determine the optimal block size for constant miss latency

5.5.6

B = 128 is optimal

When the latency of each miss is constant, minimizing the miss rate minimizes the total miss latency, and B = 128 has the lowest miss rate (1%).
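The constant-latency case can be written out in one line. Here $L$ is an assumed symbol for the fixed miss latency in cycles and $m(B)$ is the miss rate for block size $B$ from the table in the question:

$\begin{matrix} \text{Stall per reference} = m(B) \times L \\ 0.040L > 0.030L > 0.020L > 0.015L > 0.010L \end{matrix}$

Since $L$ is a common factor, the ordering depends only on $m(B)$, and the minimum is at $B = 128$.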
