To support multiple virtual machines, two levels of memory virtualization are needed. Each virtual machine still controls the mapping of virtual address (VA) to physical address (PA), while the hypervisor maps the physical address (PA) of each virtual machine to the actual machine address (MA). To accelerate such mappings, a software approach called “shadow paging” duplicates each virtual machine’s page tables in the hypervisor, and intercepts VA to PA mapping changes to keep both copies consistent. To remove the complexity of shadow page tables, a hardware approach called nested page table (NPT) explicitly supports two classes of page tables (VA→PA and PA→MA) and can walk such tables purely in hardware.

Consider the following sequence of operations: (1) Create process; (2) TLB miss; (3) page fault; (4) context switch;

(5.14.1) What would happen for the given operation sequence for shadow page table and nested page table, respectively?

(5.14.2) Assuming an x86-based 4-level page table in both guest and nested page table, how many memory references are needed to service a TLB miss for native vs. nested page table?

(5.14.3) Among TLB miss rate, TLB miss latency, page fault rate, and page fault latency, which metrics are more important for shadow page table? Which are important for nested page table?

Assume the following parameters for a shadow paging system:

TLB misses per 1000 instructions: 0.2

NPT TLB miss latency: 200 cycles

Page faults per 1000 instructions: 0.001

Shadowing page fault overhead: 30,000 cycles

(5.14.4) For a benchmark with native execution CPI of 1, what are the CPI numbers if using shadow page tables vs. NPT (assuming only page table virtualization overhead)?

(5.14.5) What techniques can be used to reduce page table shadowing induced overhead?

(5.14.6) What techniques can be used to reduce NPT induced overhead?

Short Answer

(5.14.1)

In shadow paging, when a new process is created, the hypervisor updates the shadow page table. A TLB miss causes no change to the page tables. On a page fault, the mapping is updated, and on a context switch, the mapping is invalidated.

With a nested page table, creating a new process sets up two tables for address translation. On a TLB miss, both tables must be walked. On a page fault, both tables are updated, and on a context switch, the page tables are invalidated.

(5.14.2)

Four memory references are needed for the native page table and 24 are needed for the nested page table.

(5.14.3)

Page fault rate is an important metric for shadow paging and TLB miss rate is an important metric for the nested page table.

(5.14.4)

The CPI is 1.03 with shadow paging and 1.04 with the nested page table.

(5.14.5)

Page table shadowing induced overhead is reduced by combining the updates of multiple page tables.

(5.14.6)

NPT induced overhead can be reduced by using agile paging.

Step by step solution

01

Effect of creating process, TLB miss, page fault, and context switch on Shadow paging

(5.14.1)

A process running on a virtual machine has a guest virtual address space, a guest physical address space, and a host physical address space. Every guest virtual address is first translated to a guest physical address, and the guest physical address is then converted to a host physical address. In shadow paging, the shadow page table holds the direct mapping from the guest virtual address to the host physical address.

When a new process is created, the virtual machine creates a page table, and the hypervisor updates the shadow page table to match. A TLB miss does not change the shadow page table. On a page fault, the new mapping is added to the shadow page table and the old mapping is invalidated. On a context switch, the virtual machine notifies the hypervisor to invalidate the mappings of the process.

02

Effect of creating process, TLB miss, page fault, and context switch on Nested Page table (NPT)

In the nested page table scheme, two tables are used: one converts the virtual address to the physical address, and the other converts the physical address to the machine address. When a new process is created, the virtual machine creates a page table and the hypervisor adds new mappings from physical addresses to machine addresses. On a TLB miss, the hardware walks both page tables. On a page fault, both the virtual machine and the hypervisor have to update their page tables. On a context switch, the virtual machine notifies the hypervisor to invalidate the mapping of the process.
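
The difference between the two schemes can be illustrated with a toy model. The sketch below is only an illustration under simplifying assumptions: it tracks page numbers only (no offsets, permissions, or multi-level tables), and the table contents and function names are made up for the example.

```python
# Toy model: page numbers only; all values below are hypothetical.
guest_page_table = {0: 7, 1: 3, 2: 5}      # guest VA page -> guest PA page (owned by the VM)
nested_page_table = {3: 40, 5: 41, 7: 42}  # guest PA page -> MA page (owned by the hypervisor)


def translate_nested(va_page):
    """Two-stage lookup, as performed on a TLB miss under NPT."""
    pa_page = guest_page_table[va_page]
    return nested_page_table[pa_page]


def build_shadow_table():
    """Shadow paging: the hypervisor precomputes the combined VA -> MA mapping."""
    return {va: nested_page_table[pa] for va, pa in guest_page_table.items()}


shadow_page_table = build_shadow_table()

# Both schemes agree on every mapped page while the copies are consistent.
for va in guest_page_table:
    assert shadow_page_table[va] == translate_nested(va)

# A guest update (for example after a page fault) leaves the shadow copy stale
# until the hypervisor intercepts the change and refreshes the entry.
guest_page_table[1] = 5
stale = shadow_page_table[1] != translate_nested(1)
shadow_page_table[1] = nested_page_table[guest_page_table[1]]
```

In this model the shadow table gives a single lookup per translation but must be repaired on every guest mapping change, while the nested lookup needs no repair but touches two tables each time.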

03

Formula to calculate memory references on TLB miss for native and nested page table

(5.14.2)

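For a native (non-virtualized) page table, a TLB miss on a 4-level page table requires one memory reference per level. For the nested page table, each guest page table level also requires a walk of the nested page table, so the number of memory references is:

$$(L + 1)^2 - 1 = L^2 + 2L$$

where L is the number of levels in both the guest and the nested page table.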

04

Calculate the number of memory references on TLB miss

The number of memory references on a TLB miss for the native page table is 4.

The number of memory references on a TLB miss for the nested page table is:

$$(4 + 1)^2 - 1 = 24$$
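
The count of 24 can be checked by stepping through the two-dimensional walk. This is a minimal sketch that assumes no page-walk caches or nested TLB and does not count the final data access; the function names are illustrative.

```python
def native_walk_refs(levels):
    """One memory reference per page-table level."""
    return levels


def nested_walk_refs(guest_levels, nested_levels):
    """Count references for a guest walk where every guest-physical
    address must itself be translated through the nested page table."""
    refs = 0
    for _ in range(guest_levels):
        refs += nested_levels  # translate the guest-physical address of this guest table
        refs += 1              # read the guest page-table entry itself
    refs += nested_levels      # translate the final guest-physical address of the data page
    return refs


native = native_walk_refs(4)
nested = nested_walk_refs(4, 4)
print(native, nested)
```

With four levels in both tables, the loop contributes 4 × (4 + 1) = 20 references and the final translation adds 4 more.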

05

Metrics for shadow page table and nested page table

(5.14.3)

In shadow paging, every page fault requires the hypervisor to introduce a new mapping between virtual and physical addresses, so the page fault rate is the important metric for shadow paging. In the nested page table, every TLB miss requires walking two tables: one to convert the virtual address to the physical address and the other to convert the physical address to the machine address. So, the TLB miss rate is the important metric for the nested page table.

06

Formula to calculate CPI numbers

(5.14.4)

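The CPI numbers are calculated using the following formula, where the event is a page fault for shadow paging and a TLB miss for the nested page table:

$$\text{CPI} = \text{CPI}_{\text{native}} + \frac{\text{events per 1000 instructions}}{1000} \times \text{cycles per event}$$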

07

Calculation of CPI numbers for shadow paging and Nested Page Table

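The CPI number for shadow paging is calculated below:

$$\text{CPI}_{\text{shadow}} = 1 + \frac{0.001}{1000} \times 30{,}000 = 1.03$$

The CPI number for the nested page table is calculated below:

$$\text{CPI}_{\text{NPT}} = 1 + \frac{0.2}{1000} \times 200 = 1.04$$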
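
The same arithmetic can be reproduced in a few lines. This is a small sketch using the parameters given in the exercise; the helper name `cpi_with_overhead` is an assumption made for the example.

```python
def cpi_with_overhead(native_cpi, events_per_1000_instr, cycles_per_event):
    """Native CPI plus the average per-instruction virtualization penalty."""
    return native_cpi + events_per_1000_instr / 1000 * cycles_per_event


# Shadow paging pays on page faults; NPT pays on TLB misses.
shadow_cpi = cpi_with_overhead(1.0, 0.001, 30_000)
npt_cpi = cpi_with_overhead(1.0, 0.2, 200)

print(f"shadow paging CPI: {shadow_cpi:.2f}")
print(f"nested page table CPI: {npt_cpi:.2f}")
```

Changing the rates in the two calls shows how sensitive each scheme is to its dominant event: shadow paging to the page fault rate, NPT to the TLB miss rate.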

08

Technique to reduce the shadowing page table overhead

(5.14.5)

The most direct way to avoid shadow paging overhead is to use nested page tables, which remove the need for the hypervisor to maintain shadow copies. Within shadow paging itself, the overhead can be reduced by combining the updates of multiple page tables, so that the hypervisor intervenes less often.

09

Technique to reduce the Nested Page table induced overhead

(5.14.6)

One way to reduce the overhead of the nested page table is caching. Just as the TLB caches completed translations, nested page table entries can be cached so that fewer memory references are needed on a TLB miss.

Another way is to use agile paging, which combines the benefits of shadow paging and the nested page table. It allows TLB misses to be handled nearly as fast as native in the best case, while page table changes are made without much intervention from the hypervisor.

Most popular questions from this chapter

Chip multiprocessors (CMPs) have multiple cores and their caches on a single chip. CMP on-chip L2 cache design has interesting trade-offs. The following table shows the miss rates and hit latencies for benchmarks with private vs shared L2 cache designs. Assume L1 cache misses once every 32 instructions.

Benchmark A misses per instruction: 0.30% (private), 0.12% (shared)

Benchmark B misses per instruction: 0.06% (private), 0.03% (shared)

Assume the following hit latencies:

Private cache: 5

Shared cache: 20

Memory: 180

5.18.1 Which cache design is better for each of these benchmarks? Use data to support your conclusion.

5.18.2 Shared cache latency increases with the CMP size. Choose the best design if the shared cache latency doubles. Off-chip bandwidth becomes the bottleneck as the number of CMP cores increases. Choose the best design if off-chip memory latency doubles.

5.18.3 Discuss the pros and cons of shared vs. private L2 caches for both single-threaded, multi-threaded, and multiprogrammed workloads, and reconsider them if having on-chip L3 caches.

5.18.4 Assume both benchmarks have a base CPI of 1(ideal L2 cache). If having a non-blocking cache improves the average number of concurrent L2 misses from 1 to 2, how much performance improvement does this provide over a shared L2 cache? How much improvement can be achieved over private L2?

5.18.5 Assume new generations of processors double the number of cores every 18 months. To maintain the same level of per-core performance, how much more off-chip memory bandwidth is needed for a processor released in three years?

5.18.6 Consider the entire memory hierarchy. What kinds of optimizations can improve the number of concurrent misses?

Mean Time Between Failures (MTBF), Mean Time To Replacement (MTTR), and Mean Time To Failure (MTTF) are useful metrics for evaluating the reliability and availability of a storage resource. Explore these concepts by answering the questions about devices with the following metrics.

MTTF: 3 years

MTTR: 1 day

(5.8.1) Calculate the MTBF for each of the devices in the table.

(5.8.2) Calculate the availability for each of the devices in the table.

(5.8.3) What happens to availability as the MTTR approaches 0? Is this a realistic situation?

(5.8.4) What happens to availability as the MTTR gets very high, i.e., a device is difficult to repair? Does this imply the device has low availability?

Question: In this exercise, we will examine space/time optimizations for page tables. The following list provides parameters of a virtual memory system.

Virtual address: 43 bits

Physical DRAM installed: 16 GiB

Page size: 4 KiB

PTE size: 4 bytes

(5.12.1) For a single-level page table, how many page table entries (PTEs) are needed? How much physical memory is needed for storing the page table?

(5.12.2) Using a multilevel page table can reduce the physical memory consumption of page tables, by keeping active PTEs in physical memory. How many levels of page tables will be needed in this case? And how many memory references are needed for address translation if missing in TLB?

(5.12.3) An inverted page table can be used to further optimize space and time. How many PTEs are needed to store the page table? Assuming a hash table implementation, what are the common case and worst case numbers of memory references needed for servicing a TLB miss?

The following table shows the contents of a 4-entry TLB.

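Entry 1: valid 1, VA page 140, modified 1, protection RW, PA page 30

Entry 2: valid 0, VA page 40, modified 0, protection RX, PA page 34

Entry 3: valid 1, VA page 200, modified 1, protection RO, PA page 32

Entry 4: valid 1, VA page 280, modified 0, protection RW, PA page 31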

(5.12.4) Under what scenarios would entry 2’s valid bit be set to zero?

(5.12.5) What happens when an instruction writes to VA page 30? When would software managed TLB be faster than hardware managed TLB?

(5.12.6) What happens when an instruction writes to VA page 200?

This exercise examines the impact of different cache designs, specifically comparing associative caches to the direct-mapped caches from Section 5.4. For these exercises, refer to the address stream shown in Exercise 5.2.

(5.7.1) Using the sequence of references from Exercise 5.2, show the final cache contents for a three-way set associative cache with two-word blocks and a total size of 24 words. Use LRU replacement. For each reference identify the index bits, the tag bits, the block offset bits, and if it is a hit or a miss.

(5.7.2) Using the references from Exercise 5.2, show the final cache contents for a fully associative cache with one-word blocks and a total size of 8 words. Use LRU replacement. For each reference identify the index bits, the tag bits, and if it is a hit or a miss.

(5.7.3) Using the references from Exercise 5.2, what is the miss rate for a fully associative cache with two-word blocks and a total size of 8 words, using LRU replacement? What is the miss rate using MRU (most recently used) replacement? Finally what is the best possible miss rate for this cache, given any replacement policy?

Multilevel caching is an important technique to overcome the limited amount of space that a first level cache can provide while still maintaining its speed. Consider a processor with the following parameters:

Base CPI, no memory stalls: 1.5

Processor speed: 2 GHz

Main memory access time: 100 ns

First level cache miss rate per instruction: 7%

Second level cache, direct-mapped speed: 12 cycles

Global miss rate with second level cache, direct-mapped: 3.5%

Second level cache, eight-way set associative speed: 28 cycles

Global miss rate with second level cache, eight-way set associative: 1.5%

(5.7.4) Calculate the CPI for the processor in the table using: 1) only a first level cache, 2) a second level direct-mapped cache, and 3) a second level eight-way set associative cache. How do these numbers change if main memory access time is doubled? If it is cut in half?

(5.7.5) It is possible to have an even greater cache hierarchy than two levels. Given the processor above with a second level, direct-mapped cache, a designer wants to add a third level cache that takes 50 cycles to access and will reduce the global miss rate to 1.3%. Would this provide better performance? In general, what are the advantages and disadvantages of adding a third level cache?

(5.7.6) In older processors such as the Intel Pentium or Alpha 21264, the second level of cache was external (located on a different chip) from the main processor and the first level cache. While this allowed for large second level caches, the latency to access the cache was much higher, and the bandwidth was typically lower because the second level cache ran at a lower frequency. Assume a 512 KiB off-chip second level cache has a global miss rate of 4%. If each additional 512 KiB of cache lowered global miss rates by 0.7%, and the cache had a total access time of 50 cycles, how big would the cache have to be to match the performance of the second level direct-mapped cache listed above? Of the eight way-set associative cache?

Recall that we have two write policies and write allocation policies, and their combinations can be implemented either in L1 or L2 cache. Assume the following choices for L1 and L2 caches:

L1: write through, non-write allocate

L2: write back, write allocate

5.4.1 Buffers are employed between different levels of memory hierarchy to reduce access latency. For this given configuration, list the possible buffers needed between L1 and L2 caches, as well as L2 cache and memory.

5.4.2 Describe the procedure of handling an L1 write-miss, considering the component involved and the possibility of replacing a dirty block.

5.4.3 For a multilevel exclusive cache configuration (a block can only reside in one of the L1 and L2 caches), describe the procedure of handling an L1 write-miss, considering the component involved and the possibility of replacing a dirty block.

Consider the following program and cache behaviors.

Data reads per 100 instructions: 250

Data writes per 1000 instructions: 100

Instruction cache miss rate: 0.30%

Data cache miss rate: 2%

Block size (bytes): 64

5.4.4 For a write-through, write-allocate cache, what are the minimum read and write bandwidths (measured by byte per cycle) needed to achieve a CPI of 2?

5.4.5 For a write-back, write-allocate cache, assuming 30% of replaced data cache blocks are dirty, what are the minimal read and write bandwidths needed for a CPI of 2?

5.4.6 What are the minimal bandwidths needed to achieve the performance of CPI=1.5?
