# HG changeset patch # User Robb Romans <3r@xxxxxxxxxx> # Node ID 8eddf18dd1a469ed62d6c310f9dec496db33a36a # Parent 9983602e0ca451159f007149470d6614d65e3537 Separate file for interface/memory. Signed-off-by: Robb Romans <3r@xxxxxxxxxx> diff -r 9983602e0ca4 -r 8eddf18dd1a4 docs/src/interface.tex --- a/docs/src/interface.tex Thu Sep 15 20:25:12 2005 +++ b/docs/src/interface.tex Thu Sep 15 20:56:53 2005 @@ -90,168 +90,8 @@ %% chapter Virtual Architecture moved to architecture.tex \include{src/interface/architecture} - -\chapter{Memory} -\label{c:memory} - -Xen is responsible for managing the allocation of physical memory to -domains, and for ensuring safe use of the paging and segmentation -hardware. - - -\section{Memory Allocation} - - -Xen resides within a small fixed portion of physical memory; it also -reserves the top 64MB of every virtual address space. The remaining -physical memory is available for allocation to domains at a page -granularity. Xen tracks the ownership and use of each page, which -allows it to enforce secure partitioning between domains. - -Each domain has a maximum and current physical memory allocation. -A guest OS may run a `balloon driver' to dynamically adjust its -current memory allocation up to its limit. - - -%% XXX SMH: I use machine and physical in the next section (which -%% is kinda required for consistency with code); wonder if this -%% section should use same terms? -%% -%% Probably. -%% -%% Merging this and below section at some point prob makes sense. - -\section{Pseudo-Physical Memory} - -Since physical memory is allocated and freed on a page granularity, -there is no guarantee that a domain will receive a contiguous stretch -of physical memory. However most operating systems do not have good -support for operating in a fragmented physical address space. To aid -porting such operating systems to run on top of Xen, we make a -distinction between \emph{machine memory} and \emph{pseudo-physical -memory}. 
- -Put simply, machine memory refers to the entire amount of memory -installed in the machine, including that reserved by Xen, in use by -various domains, or currently unallocated. We consider machine memory -to comprise a set of 4K \emph{machine page frames} numbered -consecutively starting from 0. Machine frame numbers mean the same -within Xen or any domain. - -Pseudo-physical memory, on the other hand, is a per-domain -abstraction. It allows a guest operating system to consider its memory -allocation to consist of a contiguous range of physical page frames -starting at physical frame 0, despite the fact that the underlying -machine page frames may be sparsely allocated and in any order. - -To achieve this, Xen maintains a globally readable {\it -machine-to-physical} table which records the mapping from machine page -frames to pseudo-physical ones. In addition, each domain is supplied -with a {\it physical-to-machine} table which performs the inverse -mapping. Clearly the machine-to-physical table has size proportional -to the amount of RAM installed in the machine, while each -physical-to-machine table has size proportional to the memory -allocation of the given domain. - -Architecture dependent code in guest operating systems can then use -the two tables to provide the abstraction of pseudo-physical -memory. In general, only certain specialized parts of the operating -system (such as page table management) needs to understand the -difference between machine and pseudo-physical addresses. - -\section{Page Table Updates} - -In the default mode of operation, Xen enforces read-only access to -page tables and requires guest operating systems to explicitly request -any modifications. Xen validates all such requests and only applies -updates that it deems safe. This is necessary to prevent domains from -adding arbitrary mappings to their page tables. - -To aid validation, Xen associates a type and reference count with each -memory page. 
A page has one of the following -mutually-exclusive types at any point in time: page directory ({\sf -PD}), page table ({\sf PT}), local descriptor table ({\sf LDT}), -global descriptor table ({\sf GDT}), or writable ({\sf RW}). Note that -a guest OS may always create readable mappings of its own memory -regardless of its current type. -%%% XXX: possibly explain more about ref count 'lifecyle' here? -This mechanism is used to -maintain the invariants required for safety; for example, a domain -cannot have a writable mapping to any part of a page table as this -would require the page concerned to simultaneously be of types {\sf - PT} and {\sf RW}. - - -%\section{Writable Page Tables} - -Xen also provides an alternative mode of operation in which guests be -have the illusion that their page tables are directly writable. Of -course this is not really the case, since Xen must still validate -modifications to ensure secure partitioning. To this end, Xen traps -any write attempt to a memory page of type {\sf PT} (i.e., that is -currently part of a page table). If such an access occurs, Xen -temporarily allows write access to that page while at the same time -{\em disconnecting} it from the page table that is currently in -use. This allows the guest to safely make updates to the page because -the newly-updated entries cannot be used by the MMU until Xen -revalidates and reconnects the page. -Reconnection occurs automatically in a number of situations: for -example, when the guest modifies a different page-table page, when the -domain is preempted, or whenever the guest uses Xen's explicit -page-table update interfaces. - -Finally, Xen also supports a form of \emph{shadow page tables} in -which the guest OS uses a independent copy of page tables which are -unknown to the hardware (i.e.\ which are never pointed to by {\tt -cr3}). Instead Xen propagates changes made to the guest's tables to the -real ones, and vice versa. 
This is useful for logging page writes -(e.g.\ for live migration or checkpoint). A full version of the shadow -page tables also allows guest OS porting with less effort. - -\section{Segment Descriptor Tables} - -On boot a guest is supplied with a default GDT, which does not reside -within its own memory allocation. If the guest wishes to use other -than the default `flat' ring-1 and ring-3 segments that this GDT -provides, it must register a custom GDT and/or LDT with Xen, -allocated from its own memory. Note that a number of GDT -entries are reserved by Xen -- any custom GDT must also include -sufficient space for these entries. - -For example, the following hypercall is used to specify a new GDT: - -\begin{quote} -int {\bf set\_gdt}(unsigned long *{\em frame\_list}, int {\em entries}) - -{\em frame\_list}: An array of up to 16 machine page frames within -which the GDT resides. Any frame registered as a GDT frame may only -be mapped read-only within the guest's address space (e.g., no -writable mappings, no use as a page-table page, and so on). - -{\em entries}: The number of descriptor-entry slots in the GDT. Note -that the table must be large enough to contain Xen's reserved entries; -thus we must have `{\em entries $>$ LAST\_RESERVED\_GDT\_ENTRY}\ '. -Note also that, after registering the GDT, slots {\em FIRST\_} through -{\em LAST\_RESERVED\_GDT\_ENTRY} are no longer usable by the guest and -may be overwritten by Xen. -\end{quote} - -The LDT is updated via the generic MMU update mechanism (i.e., via -the {\tt mmu\_update()} hypercall. - -\section{Start of Day} - -The start-of-day environment for guest operating systems is rather -different to that provided by the underlying hardware. In particular, -the processor is already executing in protected mode with paging -enabled. - -{\it Domain 0} is created and booted by Xen itself. 
For all subsequent -domains, the analogue of the boot-loader is the {\it domain builder}, -user-space software running in {\it domain 0}. The domain builder -is responsible for building the initial page tables for a domain -and loading its kernel image at the appropriate virtual address. - +%% chapter Memory moved to memory.tex +\include{src/interface/memory} \chapter{Devices} diff -r 9983602e0ca4 -r 8eddf18dd1a4 docs/src/interface/memory.tex --- /dev/null Thu Sep 15 20:25:12 2005 +++ b/docs/src/interface/memory.tex Thu Sep 15 20:56:53 2005 @@ -0,0 +1,162 @@ +\chapter{Memory} +\label{c:memory} + +Xen is responsible for managing the allocation of physical memory to +domains, and for ensuring safe use of the paging and segmentation +hardware. + + +\section{Memory Allocation} + +Xen resides within a small fixed portion of physical memory; it also +reserves the top 64MB of every virtual address space. The remaining +physical memory is available for allocation to domains at a page +granularity. Xen tracks the ownership and use of each page, which +allows it to enforce secure partitioning between domains. + +Each domain has a maximum and current physical memory allocation. A +guest OS may run a `balloon driver' to dynamically adjust its current +memory allocation up to its limit. + + +%% XXX SMH: I use machine and physical in the next section (which is +%% kinda required for consistency with code); wonder if this section +%% should use same terms? +%% +%% Probably. +%% +%% Merging this and below section at some point prob makes sense. + +\section{Pseudo-Physical Memory} + +Since physical memory is allocated and freed on a page granularity, +there is no guarantee that a domain will receive a contiguous stretch +of physical memory. However most operating systems do not have good +support for operating in a fragmented physical address space. 
To aid +porting such operating systems to run on top of Xen, we make a +distinction between \emph{machine memory} and \emph{pseudo-physical + memory}. + +Put simply, machine memory refers to the entire amount of memory +installed in the machine, including that reserved by Xen, in use by +various domains, or currently unallocated. We consider machine memory +to comprise a set of 4K \emph{machine page frames} numbered +consecutively starting from 0. Machine frame numbers mean the same +within Xen or any domain. + +Pseudo-physical memory, on the other hand, is a per-domain +abstraction. It allows a guest operating system to consider its memory +allocation to consist of a contiguous range of physical page frames +starting at physical frame 0, despite the fact that the underlying +machine page frames may be sparsely allocated and in any order. + +To achieve this, Xen maintains a globally readable {\it + machine-to-physical} table which records the mapping from machine +page frames to pseudo-physical ones. In addition, each domain is +supplied with a {\it physical-to-machine} table which performs the +inverse mapping. Clearly the machine-to-physical table has size +proportional to the amount of RAM installed in the machine, while each +physical-to-machine table has size proportional to the memory +allocation of the given domain. + +Architecture-dependent code in guest operating systems can then use +the two tables to provide the abstraction of pseudo-physical memory. +In general, only certain specialized parts of the operating system +(such as page table management) need to understand the difference +between machine and pseudo-physical addresses. + + +\section{Page Table Updates} + +In the default mode of operation, Xen enforces read-only access to +page tables and requires guest operating systems to explicitly request +any modifications. Xen validates all such requests and only applies +updates that it deems safe. 
This is necessary to prevent domains from +adding arbitrary mappings to their page tables. + +To aid validation, Xen associates a type and reference count with each +memory page. A page has one of the following mutually-exclusive types +at any point in time: page directory ({\sf PD}), page table ({\sf + PT}), local descriptor table ({\sf LDT}), global descriptor table +({\sf GDT}), or writable ({\sf RW}). Note that a guest OS may always +create readable mappings of its own memory regardless of its current +type. + +%%% XXX: possibly explain more about ref count 'lifecycle' here? +This mechanism is used to maintain the invariants required for safety; +for example, a domain cannot have a writable mapping to any part of a +page table as this would require the page concerned to simultaneously +be of types {\sf PT} and {\sf RW}. + + +% \section{Writable Page Tables} + +Xen also provides an alternative mode of operation in which guests +have the illusion that their page tables are directly writable. Of +course this is not really the case, since Xen must still validate +modifications to ensure secure partitioning. To this end, Xen traps +any write attempt to a memory page of type {\sf PT} (i.e., that is +currently part of a page table). If such an access occurs, Xen +temporarily allows write access to that page while at the same time +\emph{disconnecting} it from the page table that is currently in use. +This allows the guest to safely make updates to the page because the +newly-updated entries cannot be used by the MMU until Xen revalidates +and reconnects the page. Reconnection occurs automatically in a +number of situations: for example, when the guest modifies a different +page-table page, when the domain is preempted, or whenever the guest +uses Xen's explicit page-table update interfaces. 
+ +Finally, Xen also supports a form of \emph{shadow page tables} in +which the guest OS uses an independent copy of page tables which are +unknown to the hardware (i.e.\ which are never pointed to by {\tt + cr3}). Instead Xen propagates changes made to the guest's tables to +the real ones, and vice versa. This is useful for logging page writes +(e.g.\ for live migration or checkpoint). A full version of the shadow +page tables also allows guest OS porting with less effort. + + +\section{Segment Descriptor Tables} + +On boot a guest is supplied with a default GDT, which does not reside +within its own memory allocation. If the guest wishes to use other +than the default `flat' ring-1 and ring-3 segments that this GDT +provides, it must register a custom GDT and/or LDT with Xen, allocated +from its own memory. Note that a number of GDT entries are reserved by +Xen -- any custom GDT must also include sufficient space for these +entries. + +For example, the following hypercall is used to specify a new GDT: + +\begin{quote} + int {\bf set\_gdt}(unsigned long *{\em frame\_list}, int {\em + entries}) + + \emph{frame\_list}: An array of up to 16 machine page frames within + which the GDT resides. Any frame registered as a GDT frame may only + be mapped read-only within the guest's address space (e.g., no + writable mappings, no use as a page-table page, and so on). + + \emph{entries}: The number of descriptor-entry slots in the GDT. + Note that the table must be large enough to contain Xen's reserved + entries; thus we must have `{\em entries $>$ + LAST\_RESERVED\_GDT\_ENTRY}\ '. Note also that, after registering + the GDT, slots \emph{FIRST\_} through + \emph{LAST\_RESERVED\_GDT\_ENTRY} are no longer usable by the guest + and may be overwritten by Xen. +\end{quote} + +The LDT is updated via the generic MMU update mechanism (i.e., via the +{\tt mmu\_update()} hypercall). 
+ +\section{Start of Day} + +The start-of-day environment for guest operating systems is rather +different to that provided by the underlying hardware. In particular, +the processor is already executing in protected mode with paging +enabled. + +{\it Domain 0} is created and booted by Xen itself. For all subsequent +domains, the analogue of the boot-loader is the {\it domain builder}, +user-space software running in {\it domain 0}. The domain builder is +responsible for building the initial page tables for a domain and +loading its kernel image at the appropriate virtual address.
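
Illustrative note on the Pseudo-Physical Memory section moved by this patch: the relationship between the machine-to-physical and physical-to-machine tables can be sketched in C roughly as follows. This is a toy model under stated assumptions, not Xen's actual data structures or interfaces. Every name here (`m2p`, `struct domain`, `domain_add_frame`, `pfn_to_mfn`, `mfn_to_pfn`) and both table sizes are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names and sizes below are hypothetical, chosen for illustration. */
#define INVALID_FRAME  UINT32_MAX
#define MACHINE_FRAMES 16u  /* toy machine: sixteen 4K machine page frames */
#define DOMAIN_FRAMES  4u   /* toy per-domain maximum allocation */

/* Machine-to-physical table: global, one entry per machine frame,
 * so its size is proportional to installed RAM. */
static uint32_t m2p[MACHINE_FRAMES];

/* Per-domain state: the physical-to-machine table has one entry per
 * frame the domain may own, so its size follows the allocation. */
struct domain {
    uint32_t p2m[DOMAIN_FRAMES];
    uint32_t nr_frames;  /* current allocation, in frames */
};

/* Mark every machine frame as unowned. */
static void m2p_init(void)
{
    for (size_t i = 0; i < MACHINE_FRAMES; i++)
        m2p[i] = INVALID_FRAME;
}

/* Start a domain with an empty allocation. */
static void domain_init(struct domain *d)
{
    assert(d != NULL);
    d->nr_frames = 0;
    for (size_t i = 0; i < DOMAIN_FRAMES; i++)
        d->p2m[i] = INVALID_FRAME;
}

/* Grant machine frame mfn to the domain as its next pseudo-physical
 * frame. Returns the assigned pseudo-physical frame number, or
 * INVALID_FRAME if mfn is out of range, already owned, or the domain
 * has reached its limit. Both tables are updated together. */
static uint32_t domain_add_frame(struct domain *d, uint32_t mfn)
{
    assert(d != NULL);
    if (mfn >= MACHINE_FRAMES || m2p[mfn] != INVALID_FRAME ||
        d->nr_frames >= DOMAIN_FRAMES)
        return INVALID_FRAME;
    uint32_t pfn = d->nr_frames++;
    d->p2m[pfn] = mfn;
    m2p[mfn] = pfn;
    return pfn;
}

/* Pseudo-physical to machine lookup (per-domain table). */
static uint32_t pfn_to_mfn(const struct domain *d, uint32_t pfn)
{
    return pfn < d->nr_frames ? d->p2m[pfn] : INVALID_FRAME;
}

/* Machine to pseudo-physical lookup (global table). */
static uint32_t mfn_to_pfn(uint32_t mfn)
{
    return mfn < MACHINE_FRAMES ? m2p[mfn] : INVALID_FRAME;
}
```

In this sketch, granting machine frames 9 and 3 (in that order) to a freshly initialised domain gives it pseudo-physical frames 0 and 1, so the guest sees a contiguous range starting at 0 even though the underlying machine frames are sparse and out of order. `pfn_to_mfn(&d, 1)` then yields 3 and `mfn_to_pfn(9)` yields 0.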