[Xen-changelog] merge?
# HG changeset patch
# User cl349@xxxxxxxxxxxxxxxxxxxx
# Node ID f8e7af29daa1d6fd6cb4a5f7a093b229f5607c94
# Parent a6b72464a0420c7cece60aa52d27106cc35641fe
# Parent 750ad97f37b0a49451c9b887c8ccb9134cc8a1ec
merge?

diff -r a6b72464a042 -r f8e7af29daa1 Makefile
--- a/Makefile	Tue Sep 20 09:43:29 2005
+++ b/Makefile	Tue Sep 20 09:43:46 2005
@@ -166,27 +166,25 @@
 uninstall:
 	[ -d $(D)/etc/xen ] && mv -f $(D)/etc/xen $(D)/etc/xen.old-`date +%s`
 	rm -rf $(D)/etc/init.d/xend*
-	rm -rf $(D)/usr/$(LIBDIR)/libxc* $(D)/usr/$(LIBDIR)/libxutil*
-	rm -rf $(D)/usr/$(LIBDIR)/python/xen $(D)/usr/include/xen
-	rm -rf $(D)/usr/$(LIBDIR)/share/xen $(D)/usr/$(LIBDIR)/libxenstore*
+	rm -rf $(D)/etc/hotplug/xen-backend.agent
 	rm -rf $(D)/var/run/xen* $(D)/var/lib/xen*
-	rm -rf $(D)/usr/include/xcs_proto.h $(D)/usr/include/xc.h
-	rm -rf $(D)/usr/include/xs_lib.h $(D)/usr/include/xs.h
-	rm -rf $(D)/usr/sbin/xcs $(D)/usr/sbin/xcsdump $(D)/usr/sbin/xen*
-	rm -rf $(D)/usr/sbin/netfix
-	rm -rf $(D)/usr/sbin/xfrd $(D)/usr/sbin/xm
-	rm -rf $(D)/usr/share/doc/xen $(D)/usr/man/man*/xentrace*
-	rm -rf $(D)/usr/bin/xen* $(D)/usr/bin/miniterm
 	rm -rf $(D)/boot/*xen*
 	rm -rf $(D)/lib/modules/*xen*
+	rm -rf $(D)/usr/bin/xen* $(D)/usr/bin/lomount
 	rm -rf $(D)/usr/bin/cpuperf-perfcntr $(D)/usr/bin/cpuperf-xen
 	rm -rf $(D)/usr/bin/xc_shadow
-	rm -rf $(D)/usr/share/xen $(D)/usr/libexec/xen
+	rm -rf $(D)/usr/include/xenctrl.h
+	rm -rf $(D)/usr/include/xs_lib.h $(D)/usr/include/xs.h
+	rm -rf $(D)/usr/include/xen
+	rm -rf $(D)/usr/$(LIBDIR)/libxenctrl* $(D)/usr/$(LIBDIR)/libxenguest*
+	rm -rf $(D)/usr/$(LIBDIR)/libxenstore*
+	rm -rf $(D)/usr/$(LIBDIR)/python/xen $(D)/usr/$(LIBDIR)/xen
+	rm -rf $(D)/usr/libexec/xen
+	rm -rf $(D)/usr/sbin/xen* $(D)/usr/sbin/netfix $(D)/usr/sbin/xm
+	rm -rf $(D)/usr/share/doc/xen
+	rm -rf $(D)/usr/share/xen
 	rm -rf $(D)/usr/share/man/man1/xen*
 	rm -rf $(D)/usr/share/man/man8/xen*
-	rm -rf $(D)/usr/lib/xen
-	rm -rf $(D)/etc/hotplug.d/xen-backend
-	rm -rf $(D)/etc/hotplug/xen-backend.agent
# Legacy targets for compatibility linux24: diff -r a6b72464a042 -r f8e7af29daa1 docs/Makefile --- a/docs/Makefile Tue Sep 20 09:43:29 2005 +++ b/docs/Makefile Tue Sep 20 09:43:46 2005 @@ -12,7 +12,7 @@ pkgdocdir := /usr/share/doc/xen -DOC_TEX := $(wildcard src/*.tex) +DOC_TEX := src/user.tex src/interface.tex DOC_PS := $(patsubst src/%.tex,ps/%.ps,$(DOC_TEX)) DOC_PDF := $(patsubst src/%.tex,pdf/%.pdf,$(DOC_TEX)) DOC_HTML := $(patsubst src/%.tex,html/%/index.html,$(DOC_TEX)) diff -r a6b72464a042 -r f8e7af29daa1 docs/src/interface.tex --- a/docs/src/interface.tex Tue Sep 20 09:43:29 2005 +++ b/docs/src/interface.tex Tue Sep 20 09:43:46 2005 @@ -87,1084 +87,23 @@ mechanism and policy within the system. +%% chapter Virtual Architecture moved to architecture.tex +\include{src/interface/architecture} -\chapter{Virtual Architecture} +%% chapter Memory moved to memory.tex +\include{src/interface/memory} -On a Xen-based system, the hypervisor itself runs in {\it ring 0}. It -has full access to the physical memory available in the system and is -responsible for allocating portions of it to the domains. Guest -operating systems run in and use {\it rings 1}, {\it 2} and {\it 3} as -they see fit. Segmentation is used to prevent the guest OS from -accessing the portion of the address space that is reserved for -Xen. We expect most guest operating systems will use ring 1 for their -own operation and place applications in ring 3. +%% chapter Devices moved to devices.tex +\include{src/interface/devices} -In this chapter we consider the basic virtual architecture provided -by Xen: the basic CPU state, exception and interrupt handling, and -time. Other aspects such as memory and device access are discussed -in later chapters. - -\section{CPU state} - -All privileged state must be handled by Xen. The guest OS has no -direct access to CR3 and is not permitted to update privileged bits in -EFLAGS. 
Guest OSes use \emph{hypercalls} to invoke operations in Xen; -these are analogous to system calls but occur from ring 1 to ring 0. - -A list of all hypercalls is given in Appendix~\ref{a:hypercalls}. - - - -\section{Exceptions} - -A virtual IDT is provided --- a domain can submit a table of trap -handlers to Xen via the {\tt set\_trap\_table()} hypercall. Most trap -handlers are identical to native x86 handlers, although the page-fault -handler is somewhat different. - - -\section{Interrupts and events} - -Interrupts are virtualized by mapping them to \emph{events}, which are -delivered asynchronously to the target domain using a callback -supplied via the {\tt set\_callbacks()} hypercall. A guest OS can map -these events onto its standard interrupt dispatch mechanisms. Xen is -responsible for determining the target domain that will handle each -physical interrupt source. For more details on the binding of event -sources to events, see Chapter~\ref{c:devices}. - - - -\section{Time} - -Guest operating systems need to be aware of the passage of both real -(or wallclock) time and their own `virtual time' (the time for -which they have been executing). Furthermore, Xen has a notion of -time which is used for scheduling. The following notions of -time are provided: - -\begin{description} -\item[Cycle counter time.] - -This provides a fine-grained time reference. The cycle counter time is -used to accurately extrapolate the other time references. On SMP machines -it is currently assumed that the cycle counter time is synchronized between -CPUs. The current x86-based implementation achieves this within inter-CPU -communication latencies. - -\item[System time.] - -This is a 64-bit counter which holds the number of nanoseconds that -have elapsed since system boot. - - -\item[Wall clock time.] - -This is the time of day in a Unix-style {\tt struct timeval} (seconds -and microseconds since 1 January 1970, adjusted by leap seconds). 
An -NTP client hosted by {\it domain 0} can keep this value accurate. - - -\item[Domain virtual time.] - -This progresses at the same pace as system time, but only while a -domain is executing --- it stops while a domain is de-scheduled. -Therefore the share of the CPU that a domain receives is indicated by -the rate at which its virtual time increases. - -\end{description} - - -Xen exports timestamps for system time and wall-clock time to guest -operating systems through a shared page of memory. Xen also provides -the cycle counter time at the instant the timestamps were calculated, -and the CPU frequency in Hertz. This allows the guest to extrapolate -system and wall-clock times accurately based on the current cycle -counter time. - -Since all time stamps need to be updated and read \emph{atomically} -two version numbers are also stored in the shared info page. The -first is incremented prior to an update, while the second is only -incremented afterwards. Thus a guest can be sure that it read a consistent -state by checking the two version numbers are equal. - -Xen includes a periodic ticker which sends a timer event to the -currently executing domain every 10ms. The Xen scheduler also sends a -timer event whenever a domain is scheduled; this allows the guest OS -to adjust for the time that has passed while it has been inactive. In -addition, Xen allows each domain to request that they receive a timer -event sent at a specified system time by using the {\tt -set\_timer\_op()} hypercall. Guest OSes may use this timer to -implement timeout values when they block. - - - -%% % akw: demoting this to a section -- not sure if there is any point -%% % though, maybe just remove it. - -\section{Xen CPU Scheduling} - -Xen offers a uniform API for CPU schedulers. It is possible to choose -from a number of schedulers at boot and it should be easy to add more. -The BVT, Atropos and Round Robin schedulers are part of the normal -Xen distribution. 
BVT provides proportional fair shares of the CPU to -the running domains. Atropos can be used to reserve absolute shares -of the CPU for each domain. Round-robin is provided as an example of -Xen's internal scheduler API. - -\paragraph*{Note: SMP host support} -Xen has always supported SMP host systems. Domains are statically assigned to -CPUs, either at creation time or when manually pinning to a particular CPU. -The current schedulers then run locally on each CPU to decide which of the -assigned domains should be run there. The user-level control software -can be used to perform coarse-grain load-balancing between CPUs. - - -%% More information on the characteristics and use of these schedulers is -%% available in {\tt Sched-HOWTO.txt}. - - -\section{Privileged operations} - -Xen exports an extended interface to privileged domains (viz.\ {\it - Domain 0}). This allows such domains to build and boot other domains -on the server, and provides control interfaces for managing -scheduling, memory, networking, and block devices. - - -\chapter{Memory} -\label{c:memory} - -Xen is responsible for managing the allocation of physical memory to -domains, and for ensuring safe use of the paging and segmentation -hardware. - - -\section{Memory Allocation} - - -Xen resides within a small fixed portion of physical memory; it also -reserves the top 64MB of every virtual address space. The remaining -physical memory is available for allocation to domains at a page -granularity. Xen tracks the ownership and use of each page, which -allows it to enforce secure partitioning between domains. - -Each domain has a maximum and current physical memory allocation. -A guest OS may run a `balloon driver' to dynamically adjust its -current memory allocation up to its limit. - - -%% XXX SMH: I use machine and physical in the next section (which -%% is kinda required for consistency with code); wonder if this -%% section should use same terms? -%% -%% Probably. 
-%% -%% Merging this and below section at some point prob makes sense. - -\section{Pseudo-Physical Memory} - -Since physical memory is allocated and freed on a page granularity, -there is no guarantee that a domain will receive a contiguous stretch -of physical memory. However most operating systems do not have good -support for operating in a fragmented physical address space. To aid -porting such operating systems to run on top of Xen, we make a -distinction between \emph{machine memory} and \emph{pseudo-physical -memory}. - -Put simply, machine memory refers to the entire amount of memory -installed in the machine, including that reserved by Xen, in use by -various domains, or currently unallocated. We consider machine memory -to comprise a set of 4K \emph{machine page frames} numbered -consecutively starting from 0. Machine frame numbers mean the same -within Xen or any domain. - -Pseudo-physical memory, on the other hand, is a per-domain -abstraction. It allows a guest operating system to consider its memory -allocation to consist of a contiguous range of physical page frames -starting at physical frame 0, despite the fact that the underlying -machine page frames may be sparsely allocated and in any order. - -To achieve this, Xen maintains a globally readable {\it -machine-to-physical} table which records the mapping from machine page -frames to pseudo-physical ones. In addition, each domain is supplied -with a {\it physical-to-machine} table which performs the inverse -mapping. Clearly the machine-to-physical table has size proportional -to the amount of RAM installed in the machine, while each -physical-to-machine table has size proportional to the memory -allocation of the given domain. - -Architecture dependent code in guest operating systems can then use -the two tables to provide the abstraction of pseudo-physical -memory. 
In general, only certain specialized parts of the operating -system (such as page table management) needs to understand the -difference between machine and pseudo-physical addresses. - -\section{Page Table Updates} - -In the default mode of operation, Xen enforces read-only access to -page tables and requires guest operating systems to explicitly request -any modifications. Xen validates all such requests and only applies -updates that it deems safe. This is necessary to prevent domains from -adding arbitrary mappings to their page tables. - -To aid validation, Xen associates a type and reference count with each -memory page. A page has one of the following -mutually-exclusive types at any point in time: page directory ({\sf -PD}), page table ({\sf PT}), local descriptor table ({\sf LDT}), -global descriptor table ({\sf GDT}), or writable ({\sf RW}). Note that -a guest OS may always create readable mappings of its own memory -regardless of its current type. -%%% XXX: possibly explain more about ref count 'lifecyle' here? -This mechanism is used to -maintain the invariants required for safety; for example, a domain -cannot have a writable mapping to any part of a page table as this -would require the page concerned to simultaneously be of types {\sf - PT} and {\sf RW}. - - -%\section{Writable Page Tables} - -Xen also provides an alternative mode of operation in which guests be -have the illusion that their page tables are directly writable. Of -course this is not really the case, since Xen must still validate -modifications to ensure secure partitioning. To this end, Xen traps -any write attempt to a memory page of type {\sf PT} (i.e., that is -currently part of a page table). If such an access occurs, Xen -temporarily allows write access to that page while at the same time -{\em disconnecting} it from the page table that is currently in -use. 
This allows the guest to safely make updates to the page because -the newly-updated entries cannot be used by the MMU until Xen -revalidates and reconnects the page. -Reconnection occurs automatically in a number of situations: for -example, when the guest modifies a different page-table page, when the -domain is preempted, or whenever the guest uses Xen's explicit -page-table update interfaces. - -Finally, Xen also supports a form of \emph{shadow page tables} in -which the guest OS uses a independent copy of page tables which are -unknown to the hardware (i.e.\ which are never pointed to by {\tt -cr3}). Instead Xen propagates changes made to the guest's tables to the -real ones, and vice versa. This is useful for logging page writes -(e.g.\ for live migration or checkpoint). A full version of the shadow -page tables also allows guest OS porting with less effort. - -\section{Segment Descriptor Tables} - -On boot a guest is supplied with a default GDT, which does not reside -within its own memory allocation. If the guest wishes to use other -than the default `flat' ring-1 and ring-3 segments that this GDT -provides, it must register a custom GDT and/or LDT with Xen, -allocated from its own memory. Note that a number of GDT -entries are reserved by Xen -- any custom GDT must also include -sufficient space for these entries. - -For example, the following hypercall is used to specify a new GDT: - -\begin{quote} -int {\bf set\_gdt}(unsigned long *{\em frame\_list}, int {\em entries}) - -{\em frame\_list}: An array of up to 16 machine page frames within -which the GDT resides. Any frame registered as a GDT frame may only -be mapped read-only within the guest's address space (e.g., no -writable mappings, no use as a page-table page, and so on). - -{\em entries}: The number of descriptor-entry slots in the GDT. Note -that the table must be large enough to contain Xen's reserved entries; -thus we must have `{\em entries $>$ LAST\_RESERVED\_GDT\_ENTRY}\ '. 
-Note also that, after registering the GDT, slots {\em FIRST\_} through -{\em LAST\_RESERVED\_GDT\_ENTRY} are no longer usable by the guest and -may be overwritten by Xen. -\end{quote} - -The LDT is updated via the generic MMU update mechanism (i.e., via -the {\tt mmu\_update()} hypercall. - -\section{Start of Day} - -The start-of-day environment for guest operating systems is rather -different to that provided by the underlying hardware. In particular, -the processor is already executing in protected mode with paging -enabled. - -{\it Domain 0} is created and booted by Xen itself. For all subsequent -domains, the analogue of the boot-loader is the {\it domain builder}, -user-space software running in {\it domain 0}. The domain builder -is responsible for building the initial page tables for a domain -and loading its kernel image at the appropriate virtual address. - - - -\chapter{Devices} -\label{c:devices} - -Devices such as network and disk are exported to guests using a -split device driver. The device driver domain, which accesses the -physical device directly also runs a {\em backend} driver, serving -requests to that device from guests. Each guest will use a simple -{\em frontend} driver, to access the backend. Communication between these -domains is composed of two parts: First, data is placed onto a shared -memory page between the domains. Second, an event channel between the -two domains is used to pass notification that data is outstanding. -This separation of notification from data transfer allows message -batching, and results in very efficient device access. 
- -Event channels are used extensively in device virtualization; each -domain has a number of end-points or \emph{ports} each of which -may be bound to one of the following \emph{event sources}: -\begin{itemize} - \item a physical interrupt from a real device, - \item a virtual interrupt (callback) from Xen, or - \item a signal from another domain -\end{itemize} - -Events are lightweight and do not carry much information beyond -the source of the notification. Hence when performing bulk data -transfer, events are typically used as synchronization primitives -over a shared memory transport. Event channels are managed via -the {\tt event\_channel\_op()} hypercall; for more details see -Section~\ref{s:idc}. - -This chapter focuses on some individual device interfaces -available to Xen guests. - -\section{Network I/O} - -Virtual network device services are provided by shared memory -communication with a backend domain. From the point of view of -other domains, the backend may be viewed as a virtual ethernet switch -element with each domain having one or more virtual network interfaces -connected to it. - -\subsection{Backend Packet Handling} - -The backend driver is responsible for a variety of actions relating to -the transmission and reception of packets from the physical device. -With regard to transmission, the backend performs these key actions: - -\begin{itemize} -\item {\bf Validation:} To ensure that domains do not attempt to - generate invalid (e.g. spoofed) traffic, the backend driver may - validate headers ensuring that source MAC and IP addresses match the - interface that they have been sent from. - - Validation functions can be configured using standard firewall rules - ({\small{\tt iptables}} in the case of Linux). - -\item {\bf Scheduling:} Since a number of domains can share a single - physical network interface, the backend must mediate access when - several domains each have packets queued for transmission. 
This - general scheduling function subsumes basic shaping or rate-limiting - schemes. - -\item {\bf Logging and Accounting:} The backend domain can be - configured with classifier rules that control how packets are - accounted or logged. For example, log messages might be generated - whenever a domain attempts to send a TCP packet containing a SYN. -\end{itemize} - -On receipt of incoming packets, the backend acts as a simple -demultiplexer: Packets are passed to the appropriate virtual -interface after any necessary logging and accounting have been carried -out. - -\subsection{Data Transfer} - -Each virtual interface uses two ``descriptor rings'', one for transmit, -the other for receive. Each descriptor identifies a block of contiguous -physical memory allocated to the domain. - -The transmit ring carries packets to transmit from the guest to the -backend domain. The return path of the transmit ring carries messages -indicating that the contents have been physically transmitted and the -backend no longer requires the associated pages of memory. - -To receive packets, the guest places descriptors of unused pages on -the receive ring. The backend will return received packets by -exchanging these pages in the domain's memory with new pages -containing the received data, and passing back descriptors regarding -the new packets on the ring. This zero-copy approach allows the -backend to maintain a pool of free pages to receive packets into, and -then deliver them to appropriate domains after examining their -headers. - -% -%Real physical addresses are used throughout, with the domain performing -%translation from pseudo-physical addresses if that is necessary. - -If a domain does not keep its receive ring stocked with empty buffers then -packets destined to it may be dropped. This provides some defence against -receive livelock problems because an overload domain will cease to receive -further data. 
Similarly, on the transmit path, it provides the application -with feedback on the rate at which packets are able to leave the system. - - -Flow control on rings is achieved by including a pair of producer -indexes on the shared ring page. Each side will maintain a private -consumer index indicating the next outstanding message. In this -manner, the domains cooperate to divide the ring into two message -lists, one in each direction. Notification is decoupled from the -immediate placement of new messages on the ring; the event channel -will be used to generate notification when {\em either} a certain -number of outstanding messages are queued, {\em or} a specified number -of nanoseconds have elapsed since the oldest message was placed on the -ring. - -% Not sure if my version is any better -- here is what was here before: -%% Synchronization between the backend domain and the guest is achieved using -%% counters held in shared memory that is accessible to both. Each ring has -%% associated producer and consumer indices indicating the area in the ring -%% that holds descriptors that contain data. After receiving {\it n} packets -%% or {\t nanoseconds} after receiving the first packet, the hypervisor sends -%% an event to the domain. - -\section{Block I/O} - -All guest OS disk access goes through the virtual block device VBD -interface. This interface allows domains access to portions of block -storage devices visible to the the block backend device. The VBD -interface is a split driver, similar to the network interface -described above. A single shared memory ring is used between the -frontend and backend drivers, across which read and write messages are -sent. - -Any block device accessible to the backend domain, including -network-based block (iSCSI, *NBD, etc), loopback and LVM/MD devices, -can be exported as a VBD. Each VBD is mapped to a device node in the -guest, specified in the guest's startup configuration. 
- -Old (Xen 1.2) virtual disks are not supported under Xen 2.0, since -similar functionality can be achieved using the more complete LVM -system, which is already in widespread use. - -\subsection{Data Transfer} - -The single ring between the guest and the block backend supports three -messages: - -\begin{description} -\item [{\small {\tt PROBE}}:] Return a list of the VBDs available to this guest - from the backend. The request includes a descriptor of a free page - into which the reply will be written by the backend. - -\item [{\small {\tt READ}}:] Read data from the specified block device. The - front end identifies the device and location to read from and - attaches pages for the data to be copied to (typically via DMA from - the device). The backend acknowledges completed read requests as - they finish. - -\item [{\small {\tt WRITE}}:] Write data to the specified block device. This - functions essentially as {\small {\tt READ}}, except that the data moves to - the device instead of from it. -\end{description} - -% um... some old text -%% In overview, the same style of descriptor-ring that is used for -%% network packets is used here. Each domain has one ring that carries -%% operation requests to the hypervisor and carries the results back -%% again. - -%% Rather than copying data, the backend simply maps the domain's buffers -%% in order to enable direct DMA to them. The act of mapping the buffers -%% also increases the reference counts of the underlying pages, so that -%% the unprivileged domain cannot try to return them to the hypervisor, -%% install them as page tables, or any other unsafe behaviour. -%% %block API here - - -\chapter{Further Information} - - -If you have questions that are not answered by this manual, the -sources of information listed below may be of interest to you. Note -that bug reports, suggestions and contributions related to the -software (or the documentation) should be sent to the Xen developers' -mailing list (address below). 
- -\section{Other documentation} - -If you are mainly interested in using (rather than developing for) -Xen, the {\em Xen Users' Manual} is distributed in the {\tt docs/} -directory of the Xen source distribution. - -% Various HOWTOs are also available in {\tt docs/HOWTOS}. - -\section{Online references} - -The official Xen web site is found at: -\begin{quote} -{\tt http://www.cl.cam.ac.uk/Research/SRG/netos/xen/} -\end{quote} - -This contains links to the latest versions of all on-line -documentation. - -\section{Mailing lists} - -There are currently four official Xen mailing lists: - -\begin{description} -\item[xen-devel@xxxxxxxxxxxxxxxxxxx] Used for development -discussions and bug reports. Subscribe at: \\ -{\small {\tt http://lists.xensource.com/xen-devel}} -\item[xen-users@xxxxxxxxxxxxxxxxxxx] Used for installation and usage -discussions and requests for help. Subscribe at: \\ -{\small {\tt http://lists.xensource.com/xen-users}} -\item[xen-announce@xxxxxxxxxxxxxxxxxxx] Used for announcements only. -Subscribe at: \\ -{\small {\tt http://lists.xensource.com/xen-announce}} -\item[xen-changelog@xxxxxxxxxxxxxxxxxxx] Changelog feed -from the unstable and 2.0 trees - developer oriented. Subscribe at: \\ -{\small {\tt http://lists.xensource.com/xen-changelog}} -\end{description} - -Of these, xen-devel is the most active. - - +%% chapter Further Information moved to further_info.tex +\include{src/interface/further_info} \appendix -%\newcommand{\hypercall}[1]{\vspace{5mm}{\large\sf #1}} - - - - - -\newcommand{\hypercall}[1]{\vspace{2mm}{\sf #1}} - - - - - - -\chapter{Xen Hypercalls} -\label{a:hypercalls} - -Hypercalls represent the procedural interface to Xen; this appendix -categorizes and describes the current set of hypercalls. - -\section{Invoking Hypercalls} - -Hypercalls are invoked in a manner analogous to system calls in a -conventional operating system; a software interrupt is issued which -vectors to an entry point within Xen. 
On x86\_32 machines the -instruction required is {\tt int \$82}; the (real) IDT is setup so -that this may only be issued from within ring 1. The particular -hypercall to be invoked is contained in {\tt EAX} --- a list -mapping these values to symbolic hypercall names can be found -in {\tt xen/include/public/xen.h}. - -On some occasions a set of hypercalls will be required to carry -out a higher-level function; a good example is when a guest -operating wishes to context switch to a new process which -requires updating various privileged CPU state. As an optimization -for these cases, there is a generic mechanism to issue a set of -hypercalls as a batch: - -\begin{quote} -\hypercall{multicall(void *call\_list, int nr\_calls)} - -Execute a series of hypervisor calls; {\tt nr\_calls} is the length of -the array of {\tt multicall\_entry\_t} structures pointed to be {\tt -call\_list}. Each entry contains the hypercall operation code followed -by up to 7 word-sized arguments. -\end{quote} - -Note that multicalls are provided purely as an optimization; there is -no requirement to use them when first porting a guest operating -system. - - -\section{Virtual CPU Setup} - -At start of day, a guest operating system needs to setup the virtual -CPU it is executing on. This includes installing vectors for the -virtual IDT so that the guest OS can handle interrupts, page faults, -etc. However the very first thing a guest OS must setup is a pair -of hypervisor callbacks: these are the entry points which Xen will -use when it wishes to notify the guest OS of an occurrence. - -\begin{quote} -\hypercall{set\_callbacks(unsigned long event\_selector, unsigned long - event\_address, unsigned long failsafe\_selector, unsigned long - failsafe\_address) } - -Register the normal (``event'') and failsafe callbacks for -event processing. In each case the code segment selector and -address within that segment are provided. 
The selectors must -have RPL 1; in XenLinux we simply use the kernel's CS for both -{\tt event\_selector} and {\tt failsafe\_selector}. - -The value {\tt event\_address} specifies the address of the guest OSes -event handling and dispatch routine; the {\tt failsafe\_address} -specifies a separate entry point which is used only if a fault occurs -when Xen attempts to use the normal callback. -\end{quote} - - -After installing the hypervisor callbacks, the guest OS can -install a `virtual IDT' by using the following hypercall: - -\begin{quote} -\hypercall{set\_trap\_table(trap\_info\_t *table)} - -Install one or more entries into the per-domain -trap handler table (essentially a software version of the IDT). -Each entry in the array pointed to by {\tt table} includes the -exception vector number with the corresponding segment selector -and entry point. Most guest OSes can use the same handlers on -Xen as when running on the real hardware; an exception is the -page fault handler (exception vector 14) where a modified -stack-frame layout is used. - - -\end{quote} - - - -\section{Scheduling and Timer} - -Domains are preemptively scheduled by Xen according to the -parameters installed by domain 0 (see Section~\ref{s:dom0ops}). -In addition, however, a domain may choose to explicitly -control certain behavior with the following hypercall: - -\begin{quote} -\hypercall{sched\_op(unsigned long op)} - -Request scheduling operation from hypervisor. The options are: {\it -yield}, {\it block}, and {\it shutdown}. {\it yield} keeps the -calling domain runnable but may cause a reschedule if other domains -are runnable. {\it block} removes the calling domain from the run -queue and cause is to sleeps until an event is delivered to it. {\it -shutdown} is used to end the domain's execution; the caller can -additionally specify whether the domain should reboot, halt or -suspend. 
-\end{quote} - -To aid the implementation of a process scheduler within a guest OS, -Xen provides a virtual programmable timer: - -\begin{quote} -\hypercall{set\_timer\_op(uint64\_t timeout)} - -Request a timer event to be sent at the specified system time (time -in nanoseconds since system boot). The hypercall actually passes the -64-bit timeout value as a pair of 32-bit values. - -\end{quote} - -Note that calling {\tt set\_timer\_op()} prior to {\tt sched\_op} -allows block-with-timeout semantics. - - -\section{Page Table Management} - -Since guest operating systems have read-only access to their page -tables, Xen must be involved when making any changes. The following -multi-purpose hypercall can be used to modify page-table entries, -update the machine-to-physical mapping table, flush the TLB, install -a new page-table base pointer, and more. - -\begin{quote} -\hypercall{mmu\_update(mmu\_update\_t *req, int count, int *success\_count)} - -Update the page table for the domain; a set of {\tt count} updates are -submitted for processing in a batch, with {\tt success\_count} being -updated to report the number of successful updates. - -Each element of {\tt req[]} contains a pointer (address) and value; -the least significant 2-bits of the pointer are used to distinguish -the type of update requested as follows: -\begin{description} - -\item[\it MMU\_NORMAL\_PT\_UPDATE:] update a page directory entry or -page table entry to the associated value; Xen will check that the -update is safe, as described in Chapter~\ref{c:memory}. - -\item[\it MMU\_MACHPHYS\_UPDATE:] update an entry in the - machine-to-physical table. The calling domain must own the machine - page in question (or be privileged). - -\item[\it MMU\_EXTENDED\_COMMAND:] perform additional MMU operations. 
-The set of additional MMU operations is considerable, and includes -updating {\tt cr3} (or just re-installing it for a TLB flush), -flushing the cache, installing a new LDT, or pinning \& unpinning -page-table pages (to ensure their reference count doesn't drop to zero -which would require a revalidation of all entries). - -Further extended commands are used to deal with granting and -acquiring page ownership; see Section~\ref{s:idc}. - - -\end{description} - -More details on the precise format of all commands can be -found in {\tt xen/include/public/xen.h}. - - -\end{quote} - -Explicitly updating batches of page table entries is extremely -efficient, but can require a number of alterations to the guest -OS. Using the writable page table mode (Chapter~\ref{c:memory}) is -recommended for new OS ports. - -Regardless of which page table update mode is being used, however, -there are some occasions (notably handling a demand page fault) where -a guest OS will wish to modify exactly one PTE rather than a -batch. This is catered for by the following: - -\begin{quote} -\hypercall{update\_va\_mapping(unsigned long page\_nr, unsigned long -val, \\ unsigned long flags)} - -Update the currently installed PTE for the page {\tt page\_nr} to -{\tt val}. As with {\tt mmu\_update()}, Xen checks the modification -is safe before applying it. The {\tt flags} determine which kind -of TLB flush, if any, should follow the update. - -\end{quote} - -Finally, sufficiently privileged domains may occasionally wish to manipulate -the pages of others: -\begin{quote} - -\hypercall{update\_va\_mapping\_otherdomain(unsigned long page\_nr, -unsigned long val, unsigned long flags, uint16\_t domid)} - -Identical to {\tt update\_va\_mapping()} save that the pages being -mapped must belong to the domain {\tt domid}. - -\end{quote} - -This privileged operation is currently used by backend virtual device -drivers to safely map pages containing I/O data. 
- - - -\section{Segmentation Support} - -Xen allows guest OSes to install a custom GDT if they require it; -this is context switched transparently whenever a domain is -[de]scheduled. The following hypercall is effectively a -`safe' version of {\tt lgdt}: - -\begin{quote} -\hypercall{set\_gdt(unsigned long *frame\_list, int entries)} - -Install a global descriptor table for a domain; {\tt frame\_list} is -an array of up to 16 machine page frames within which the GDT resides, -with {\tt entries} being the actual number of descriptor-entry -slots. All page frames must be mapped read-only within the guest's -address space, and the table must be large enough to contain Xen's -reserved entries (see {\tt xen/include/public/arch-x86\_32.h}). - -\end{quote} - -Many guest OSes will also wish to install LDTs; this is achieved by -using {\tt mmu\_update()} with an extended command, passing the -linear address of the LDT base along with the number of entries. No -special safety checks are required; Xen needs to perform this task -simply since {\tt lldt} requires CPL 0. - - -Xen also allows guest operating systems to update just an -individual segment descriptor in the GDT or LDT: - -\begin{quote} -\hypercall{update\_descriptor(unsigned long ma, unsigned long word1, -unsigned long word2)} - -Update the GDT/LDT entry at machine address {\tt ma}; the new -8-byte descriptor is stored in {\tt word1} and {\tt word2}. -Xen performs a number of checks to ensure the descriptor is -valid. - -\end{quote} - -Guest OSes can use the above in place of context switching entire -LDTs (or the GDT) when the number of changing descriptors is small. - -\section{Context Switching} - -When a guest OS wishes to context switch between two processes, -it can use the page table and segmentation hypercalls described -above to perform the the bulk of the privileged work. 
In addition, -however, it will need to invoke Xen to switch the kernel (ring 1) -stack pointer: - -\begin{quote} -\hypercall{stack\_switch(unsigned long ss, unsigned long esp)} - -Request kernel stack switch from hypervisor; {\tt ss} is the new -stack segment, which {\tt esp} is the new stack pointer. - -\end{quote} - -A final useful hypercall for context switching allows ``lazy'' -save and restore of floating point state: - -\begin{quote} -\hypercall{fpu\_taskswitch(void)} - -This call instructs Xen to set the {\tt TS} bit in the {\tt cr0} -control register; this means that the next attempt to use floating -point will cause a trap which the guest OS can trap. Typically it will -then save/restore the FP state, and clear the {\tt TS} bit. -\end{quote} - -This is provided as an optimization only; guest OSes can also choose -to save and restore FP state on all context switches for simplicity. - - -\section{Physical Memory Management} - -As mentioned previously, each domain has a maximum and current -memory allocation. The maximum allocation, set at domain creation -time, cannot be modified. However a domain can choose to reduce -and subsequently grow its current allocation by using the -following call: - -\begin{quote} -\hypercall{dom\_mem\_op(unsigned int op, unsigned long *extent\_list, - unsigned long nr\_extents, unsigned int extent\_order)} - -Increase or decrease current memory allocation (as determined by -the value of {\tt op}). Each invocation provides a list of -extents each of which is $2^s$ pages in size, -where $s$ is the value of {\tt extent\_order}. - -\end{quote} - -In addition to simply reducing or increasing the current memory -allocation via a `balloon driver', this call is also useful for -obtaining contiguous regions of machine memory when required (e.g. -for certain PCI devices, or if using superpages). - - -\section{Inter-Domain Communication} -\label{s:idc} - -Xen provides a simple asynchronous notification mechanism via -\emph{event channels}. 
Each domain has a set of end-points (or -\emph{ports}) which may be bound to an event source (e.g. a physical -IRQ, a virtual IRQ, or an port in another domain). When a pair of -end-points in two different domains are bound together, then a `send' -operation on one will cause an event to be received by the destination -domain. - -The control and use of event channels involves the following hypercall: - -\begin{quote} -\hypercall{event\_channel\_op(evtchn\_op\_t *op)} - -Inter-domain event-channel management; {\tt op} is a discriminated -union which allows the following 7 operations: - -\begin{description} - -\item[\it alloc\_unbound:] allocate a free (unbound) local - port and prepare for connection from a specified domain. -\item[\it bind\_virq:] bind a local port to a virtual -IRQ; any particular VIRQ can be bound to at most one port per domain. -\item[\it bind\_pirq:] bind a local port to a physical IRQ; -once more, a given pIRQ can be bound to at most one port per -domain. Furthermore the calling domain must be sufficiently -privileged. -\item[\it bind\_interdomain:] construct an interdomain event -channel; in general, the target domain must have previously allocated -an unbound port for this channel, although this can be bypassed by -privileged domains during domain setup. -\item[\it close:] close an interdomain event channel. -\item[\it send:] send an event to the remote end of a -interdomain event channel. -\item[\it status:] determine the current status of a local port. -\end{description} - -For more details see -{\tt xen/include/public/event\_channel.h}. - -\end{quote} - -Event channels are the fundamental communication primitive between -Xen domains and seamlessly support SMP. However they provide little -bandwidth for communication {\sl per se}, and hence are typically -married with a piece of shared memory to produce effective and -high-performance inter-domain communication. 
- -Safe sharing of memory pages between guest OSes is carried out by -granting access on a per page basis to individual domains. This is -achieved by using the {\tt grant\_table\_op()} hypercall. - -\begin{quote} -\hypercall{grant\_table\_op(unsigned int cmd, void *uop, unsigned int count)} - -Grant or remove access to a particular page to a particular domain. - -\end{quote} - -This is not currently widely in use by guest operating systems, but -we intend to integrate support more fully in the near future. - -\section{PCI Configuration} - -Domains with physical device access (i.e.\ driver domains) receive -limited access to certain PCI devices (bus address space and -interrupts). However many guest operating systems attempt to -determine the PCI configuration by directly access the PCI BIOS, -which cannot be allowed for safety. - -Instead, Xen provides the following hypercall: - -\begin{quote} -\hypercall{physdev\_op(void *physdev\_op)} - -Perform a PCI configuration option; depending on the value -of {\tt physdev\_op} this can be a PCI config read, a PCI config -write, or a small number of other queries. - -\end{quote} - - -For examples of using {\tt physdev\_op()}, see the -Xen-specific PCI code in the linux sparse tree. - -\section{Administrative Operations} -\label{s:dom0ops} - -A large number of control operations are available to a sufficiently -privileged domain (typically domain 0). These allow the creation and -management of new domains, for example. A complete list is given -below: for more details on any or all of these, please see -{\tt xen/include/public/dom0\_ops.h} - - -\begin{quote} -\hypercall{dom0\_op(dom0\_op\_t *op)} - -Administrative domain operations for domain management. The options are: - -\begin{description} -\item [\it DOM0\_CREATEDOMAIN:] create a new domain - -\item [\it DOM0\_PAUSEDOMAIN:] remove a domain from the scheduler run -queue. - -\item [\it DOM0\_UNPAUSEDOMAIN:] mark a paused domain as schedulable - once again. 
- -\item [\it DOM0\_DESTROYDOMAIN:] deallocate all resources associated -with a domain - -\item [\it DOM0\_GETMEMLIST:] get list of pages used by the domain - -\item [\it DOM0\_SCHEDCTL:] - -\item [\it DOM0\_ADJUSTDOM:] adjust scheduling priorities for domain - -\item [\it DOM0\_BUILDDOMAIN:] do final guest OS setup for domain - -\item [\it DOM0\_GETDOMAINFO:] get statistics about the domain - -\item [\it DOM0\_GETPAGEFRAMEINFO:] - -\item [\it DOM0\_GETPAGEFRAMEINFO2:] - -\item [\it DOM0\_IOPL:] set I/O privilege level - -\item [\it DOM0\_MSR:] read or write model specific registers - -\item [\it DOM0\_DEBUG:] interactively invoke the debugger - -\item [\it DOM0\_SETTIME:] set system time - -\item [\it DOM0\_READCONSOLE:] read console content from hypervisor buffer ring - -\item [\it DOM0\_PINCPUDOMAIN:] pin domain to a particular CPU - -\item [\it DOM0\_GETTBUFS:] get information about the size and location of - the trace buffers (only on trace-buffer enabled builds) - -\item [\it DOM0\_PHYSINFO:] get information about the host machine - -\item [\it DOM0\_PCIDEV\_ACCESS:] modify PCI device access permissions - -\item [\it DOM0\_SCHED\_ID:] get the ID of the current Xen scheduler - -\item [\it DOM0\_SHADOW\_CONTROL:] switch between shadow page-table modes - -\item [\it DOM0\_SETDOMAININITIALMEM:] set initial memory allocation of a domain - -\item [\it DOM0\_SETDOMAINMAXMEM:] set maximum memory allocation of a domain - -\item [\it DOM0\_SETDOMAINVMASSIST:] set domain VM assist options -\end{description} -\end{quote} - -Most of the above are best understood by looking at the code -implementing them (in {\tt xen/common/dom0\_ops.c}) and in -the user-space tools that use them (mostly in {\tt tools/libxc}). 
- -\section{Debugging Hypercalls} - -A few additional hypercalls are mainly useful for debugging: - -\begin{quote} -\hypercall{console\_io(int cmd, int count, char *str)} - -Use Xen to interact with the console; operations are: - -{\it CONSOLEIO\_write}: Output count characters from buffer str. - -{\it CONSOLEIO\_read}: Input at most count characters into buffer str. -\end{quote} - -A pair of hypercalls allows access to the underlying debug registers: -\begin{quote} -\hypercall{set\_debugreg(int reg, unsigned long value)} - -Set debug register {\tt reg} to {\tt value} - -\hypercall{get\_debugreg(int reg)} - -Return the contents of the debug register {\tt reg} -\end{quote} - -And finally: -\begin{quote} -\hypercall{xen\_version(int cmd)} - -Request Xen version number. -\end{quote} - -This is useful to ensure that user-space tools are in sync -with the underlying hypervisor. - -\section{Deprecated Hypercalls} - -Xen is under constant development and refinement; as such there -are plans to improve the way in which various pieces of functionality -are exposed to guest OSes. - -\begin{quote} -\hypercall{vm\_assist(unsigned int cmd, unsigned int type)} - -Toggle various memory management modes (in particular wrritable page -tables and superpage support). - -\end{quote} - -This is likely to be replaced with mode values in the shared -information page since this is more resilient for resumption -after migration or checkpoint. - - - - - - +%% chapter hypercalls moved to hypercalls.tex +\include{src/interface/hypercalls} %% @@ -1173,279 +112,9 @@ %% new scheduler... not clear how many of them there are... %% -\begin{comment} - -\chapter{Scheduling API} - -The scheduling API is used by both the schedulers described above and should -also be used by any new schedulers. It provides a generic interface and also -implements much of the ``boilerplate'' code. 
- -Schedulers conforming to this API are described by the following -structure: - -\begin{verbatim} -struct scheduler -{ - char *name; /* full name for this scheduler */ - char *opt_name; /* option name for this scheduler */ - unsigned int sched_id; /* ID for this scheduler */ - - int (*init_scheduler) (); - int (*alloc_task) (struct task_struct *); - void (*add_task) (struct task_struct *); - void (*free_task) (struct task_struct *); - void (*rem_task) (struct task_struct *); - void (*wake_up) (struct task_struct *); - void (*do_block) (struct task_struct *); - task_slice_t (*do_schedule) (s_time_t); - int (*control) (struct sched_ctl_cmd *); - int (*adjdom) (struct task_struct *, - struct sched_adjdom_cmd *); - s32 (*reschedule) (struct task_struct *); - void (*dump_settings) (void); - void (*dump_cpu_state) (int); - void (*dump_runq_el) (struct task_struct *); -}; -\end{verbatim} - -The only method that {\em must} be implemented is -{\tt do\_schedule()}. However, if there is not some implementation for the -{\tt wake\_up()} method then waking tasks will not get put on the runqueue! - -The fields of the above structure are described in more detail below. - -\subsubsection{name} - -The name field should point to a descriptive ASCII string. - -\subsubsection{opt\_name} - -This field is the value of the {\tt sched=} boot-time option that will select -this scheduler. - -\subsubsection{sched\_id} - -This is an integer that uniquely identifies this scheduler. There should be a -macro corrsponding to this scheduler ID in {\tt <xen/sched-if.h>}. - -\subsubsection{init\_scheduler} - -\paragraph*{Purpose} - -This is a function for performing any scheduler-specific initialisation. For -instance, it might allocate memory for per-CPU scheduler data and initialise it -appropriately. - -\paragraph*{Call environment} - -This function is called after the initialisation performed by the generic -layer. The function is called exactly once, for the scheduler that has been -selected. 
- -\paragraph*{Return values} - -This should return negative on failure --- this will cause an -immediate panic and the system will fail to boot. - -\subsubsection{alloc\_task} - -\paragraph*{Purpose} -Called when a {\tt task\_struct} is allocated by the generic scheduler -layer. A particular scheduler implementation may use this method to -allocate per-task data for this task. It may use the {\tt -sched\_priv} pointer in the {\tt task\_struct} to point to this data. - -\paragraph*{Call environment} -The generic layer guarantees that the {\tt sched\_priv} field will -remain intact from the time this method is called until the task is -deallocated (so long as the scheduler implementation does not change -it explicitly!). - -\paragraph*{Return values} -Negative on failure. - -\subsubsection{add\_task} - -\paragraph*{Purpose} - -Called when a task is initially added by the generic layer. - -\paragraph*{Call environment} - -The fields in the {\tt task\_struct} are now filled out and available for use. -Schedulers should implement appropriate initialisation of any per-task private -information in this method. - -\subsubsection{free\_task} - -\paragraph*{Purpose} - -Schedulers should free the space used by any associated private data -structures. - -\paragraph*{Call environment} - -This is called when a {\tt task\_struct} is about to be deallocated. -The generic layer will have done generic task removal operations and -(if implemented) called the scheduler's {\tt rem\_task} method before -this method is called. - -\subsubsection{rem\_task} - -\paragraph*{Purpose} - -This is called when a task is being removed from scheduling (but is -not yet being freed). - -\subsubsection{wake\_up} - -\paragraph*{Purpose} - -Called when a task is woken up, this method should put the task on the runqueue -(or do the scheduler-specific equivalent action). - -\paragraph*{Call environment} - -The task is already set to state RUNNING. 
- -\subsubsection{do\_block} - -\paragraph*{Purpose} - -This function is called when a task is blocked. This function should -not remove the task from the runqueue. - -\paragraph*{Call environment} - -The EVENTS\_MASTER\_ENABLE\_BIT is already set and the task state changed to -TASK\_INTERRUPTIBLE on entry to this method. A call to the {\tt - do\_schedule} method will be made after this method returns, in -order to select the next task to run. - -\subsubsection{do\_schedule} - -This method must be implemented. - -\paragraph*{Purpose} - -The method is called each time a new task must be chosen for scheduling on the -current CPU. The current time as passed as the single argument (the current -task can be found using the {\tt current} macro). - -This method should select the next task to run on this CPU and set it's minimum -time to run as well as returning the data described below. - -This method should also take the appropriate action if the previous -task has blocked, e.g. removing it from the runqueue. - -\paragraph*{Call environment} - -The other fields in the {\tt task\_struct} are updated by the generic layer, -which also performs all Xen-specific tasks and performs the actual task switch -(unless the previous task has been chosen again). - -This method is called with the {\tt schedule\_lock} held for the current CPU -and local interrupts disabled. - -\paragraph*{Return values} - -Must return a {\tt struct task\_slice} describing what task to run and how long -for (at maximum). - -\subsubsection{control} - -\paragraph*{Purpose} - -This method is called for global scheduler control operations. It takes a -pointer to a {\tt struct sched\_ctl\_cmd}, which it should either -source data from or populate with data, depending on the value of the -{\tt direction} field. 
- -\paragraph*{Call environment} - -The generic layer guarantees that when this method is called, the -caller selected the correct scheduler ID, hence the scheduler's -implementation does not need to sanity-check these parts of the call. - -\paragraph*{Return values} - -This function should return the value to be passed back to user space, hence it -should either be 0 or an appropriate errno value. - -\subsubsection{sched\_adjdom} - -\paragraph*{Purpose} - -This method is called to adjust the scheduling parameters of a particular -domain, or to query their current values. The function should check -the {\tt direction} field of the {\tt sched\_adjdom\_cmd} it receives in -order to determine which of these operations is being performed. - -\paragraph*{Call environment} - -The generic layer guarantees that the caller has specified the correct -control interface version and scheduler ID and that the supplied {\tt -task\_struct} will not be deallocated during the call (hence it is not -necessary to {\tt get\_task\_struct}). - -\paragraph*{Return values} - -This function should return the value to be passed back to user space, hence it -should either be 0 or an appropriate errno value. - -\subsubsection{reschedule} - -\paragraph*{Purpose} - -This method is called to determine if a reschedule is required as a result of a -particular task. - -\paragraph*{Call environment} -The generic layer will cause a reschedule if the current domain is the idle -task or it has exceeded its minimum time slice before a reschedule. The -generic layer guarantees that the task passed is not currently running but is -on the runqueue. - -\paragraph*{Return values} - -Should return a mask of CPUs to cause a reschedule on. - -\subsubsection{dump\_settings} - -\paragraph*{Purpose} - -If implemented, this should dump any private global settings for this -scheduler to the console. - -\paragraph*{Call environment} - -This function is called with interrupts enabled. 
- -\subsubsection{dump\_cpu\_state} - -\paragraph*{Purpose} - -This method should dump any private settings for the specified CPU. - -\paragraph*{Call environment} - -This function is called with interrupts disabled and the {\tt schedule\_lock} -for the specified CPU held. - -\subsubsection{dump\_runq\_el} - -\paragraph*{Purpose} - -This method should dump any private settings for the specified task. - -\paragraph*{Call environment} - -This function is called with interrupts disabled and the {\tt schedule\_lock} -for the task's CPU held. - -\end{comment} - +%% \include{src/interface/scheduling} +%% scheduling information moved to scheduling.tex +%% still commented out @@ -1457,74 +126,9 @@ %% (and/or kip's stuff?) and write about that instead? %% -\begin{comment} - -\chapter{Debugging} - -Xen provides tools for debugging both Xen and guest OSes. Currently, the -Pervasive Debugger provides a GDB stub, which provides facilities for symbolic -debugging of Xen itself and of OS kernels running on top of Xen. The Trace -Buffer provides a lightweight means to log data about Xen's internal state and -behaviour at runtime, for later analysis. - -\section{Pervasive Debugger} - -Information on using the pervasive debugger is available in pdb.txt. - - -\section{Trace Buffer} - -The trace buffer provides a means to observe Xen's operation from domain 0. -Trace events, inserted at key points in Xen's code, record data that can be -read by the {\tt xentrace} tool. Recording these events has a low overhead -and hence the trace buffer may be useful for debugging timing-sensitive -behaviours. - -\subsection{Internal API} - -To use the trace buffer functionality from within Xen, you must {\tt \#include -<xen/trace.h>}, which contains definitions related to the trace buffer. Trace -events are inserted into the buffer using the {\tt TRACE\_xD} ({\tt x} = 0, 1, -2, 3, 4 or 5) macros. These all take an event number, plus {\tt x} additional -(32-bit) data as their arguments. 
For trace buffer-enabled builds of Xen these -will insert the event ID and data into the trace buffer, along with the current -value of the CPU cycle-counter. For builds without the trace buffer enabled, -the macros expand to no-ops and thus can be left in place without incurring -overheads. - -\subsection{Trace-enabled builds} - -By default, the trace buffer is enabled only in debug builds (i.e. {\tt NDEBUG} -is not defined). It can be enabled separately by defining {\tt TRACE\_BUFFER}, -either in {\tt <xen/config.h>} or on the gcc command line. - -The size (in pages) of the per-CPU trace buffers can be specified using the -{\tt tbuf\_size=n } boot parameter to Xen. If the size is set to 0, the trace -buffers will be disabled. - -\subsection{Dumping trace data} - -When running a trace buffer build of Xen, trace data are written continuously -into the buffer data areas, with newer data overwriting older data. This data -can be captured using the {\tt xentrace} program in domain 0. - -The {\tt xentrace} tool uses {\tt /dev/mem} in domain 0 to map the trace -buffers into its address space. It then periodically polls all the buffers for -new data, dumping out any new records from each buffer in turn. As a result, -for machines with multiple (logical) CPUs, the trace buffer output will not be -in overall chronological order. - -The output from {\tt xentrace} can be post-processed using {\tt -xentrace\_cpusplit} (used to split trace data out into per-cpu log files) and -{\tt xentrace\_format} (used to pretty-print trace data). For the predefined -trace points, there is an example format file in {\tt tools/xentrace/formats }. - -For more information, see the manual pages for {\tt xentrace}, {\tt -xentrace\_format} and {\tt xentrace\_cpusplit}. 
- -\end{comment} - - +%% \include{src/interface/debugging} +%% debugging information moved to debugging.tex +%% still commented out \end{document} diff -r a6b72464a042 -r f8e7af29daa1 docs/src/user.tex --- a/docs/src/user.tex Tue Sep 20 09:43:29 2005 +++ b/docs/src/user.tex Tue Sep 20 09:43:46 2005 @@ -59,1803 +59,36 @@ \renewcommand{\floatpagefraction}{.8} \setstretch{1.1} + \part{Introduction and Tutorial} -\chapter{Introduction} - -Xen is a {\em paravirtualising} virtual machine monitor (VMM), or -`hypervisor', for the x86 processor architecture. Xen can securely -execute multiple virtual machines on a single physical system with -close-to-native performance. The virtual machine technology -facilitates enterprise-grade functionality, including: - -\begin{itemize} -\item Virtual machines with performance close to native - hardware. -\item Live migration of running virtual machines between physical hosts. -\item Excellent hardware support (supports most Linux device drivers). -\item Sandboxed, restartable device drivers. -\end{itemize} - -Paravirtualisation permits very high performance virtualisation, -even on architectures like x86 that are traditionally -very hard to virtualise. -The drawback of this approach is that it requires operating systems to -be {\em ported} to run on Xen. Porting an OS to run on Xen is similar -to supporting a new hardware platform, however the process -is simplified because the paravirtual machine architecture is very -similar to the underlying native hardware. Even though operating system -kernels must explicitly support Xen, a key feature is that user space -applications and libraries {\em do not} require modification. - -Xen support is available for increasingly many operating systems: -right now, Linux 2.4, Linux 2.6 and NetBSD are available for Xen 2.0. -A FreeBSD port is undergoing testing and will be incorporated into the -release soon. Other OS ports, including Plan 9, are in progress. 
We -hope that that arch-xen patches will be incorporated into the -mainstream releases of these operating systems in due course (as has -already happened for NetBSD). - -Possible usage scenarios for Xen include: -\begin{description} -\item [Kernel development.] Test and debug kernel modifications in a - sandboxed virtual machine --- no need for a separate test - machine. -\item [Multiple OS configurations.] Run multiple operating systems - simultaneously, for instance for compatibility or QA purposes. -\item [Server consolidation.] Move multiple servers onto a single - physical host with performance and fault isolation provided at - virtual machine boundaries. -\item [Cluster computing.] Management at VM granularity provides more - flexibility than separately managing each physical host, but - better control and isolation than single-system image solutions, - particularly by using live migration for load balancing. -\item [Hardware support for custom OSes.] Allow development of new OSes - while benefiting from the wide-ranging hardware support of - existing OSes such as Linux. -\end{description} - -\section{Structure of a Xen-Based System} - -A Xen system has multiple layers, the lowest and most privileged of -which is Xen itself. -Xen in turn may host multiple {\em guest} operating systems, each of -which is executed within a secure virtual machine (in Xen terminology, -a {\em domain}). Domains are scheduled by Xen to make effective use of -the available physical CPUs. Each guest OS manages its own -applications, which includes responsibility for scheduling each -application within the time allotted to the VM by Xen. - -The first domain, {\em domain 0}, is created automatically when the -system boots and has special management privileges. Domain 0 builds -other domains and manages their virtual devices. It also performs -administrative tasks such as suspending, resuming and migrating other -virtual machines. 
- -Within domain 0, a process called \emph{xend} runs to manage the system. -\Xend is responsible for managing virtual machines and providing access -to their consoles. Commands are issued to \xend over an HTTP -interface, either from a command-line tool or from a web browser. - -\section{Hardware Support} - -Xen currently runs only on the x86 architecture, requiring a `P6' or -newer processor (e.g. Pentium Pro, Celeron, Pentium II, Pentium III, -Pentium IV, Xeon, AMD Athlon, AMD Duron). Multiprocessor machines are -supported, and we also have basic support for HyperThreading (SMT), -although this remains a topic for ongoing research. A port -specifically for x86/64 is in progress, although Xen already runs on -such systems in 32-bit legacy mode. In addition a port to the IA64 -architecture is approaching completion. We hope to add other -architectures such as PPC and ARM in due course. - - -Xen can currently use up to 4GB of memory. It is possible for x86 -machines to address up to 64GB of physical memory but there are no -current plans to support these systems: The x86/64 port is the -planned route to supporting larger memory sizes. - -Xen offloads most of the hardware support issues to the guest OS -running in Domain~0. Xen itself contains only the code required to -detect and start secondary processors, set up interrupt routing, and -perform PCI bus enumeration. Device drivers run within a privileged -guest OS rather than within Xen itself. This approach provides -compatibility with the majority of device hardware supported by Linux. -The default XenLinux build contains support for relatively modern -server-class network and disk hardware, but you can add support for -other hardware by configuring your XenLinux kernel in the normal way. - -\section{History} - -Xen was originally developed by the Systems Research Group at the -University of Cambridge Computer Laboratory as part of the XenoServers -project, funded by the UK-EPSRC. 
-XenoServers aim to provide a `public infrastructure for -global distributed computing', and Xen plays a key part in that, -allowing us to efficiently partition a single machine to enable -multiple independent clients to run their operating systems and -applications in an environment providing protection, resource -isolation and accounting. The project web page contains further -information along with pointers to papers and technical reports: -\path{http://www.cl.cam.ac.uk/xeno} - -Xen has since grown into a fully-fledged project in its own right, -enabling us to investigate interesting research issues regarding the -best techniques for virtualising resources such as the CPU, memory, -disk and network. The project has been bolstered by support from -Intel Research Cambridge, and HP Labs, who are now working closely -with us. - -Xen was first described in a paper presented at SOSP in -2003\footnote{\tt -http://www.cl.cam.ac.uk/netos/papers/2003-xensosp.pdf}, and the first -public release (1.0) was made that October. Since then, Xen has -significantly matured and is now used in production scenarios on -many sites. - -Xen 2.0 features greatly enhanced hardware support, configuration -flexibility, usability and a larger complement of supported operating -systems. This latest release takes Xen a step closer to becoming the -definitive open source solution for virtualisation. - -\chapter{Installation} - -The Xen distribution includes three main components: Xen itself, ports -of Linux 2.4 and 2.6 and NetBSD to run on Xen, and the user-space -tools required to manage a Xen-based system. This chapter describes -how to install the Xen 2.0 distribution from source. Alternatively, -there may be pre-built packages available as part of your operating -system distribution. - -\section{Prerequisites} -\label{sec:prerequisites} - -The following is a full list of prerequisites. 
Items marked `$\dag$' -are required by the \xend control tools, and hence required if you -want to run more than one virtual machine; items marked `$*$' are only -required if you wish to build from source. -\begin{itemize} -\item A working Linux distribution using the GRUB bootloader and -running on a P6-class (or newer) CPU. -\item [$\dag$] The \path{iproute2} package. -\item [$\dag$] The Linux bridge-utils\footnote{Available from -{\tt http://bridge.sourceforge.net}} (e.g., \path{/sbin/brctl}) -\item [$\dag$] An installation of Twisted v1.3 or -above\footnote{Available from {\tt -http://www.twistedmatrix.com}}. There may be a binary package -available for your distribution; alternatively it can be installed by -running `{\sl make install-twisted}' in the root of the Xen source -tree. -\item [$*$] Build tools (gcc v3.2.x or v3.3.x, binutils, GNU make). -\item [$*$] Development installation of libcurl (e.g., libcurl-devel) -\item [$*$] Development installation of zlib (e.g., zlib-dev). -\item [$*$] Development installation of Python v2.2 or later (e.g., python-dev). -\item [$*$] \LaTeX and transfig are required to build the documentation. -\end{itemize} - -Once you have satisfied the relevant prerequisites, you can -now install either a binary or source distribution of Xen. - -\section{Installing from Binary Tarball} - -Pre-built tarballs are available for download from the Xen -download page -\begin{quote} -{\tt http://xen.sf.net} -\end{quote} - -Once you've downloaded the tarball, simply unpack and install: -\begin{verbatim} -# tar zxvf xen-2.0-install.tgz -# cd xen-2.0-install -# sh ./install.sh -\end{verbatim} - -Once you've installed the binaries you need to configure -your system as described in Section~\ref{s:configure}. - -\section{Installing from Source} - -This section describes how to obtain, build, and install -Xen from source. 
- -\subsection{Obtaining the Source} - -The Xen source tree is available as either a compressed source tar -ball or as a clone of our master BitKeeper repository. - -\begin{description} -\item[Obtaining the Source Tarball]\mbox{} \\ -Stable versions (and daily snapshots) of the Xen source tree are -available as compressed tarballs from the Xen download page -\begin{quote} -{\tt http://xen.sf.net} -\end{quote} - -\item[Using BitKeeper]\mbox{} \\ -If you wish to install Xen from a clone of our latest BitKeeper -repository then you will need to install the BitKeeper tools. -Download instructions for BitKeeper can be obtained by filling out the -form at: - -\begin{quote} -{\tt http://www.bitmover.com/cgi-bin/download.cgi} -\end{quote} -The public master BK repository for the 2.0 release lives at: -\begin{quote} -{\tt bk://xen.bkbits.net/xen-2.0.bk} -\end{quote} -You can use BitKeeper to -download it and keep it updated with the latest features and fixes. - -Change to the directory in which you want to put the source code, then -run: -\begin{verbatim} -# bk clone bk://xen.bkbits.net/xen-2.0.bk -\end{verbatim} - -Under your current directory, a new directory named \path{xen-2.0.bk} -has been created, which contains all the source code for Xen, the OS -ports, and the control tools. You can update your repository with the -latest changes at any time by running: -\begin{verbatim} -# cd xen-2.0.bk # to change into the local repository -# bk pull # to update the repository -\end{verbatim} -\end{description} - -%\section{The distribution} -% -%The Xen source code repository is structured as follows: -% -%\begin{description} -%\item[\path{tools/}] Xen node controller daemon (Xend), command line tools, -% control libraries -%\item[\path{xen/}] The Xen VMM. -%\item[\path{linux-*-xen-sparse/}] Xen support for Linux. -%\item[\path{linux-*-patches/}] Experimental patches for Linux. -%\item[\path{netbsd-*-xen-sparse/}] Xen support for NetBSD. 
-%\item[\path{docs/}] Various documentation files for users and developers. -%\item[\path{extras/}] Bonus extras. -%\end{description} - -\subsection{Building from Source} - -The top-level Xen Makefile includes a target `world' that will do the -following: - -\begin{itemize} -\item Build Xen -\item Build the control tools, including \xend -\item Download (if necessary) and unpack the Linux 2.6 source code, - and patch it for use with Xen -\item Build a Linux kernel to use in domain 0 and a smaller - unprivileged kernel, which can optionally be used for - unprivileged virtual machines. -\end{itemize} - - -After the build has completed you should have a top-level -directory called \path{dist/} in which all resulting targets -will be placed; of particular interest are the two -XenLinux kernel images, one with a `-xen0' extension -which contains hardware device drivers and drivers for Xen's virtual -devices, and one with a `-xenU' extension that just contains the -virtual ones. These are found in \path{dist/install/boot/} along -with the image for Xen itself and the configuration files used -during the build. - -The NetBSD port can be built using: -\begin{quote} -\begin{verbatim} -# make netbsd20 -\end{verbatim} -\end{quote} -The NetBSD port is built using a snapshot of the netbsd-2-0 cvs branch. -The snapshot is downloaded as part of the build process if it is not -yet present in the \path{NETBSD\_SRC\_PATH} search path. The build -process also downloads a toolchain which includes all the tools -necessary to build the NetBSD kernel under Linux. - -To further customize the set of kernels built, you need to edit -the top-level Makefile.
Look for the line: - -\begin{quote} -\begin{verbatim} -KERNELS ?= mk.linux-2.6-xen0 mk.linux-2.6-xenU -\end{verbatim} -\end{quote} - -You can edit this line to include any set of operating system kernels -which have configurations in the top-level \path{buildconfigs/} -directory, for example \path{mk.linux-2.4-xenU} to build a Linux 2.4 -kernel containing only virtual device drivers. - -%% Inspect the Makefile if you want to see what goes on during a build. -%% Building Xen and the tools is straightforward, but XenLinux is more -%% complicated. The makefile needs a `pristine' Linux kernel tree to which -%% it will then add the Xen architecture files. You can tell the -%% makefile the location of the appropriate Linux compressed tar file by -%% setting the LINUX\_SRC environment variable, e.g. \\ -%% \verb!# LINUX_SRC=/tmp/linux-2.6.11.tar.bz2 make world! \\ or by -%% placing the tar file somewhere in the search path of {\tt -%% LINUX\_SRC\_PATH} which defaults to `{\tt .:..}'. If the makefile -%% can't find a suitable kernel tar file it attempts to download it from -%% kernel.org (this won't work if you're behind a firewall). - -%% After untaring the pristine kernel tree, the makefile uses the {\tt -%% mkbuildtree} script to add the Xen patches to the kernel. - - -%% The procedure is similar to build the Linux 2.4 port: \\ -%% \verb!# LINUX_SRC=/path/to/linux2.4/source make linux24! - - -%% \framebox{\parbox{5in}{ -%% {\bf Distro specific:} \\ -%% {\it Gentoo} --- if not using udev (most installations, currently), you'll need -%% to enable devfs and devfs mount at boot time in the xen0 config. -%% }} - -\subsection{Custom XenLinux Builds} - -% If you have an SMP machine you may wish to give the {\tt '-j4'} -% argument to make to get a parallel build. - -If you wish to build a customized XenLinux kernel (e.g. 
to support -additional devices or enable distribution-required features), you can -use the standard Linux configuration mechanisms, specifying that the -architecture being built for is \path{xen}, e.g: -\begin{quote} -\begin{verbatim} -# cd linux-2.6.11-xen0 -# make ARCH=xen xconfig -# cd .. -# make -\end{verbatim} -\end{quote} - -You can also copy an existing Linux configuration (\path{.config}) -into \path{linux-2.6.11-xen0} and execute: -\begin{quote} -\begin{verbatim} -# make ARCH=xen oldconfig -\end{verbatim} -\end{quote} - -You may be prompted with some Xen-specific options; we -advise accepting the defaults for these options. - -Note that the only difference between the two types of Linux kernel -that are built is the configuration file used for each. The "U" -suffixed (unprivileged) versions don't contain any of the physical -hardware device drivers, leading to a 30\% reduction in size; hence -you may prefer these for your non-privileged domains. The `0' -suffixed privileged versions can be used to boot the system, as well -as in driver domains and unprivileged domains. - - -\subsection{Installing the Binaries} - - -The files produced by the build process are stored under the -\path{dist/install/} directory. To install them in their default -locations, do: -\begin{quote} -\begin{verbatim} -# make install -\end{verbatim} -\end{quote} - - -Alternatively, users with special installation requirements may wish -to install them manually by copying the files to their appropriate -destinations. 
- -%% Files in \path{install/boot/} include: -%% \begin{itemize} -%% \item \path{install/boot/xen-2.0.gz} Link to the Xen 'kernel' -%% \item \path{install/boot/vmlinuz-2.6-xen0} Link to domain 0 XenLinux kernel -%% \item \path{install/boot/vmlinuz-2.6-xenU} Link to unprivileged XenLinux kernel -%% \end{itemize} - -The \path{dist/install/boot} directory will also contain the config files -used for building the XenLinux kernels, and also versions of Xen and -XenLinux kernels that contain debug symbols (\path{xen-syms-2.0.6} and -\path{vmlinux-syms-2.6.11.11-xen0}) which are essential for interpreting crash -dumps. Retain these files as the developers may wish to see them if -you post on the mailing list. - - - - - -\section{Configuration} -\label{s:configure} -Once you have built and installed the Xen distribution, it is -simple to prepare the machine for booting and running Xen. - -\subsection{GRUB Configuration} - -An entry should be added to \path{grub.conf} (often found under -\path{/boot/} or \path{/boot/grub/}) to allow Xen / XenLinux to boot. -This file is sometimes called \path{menu.lst}, depending on your -distribution. The entry should look something like the following: - -{\small -\begin{verbatim} -title Xen 2.0 / XenLinux 2.6 - kernel /boot/xen-2.0.gz dom0_mem=131072 - module /boot/vmlinuz-2.6-xen0 root=/dev/sda4 ro console=tty0 -\end{verbatim} -} - -The kernel line tells GRUB where to find Xen itself and what boot -parameters should be passed to it (in this case, setting domain 0's -memory allocation in kilobytes and the settings for the serial port). For more -details on the various Xen boot parameters see Section~\ref{s:xboot}. 
- -The module line of the configuration describes the location of the -XenLinux kernel that Xen should start and the parameters that should -be passed to it (these are standard Linux parameters, identifying the -root device and specifying it be initially mounted read only and -instructing that console output be sent to the screen). Some -distributions such as SuSE do not require the \path{ro} parameter. - -%% \framebox{\parbox{5in}{ -%% {\bf Distro specific:} \\ -%% {\it SuSE} --- Omit the {\tt ro} option from the XenLinux kernel -%% command line, since the partition won't be remounted rw during boot. -%% }} - - -If you want to use an initrd, just add another \path{module} line to -the configuration, as usual: -{\small -\begin{verbatim} - module /boot/my_initrd.gz -\end{verbatim} -} - -As always when installing a new kernel, it is recommended that you do -not delete existing menu options from \path{menu.lst} --- you may want -to boot your old Linux kernel in future, particularly if you -have problems. - - -\subsection{Serial Console (optional)} - -%% kernel /boot/xen-2.0.gz dom0_mem=131072 com1=115200,8n1 -%% module /boot/vmlinuz-2.6-xen0 root=/dev/sda4 ro - - -In order to configure Xen serial console output, it is necessary to add -a boot option to your GRUB config; e.g. replace the above kernel line -with: -\begin{quote} -{\small -\begin{verbatim} - kernel /boot/xen.gz dom0_mem=131072 com1=115200,8n1 -\end{verbatim}} -\end{quote} - -This configures Xen to output on COM1 at 115,200 baud, 8 data bits, -1 stop bit and no parity. Modify these parameters for your setup. - -One can also configure XenLinux to share the serial console; to -achieve this append ``\path{console=ttyS0}'' to your -module line. - - -If you wish to be able to log in over the XenLinux serial console it -is necessary to add a line into \path{/etc/inittab}, just as per -regular Linux.
Simply add the line: -\begin{quote} -{\small -{\tt c:2345:respawn:/sbin/mingetty ttyS0} -} -\end{quote} - -and you should be able to log in. Note that to successfully log in -as root over the serial line will require adding \path{ttyS0} to -\path{/etc/securetty} in most modern distributions. - -\subsection{TLS Libraries} - -Users of the XenLinux 2.6 kernel should disable Thread Local Storage -(e.g.\ by doing a \path{mv /lib/tls /lib/tls.disabled}) before -attempting to run with a XenLinux kernel\footnote{If you boot without first -disabling TLS, you will get a warning message during the boot -process. In this case, simply perform the rename after the machine is -up and then run \texttt{/sbin/ldconfig} to make it take effect.}. You can -always reenable it by restoring the directory to its original location -(i.e.\ \path{mv /lib/tls.disabled /lib/tls}). - -The reason for this is that the current TLS implementation uses -segmentation in a way that is not permissible under Xen. If TLS is -not disabled, an emulation mode is used within Xen which reduces -performance substantially. - -We hope that this issue can be resolved by working with Linux -distribution vendors to implement a minor backward-compatible change -to the TLS library. - -\section{Booting Xen} - -It should now be possible to restart the system and use Xen. Reboot -as usual but choose the new Xen option when the Grub screen appears. - -What follows should look much like a conventional Linux boot. The -first portion of the output comes from Xen itself, supplying low level -information about itself and the machine it is running on. The -following portion of the output comes from XenLinux. - -You may see some errors during the XenLinux boot. These are not -necessarily anything to worry about --- they may result from kernel -configuration differences between your XenLinux kernel and the one you -usually use. - -When the boot completes, you should be able to log into your system as -usual. 
If you are unable to log in to your system running Xen, you -should still be able to reboot with your normal Linux kernel. - - -\chapter{Starting Additional Domains} - -The first step in creating a new domain is to prepare a root -filesystem for it to boot off. Typically, this might be stored in a -normal partition, an LVM or other volume manager partition, a disk -file or on an NFS server. A simple way to do this is simply to boot -from your standard OS install CD and install the distribution into -another partition on your hard drive. - -To start the \xend control daemon, type -\begin{quote} -\verb!# xend start! -\end{quote} -If you -wish the daemon to start automatically, see the instructions in -Section~\ref{s:xend}. Once the daemon is running, you can use the -\path{xm} tool to monitor and maintain the domains running on your -system. This chapter provides only a brief tutorial: we provide full -details of the \path{xm} tool in the next chapter. - -%\section{From the web interface} -% -%Boot the Xen machine and start Xensv (see Chapter~\ref{cha:xensv} for -%more details) using the command: \\ -%\verb_# xensv start_ \\ -%This will also start Xend (see Chapter~\ref{cha:xend} for more information). -% -%The domain management interface will then be available at {\tt -%http://your\_machine:8080/}. This provides a user friendly wizard for -%starting domains and functions for managing running domains. -% -%\section{From the command line} - - -\section{Creating a Domain Configuration File} - -Before you can start an additional domain, you must create a -configuration file. We provide two example files which you -can use as a starting point: -\begin{itemize} - \item \path{/etc/xen/xmexample1} is a simple template configuration file - for describing a single VM. - - \item \path{/etc/xen/xmexample2} file is a template description that - is intended to be reused for multiple virtual machines. 
Setting - the value of the \path{vmid} variable on the \path{xm} command line - fills in parts of this template. -\end{itemize} - -Copy one of these files and edit it as appropriate. -Typical values you may wish to edit include: - -\begin{quote} -\begin{description} -\item[kernel] Set this to the path of the kernel you compiled for use - with Xen (e.g.\ \path{kernel = '/boot/vmlinuz-2.6-xenU'}) -\item[memory] Set this to the size of the domain's memory in -megabytes (e.g.\ \path{memory = 64}) -\item[disk] Set the first entry in this list to calculate the offset -of the domain's root partition, based on the domain ID. Set the -second to the location of \path{/usr} if you are sharing it between -domains (e.g.\ \path{disk = ['phy:your\_hard\_drive\%d,sda1,w' \% -(base\_partition\_number + vmid), 'phy:your\_usr\_partition,sda6,r' ]} -\item[dhcp] Uncomment the dhcp variable, so that the domain will -receive its IP address from a DHCP server (e.g.\ \path{dhcp='dhcp'}) -\end{description} -\end{quote} - -You may also want to edit the {\bf vif} variable in order to choose -the MAC address of the virtual ethernet interface yourself. For -example: -\begin{quote} -\verb_vif = ['mac=00:06:AA:F6:BB:B3']_ -\end{quote} -If you do not set this variable, \xend will automatically generate a -random MAC address from an unused range. - - -\section{Booting the Domain} - -The \path{xm} tool provides a variety of commands for managing domains. -Use the \path{create} command to start new domains. Assuming you've -created a configuration file \path{myvmconf} based around -\path{/etc/xen/xmexample2}, to start a domain with virtual -machine ID~1 you should type: - -\begin{quote} -\begin{verbatim} -# xm create -c myvmconf vmid=1 -\end{verbatim} -\end{quote} - - -The \path{-c} switch causes \path{xm} to turn into the domain's -console after creation. The \path{vmid=1} sets the \path{vmid} -variable used in the \path{myvmconf} file. 
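The way \path{vmid} fills in the template can be sketched in Python, the syntax used by the domain configuration files. This is only an illustrative sketch: the function name, the default device names and the default \path{base\_partition\_number} are assumptions made for the example, not values taken from \path{xmexample2} or from \path{xm} itself.

```python
# Illustrative sketch only: build_disk_list and its default arguments are
# hypothetical. In a real configuration file, vmid comes from the xm
# command line (e.g. "xm create -c myvmconf vmid=1").
def build_disk_list(vmid, base_partition_number=0,
                    hard_drive="hda", usr_partition="hda6"):
    """Return a disk list in the 'phy:device,guest_device,mode' form."""
    # Root partition: offset from the base partition by the domain's vmid.
    root = "phy:%s%d,sda1,w" % (hard_drive, base_partition_number + vmid)
    # Shared /usr partition, exported read-only to every domain.
    usr = "phy:%s,sda6,r" % usr_partition
    return [root, usr]


if __name__ == "__main__":
    for vmid in (1, 2):
        print(vmid, build_disk_list(vmid))
```

With these assumed defaults, each \path{vmid} selects a different root partition while every domain shares the same read-only \path{/usr}.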
- - -You should see the console boot messages from the new domain -appearing in the terminal in which you typed the command, -culminating in a login prompt. - - -\section{Example: ttylinux} - -Ttylinux is a very small Linux distribution, designed to require very -few resources. We will use it as a concrete example of how to start a -Xen domain. Most users will probably want to install a full-featured -distribution once they have mastered the basics\footnote{ttylinux is -maintained by Pascal Schmidt. You can download source packages from -the distribution's home page: {\tt http://www.minimalinux.org/ttylinux/}}. - -\begin{enumerate} -\item Download and extract the ttylinux disk image from the Files -section of the project's SourceForge site (see -\path{http://sf.net/projects/xen/}). -\item Create a configuration file like the following: -\begin{verbatim} -kernel = "/boot/vmlinuz-2.6-xenU" -memory = 64 -name = "ttylinux" -nics = 1 -ip = "1.2.3.4" -disk = ['file:/path/to/ttylinux/rootfs,sda1,w'] -root = "/dev/sda1 ro" -\end{verbatim} -\item Now start the domain and connect to its console: -\begin{verbatim} -xm create configfile -c -\end{verbatim} -\item Login as root, password root. -\end{enumerate} - - -\section{Starting / Stopping Domains Automatically} - -It is possible to have certain domains start automatically at boot -time and to have dom0 wait for all running domains to shutdown before -it shuts down the system. - -To specify a domain is to start at boot-time, place its -configuration file (or a link to it) under \path{/etc/xen/auto/}. - -A Sys-V style init script for RedHat and LSB-compliant systems is -provided and will be automatically copied to \path{/etc/init.d/} -during install. You can then enable it in the appropriate way for -your distribution. - -For instance, on RedHat: - -\begin{quote} -\verb_# chkconfig --add xendomains_ -\end{quote} - -By default, this will start the boot-time domains in runlevels 3, 4 -and 5. 
- -You can also use the \path{service} command to run this script -manually, e.g.: - -\begin{quote} -\verb_# service xendomains start_ - -Starts all the domains with config files under /etc/xen/auto/. -\end{quote} - - -\begin{quote} -\verb_# service xendomains stop_ - -Shuts down ALL running Xen domains. -\end{quote} - -\chapter{Domain Management Tools} - -The previous chapter described a simple example of how to configure -and start a domain. This chapter summarises the tools available to -manage running domains. - -\section{Command-line Management} - -Command line management tasks are also performed using the \path{xm} -tool. For online help for the commands available, type: -\begin{quote} -\verb_# xm help_ -\end{quote} - -You can also type \path{xm help $<$command$>$} for more information -on a given command. - -\subsection{Basic Management Commands} - -The most important \path{xm} commands are: -\begin{quote} -\verb_# xm list_: Lists all running domains.\\ -\verb_# xm consoles_: Gives information about the domain consoles.\\ -\verb_# xm console_: Opens a console to a domain (e.g.\ - \verb_# xm console myVM_) -\end{quote} - -\subsection{\tt xm list} - -The output of \path{xm list} is in rows of the following format: -\begin{center} -{\tt name domid memory cpu state cputime console} -\end{center} - -\begin{quote} -\begin{description} -\item[name] The descriptive name of the virtual machine. -\item[domid] The ID of the domain this virtual machine is running in. -\item[memory] Memory size in megabytes. -\item[cpu] The CPU this domain is running on. -\item[state] Domain state consists of 5 fields: - \begin{description} - \item[r] running - \item[b] blocked - \item[p] paused - \item[s] shutdown - \item[c] crashed - \end{description} -\item[cputime] How much CPU time (in seconds) the domain has used so far. -\item[console] TCP port accepting connections to the domain's console.
-\end{description} -\end{quote} - -The \path{xm list} command also supports a long output format when the -\path{-l} switch is used. This outputs the full details of the -running domains in \xend's SXP configuration format. - -For example, suppose the system is running the ttylinux domain as -described earlier. The list command should produce output somewhat -like the following: -\begin{verbatim} -# xm list -Name Id Mem(MB) CPU State Time(s) Console -Domain-0 0 251 0 r---- 172.2 -ttylinux 5 63 0 -b--- 3.0 9605 -\end{verbatim} - -Here we can see the details for the ttylinux domain, as well as for -domain 0 (which, of course, is always running). Note that the console -port for the ttylinux domain is 9605. You can connect to this port over TCP -using a terminal program (e.g. \path{telnet} or, better, -\path{xencons}). The simplest way to connect is to use the \path{xm console} -command, specifying the domain name or ID. To connect to the console -of the ttylinux domain, we could use any of the following: -\begin{verbatim} -# xm console ttylinux -# xm console 5 -# xencons localhost 9605 -\end{verbatim} - -\section{Domain Save and Restore} - -The administrator of a Xen system may suspend a virtual machine's -current state into a disk file in domain 0, allowing it to be resumed -at a later time. - -The ttylinux domain described earlier can be suspended to disk using -the command: -\begin{verbatim} -# xm save ttylinux ttylinux.xen -\end{verbatim} - -This will stop the domain named `ttylinux' and save its current state -into a file called \path{ttylinux.xen}. - -To resume execution of this domain, use the \path{xm restore} command: -\begin{verbatim} -# xm restore ttylinux.xen -\end{verbatim} - -This will restore the state of the domain and restart it. The domain -will carry on as before and the console may be reconnected using the -\path{xm console} command, as above.
- -\section{Live Migration} - -Live migration is used to transfer a domain between physical hosts -whilst that domain continues to perform its usual activities --- from -the user's perspective, the migration should be imperceptible. - -To perform a live migration, both hosts must be running Xen / \xend and -the destination host must have sufficient resources (e.g. memory -capacity) to accommodate the domain after the move. Furthermore, we -currently require both source and destination machines to be on the -same L2 subnet. - -Currently, there is no support for providing automatic remote access -to filesystems stored on local disk when a domain is migrated. -Administrators should choose an appropriate storage solution -(e.g. SAN or NAS) to ensure that domain filesystems are also -available on their destination node. GNBD is a good method for -exporting a volume from one machine to another. iSCSI can do a similar -job, but is more complex to set up. - -When a domain migrates, its MAC and IP addresses move with it, so it -is only possible to migrate VMs within the same layer-2 network and IP -subnet. If the destination node is on a different subnet, the -administrator would need to manually configure a suitable etherip or -IP tunnel in the domain 0 of the remote node. - -A domain may be migrated using the \path{xm migrate} command. To -live migrate a domain to another machine, we would use -the command: - -\begin{verbatim} -# xm migrate --live mydomain destination.ournetwork.com -\end{verbatim} - -Without the \path{--live} flag, \xend simply stops the domain and -copies the memory image over to the new node and restarts it. Since -domains can have large allocations this can be quite time consuming, -even on a Gigabit network. With the \path{--live} flag \xend attempts -to keep the domain running while the migration is in progress, -resulting in typical `downtimes' of just 60--300ms.
- -For now it will be necessary to reconnect to the domain's console on -the new machine using the \path{xm console} command. If a migrated -domain has any open network connections then they will be preserved, -so SSH connections do not have this limitation. - -\section{Managing Domain Memory} - -XenLinux domains have the ability to relinquish / reclaim machine -memory at the request of the administrator or the user of the domain. - -\subsection{Setting memory footprints from dom0} - -The machine administrator can request that a domain alter its memory -footprint using the \path{xm set-mem} command. For instance, we can -request that our example ttylinux domain reduce its memory footprint -to 32 megabytes. - -\begin{verbatim} -# xm set-mem ttylinux 32 -\end{verbatim} - -We can now see the result of this in the output of \path{xm list}: - -\begin{verbatim} -# xm list -Name Id Mem(MB) CPU State Time(s) Console -Domain-0 0 251 0 r---- 172.2 -ttylinux 5 31 0 -b--- 4.3 9605 -\end{verbatim} - -The domain has responded to the request by returning memory to Xen. We -can restore the domain to its original size using the command line: - -\begin{verbatim} -# xm set-mem ttylinux 64 -\end{verbatim} - -\subsection{Setting memory footprints from within a domain} - -The virtual file \path{/proc/xen/balloon} allows the owner of a -domain to adjust their own memory footprint. Reading the file -(e.g. \path{cat /proc/xen/balloon}) prints out the current -memory footprint of the domain. Writing the file -(e.g. \path{echo new\_target > /proc/xen/balloon}) requests -that the kernel adjust the domain's memory footprint to a new value. - -\subsection{Setting memory limits} - -Xen associates a memory size limit with each domain. By default, this -is the amount of memory the domain is originally started with, -preventing the domain from ever growing beyond this size. 
To permit a -domain to grow beyond its original allocation or to prevent a domain -you've shrunk from reclaiming the memory it relinquished, use the -\path{xm maxmem} command. - -\chapter{Domain Filesystem Storage} - -It is possible to directly export any Linux block device in dom0 to -another domain, or to export filesystems / devices to virtual machines -using standard network protocols (e.g. NBD, iSCSI, NFS, etc). This -chapter covers some of the possibilities. - - -\section{Exporting Physical Devices as VBDs} -\label{s:exporting-physical-devices-as-vbds} - -One of the simplest configurations is to directly export -individual partitions from domain 0 to other domains. To -achieve this use the \path{phy:} specifier in your domain -configuration file. For example a line like -\begin{quote} -\verb_disk = ['phy:hda3,sda1,w']_ -\end{quote} -specifies that the partition \path{/dev/hda3} in domain 0 -should be exported read-write to the new domain as \path{/dev/sda1}; -one could equally well export it as \path{/dev/hda} or -\path{/dev/sdb5} should one wish. - -In addition to local disks and partitions, it is possible to export -any device that Linux considers to be ``a disk'' in the same manner. -For example, if you have iSCSI disks or GNBD volumes imported into -domain 0 you can export these to other domains using the \path{phy:} -disk syntax. E.g.: -\begin{quote} -\verb_disk = ['phy:vg/lvm1,sda2,w']_ -\end{quote} - - - -\begin{center} -\framebox{\bf Warning: Block device sharing} -\end{center} -\begin{quote} -Block devices should typically only be shared between domains in a -read-only fashion otherwise the Linux kernel's file systems will get -very confused as the file system structure may change underneath them -(having the same ext3 partition mounted rw twice is a sure fire way to -cause irreparable damage)! 
\Xend will attempt to prevent you from -doing this by checking that the device is not mounted read-write in -domain 0, and hasn't already been exported read-write to another -domain. -If you want read-write sharing, export the directory to other domains -via NFS from domain0 (or use a cluster file system such as GFS or -ocfs2). - -\end{quote} - - -\section{Using File-backed VBDs} - -It is also possible to use a file in Domain 0 as the primary storage -for a virtual machine. As well as being convenient, this also has the -advantage that the virtual block device will be {\em sparse} --- space -will only really be allocated as parts of the file are used. So if a -virtual machine uses only half of its disk space then the file really -takes up half of the size allocated. - -For example, to create a 2GB sparse file-backed virtual block device -(actually only consumes 1KB of disk): -\begin{quote} -\verb_# dd if=/dev/zero of=vm1disk bs=1k seek=2048k count=1_ -\end{quote} - -Make a file system in the disk file: -\begin{quote} -\verb_# mkfs -t ext3 vm1disk_ -\end{quote} - -(when the tool asks for confirmation, answer `y') - -Populate the file system e.g. by copying from the current root: -\begin{quote} -\begin{verbatim} -# mount -o loop vm1disk /mnt -# cp -ax /{root,dev,var,etc,usr,bin,sbin,lib} /mnt -# mkdir /mnt/{proc,sys,home,tmp} -\end{verbatim} -\end{quote} - -Tailor the file system by editing \path{/etc/fstab}, -\path{/etc/hostname}, etc (don't forget to edit the files in the -mounted file system, instead of your domain 0 filesystem, e.g. you -would edit \path{/mnt/etc/fstab} instead of \path{/etc/fstab} ). For -this example put \path{/dev/sda1} to root in fstab. 
- -Now unmount (this is important!): -\begin{quote} -\verb_# umount /mnt_ -\end{quote} - -In the configuration file set: -\begin{quote} -\verb_disk = ['file:/full/path/to/vm1disk,sda1,w']_ -\end{quote} - -As the virtual machine writes to its `disk', the sparse file will be -filled in and consume more space up to the original 2GB. - -{\bf Note that file-backed VBDs may not be appropriate for backing -I/O-intensive domains.} File-backed VBDs are known to experience -substantial slowdowns under heavy I/O workloads, due to the I/O handling -by the loopback block device used to support file-backed VBDs in dom0. -Better I/O performance can be achieved by using either LVM-backed VBDs -(Section~\ref{s:using-lvm-backed-vbds}) or physical devices as VBDs -(Section~\ref{s:exporting-physical-devices-as-vbds}). - -Linux supports a maximum of eight file-backed VBDs across all domains by -default. This limit can be statically increased by using the {\em -max\_loop} module parameter if CONFIG\_BLK\_DEV\_LOOP is compiled as a -module in the dom0 kernel, or by using the {\em max\_loop=n} boot option -if CONFIG\_BLK\_DEV\_LOOP is compiled directly into the dom0 kernel. - - -\section{Using LVM-backed VBDs} -\label{s:using-lvm-backed-vbds} - -A particularly appealing solution is to use LVM volumes -as backing for domain file-systems since this allows dynamic -growing/shrinking of volumes as well as snapshot and other -features. 
- -To initialise a partition to support LVM volumes: -\begin{quote} -\begin{verbatim} -# pvcreate /dev/sda10 -\end{verbatim} -\end{quote} - -Create a volume group named `vg' on the physical partition: -\begin{quote} -\begin{verbatim} -# vgcreate vg /dev/sda10 -\end{verbatim} -\end{quote} - -Create a logical volume of size 4GB named `myvmdisk1': -\begin{quote} -\begin{verbatim} -# lvcreate -L4096M -n myvmdisk1 vg -\end{verbatim} -\end{quote} - -You should now see that you have a \path{/dev/vg/myvmdisk1} -Make a filesystem, mount it and populate it, e.g.: -\begin{quote} -\begin{verbatim} -# mkfs -t ext3 /dev/vg/myvmdisk1 -# mount /dev/vg/myvmdisk1 /mnt -# cp -ax / /mnt -# umount /mnt -\end{verbatim} -\end{quote} - -Now configure your VM with the following disk configuration: -\begin{quote} -\begin{verbatim} - disk = [ 'phy:vg/myvmdisk1,sda1,w' ] -\end{verbatim} -\end{quote} - -LVM enables you to grow the size of logical volumes, but you'll need -to resize the corresponding file system to make use of the new -space. Some file systems (e.g. ext3) now support on-line resize. See -the LVM manuals for more details. - -You can also use LVM for creating copy-on-write clones of LVM -volumes (known as writable persistent snapshots in LVM -terminology). This facility is new in Linux 2.6.8, so isn't as -stable as one might hope. In particular, using lots of CoW LVM -disks consumes a lot of dom0 memory, and error conditions such as -running out of disk space are not handled well. Hopefully this -will improve in future. - -To create two copy-on-write clone of the above file system you -would use the following commands: - -\begin{quote} -\begin{verbatim} -# lvcreate -s -L1024M -n myclonedisk1 /dev/vg/myvmdisk1 -# lvcreate -s -L1024M -n myclonedisk2 /dev/vg/myvmdisk1 -\end{verbatim} -\end{quote} - -Each of these can grow to have 1GB of differences from the master -volume. 
You can grow the amount of space for storing the -differences using the lvextend command, e.g.: -\begin{quote} -\begin{verbatim} -# lvextend +100M /dev/vg/myclonedisk1 -\end{verbatim} -\end{quote} - -Don't let the `differences volume' ever fill up otherwise LVM gets -rather confused. It may be possible to automate the growing -process by using \path{dmsetup wait} to spot the volume getting full -and then issue an \path{lvextend}. - -In principle, it is possible to continue writing to the volume -that has been cloned (the changes will not be visible to the -clones), but we wouldn't recommend this: have the cloned volume -as a `pristine' file system install that isn't mounted directly -by any of the virtual machines. - - -\section{Using NFS Root} - -First, populate a root filesystem in a directory on the server -machine. This can be on a distinct physical machine, or simply -run within a virtual machine on the same node. - -Now configure the NFS server to export this filesystem over the -network by adding a line to \path{/etc/exports}, for instance: - -\begin{quote} -\begin{small} -\begin{verbatim} -/export/vm1root 1.2.3.4/24 (rw,sync,no_root_squash) -\end{verbatim} -\end{small} -\end{quote} - -Finally, configure the domain to use NFS root. In addition to the -normal variables, you should make sure to set the following values in -the domain's configuration file: - -\begin{quote} -\begin{small} -\begin{verbatim} -root = '/dev/nfs' -nfs_server = '2.3.4.5' # substitute IP address of server -nfs_root = '/path/to/root' # path to root FS on the server -\end{verbatim} -\end{small} -\end{quote} - -The domain will need network access at boot time, so either statically -configure an IP address (Using the config variables \path{ip}, -\path{netmask}, \path{gateway}, \path{hostname}) or enable DHCP ( -\path{dhcp='dhcp'}). 
- -Note that the Linux NFS root implementation is known to have stability -problems under high load (this is not a Xen-specific problem), so this -configuration may not be appropriate for critical servers. + +%% Chapter Introduction moved to introduction.tex +\include{src/user/introduction} + +%% Chapter Installation moved to installation.tex +\include{src/user/installation} + +%% Chapter Starting Additional Domains moved to start_addl_dom.tex +\include{src/user/start_addl_dom} + +%% Chapter Domain Management Tools moved to domain_mgmt.tex +\include{src/user/domain_mgmt} + +%% Chapter Domain Filesystem Storage moved to domain_filesystem.tex +\include{src/user/domain_filesystem} + \part{User Reference Documentation} -\chapter{Control Software} - -The Xen control software includes the \xend node control daemon (which -must be running), the xm command line tools, and the prototype -xensv web interface. - -\section{\Xend (node control daemon)} -\label{s:xend} - -The Xen Daemon (\Xend) performs system management functions related to -virtual machines. It forms a central point of control for a machine -and can be controlled using an HTTP-based protocol. \Xend must be -running in order to start and manage virtual machines. - -\Xend must be run as root because it needs access to privileged system -management functions. A small set of commands may be issued on the -\xend command line: - -\begin{tabular}{ll} -\verb!# xend start! & start \xend, if not already running \\ -\verb!# xend stop! & stop \xend if already running \\ -\verb!# xend restart! & restart \xend if running, otherwise start it \\ -% \verb!# xend trace_start! & start \xend, with very detailed debug logging \\ -\verb!# xend status! & indicates \xend status by its return code -\end{tabular} - -A SysV init script called {\tt xend} is provided to start \xend at boot -time. {\tt make install} installs this script in {\path{/etc/init.d}. 
-To enable it, you have to make symbolic links in the appropriate -runlevel directories or use the {\tt chkconfig} tool, where available. - -Once \xend is running, more sophisticated administration can be done -using the xm tool (see Section~\ref{s:xm}) and the experimental -Xensv web interface (see Section~\ref{s:xensv}). - -As \xend runs, events will be logged to \path{/var/log/xend.log} and, -if the migration assistant daemon (\path{xfrd}) has been started, -\path{/var/log/xfrd.log}. These may be of use for troubleshooting -problems. - -\section{Xm (command line interface)} -\label{s:xm} - -The xm tool is the primary tool for managing Xen from the console. -The general format of an xm command line is: - -\begin{verbatim} -# xm command [switches] [arguments] [variables] -\end{verbatim} - -The available {\em switches} and {\em arguments} are dependent on the -{\em command} chosen. The {\em variables} may be set using -declarations of the form {\tt variable=value} and command line -declarations override any of the values in the configuration file -being used, including the standard variables described above and any -custom variables (for instance, the \path{xmdefconfig} file uses a -{\tt vmid} variable). - -The available commands are as follows: - -\begin{description} -\item[set-mem] Request a domain to adjust its memory footprint. -\item[create] Create a new domain. -\item[destroy] Kill a domain immediately. -\item[list] List running domains. -\item[shutdown] Ask a domain to shutdown. -\item[dmesg] Fetch the Xen (not Linux!) boot output. -\item[consoles] Lists the available consoles. -\item[console] Connect to the console for a domain. -\item[help] Get help on xm commands. -\item[save] Suspend a domain to disk. -\item[restore] Restore a domain from disk. -\item[pause] Pause a domain's execution. -\item[unpause] Unpause a domain. -\item[pincpu] Pin a domain to a CPU. -\item[bvt] Set BVT scheduler parameters for a domain. 
-\item[bvt\_ctxallow] Set the BVT context switching allowance for the system. -\item[atropos] Set the atropos parameters for a domain. -\item[rrobin] Set the round robin time slice for the system. -\item[info] Get information about the Xen host. -\item[call] Call a \xend HTTP API function directly. -\end{description} - -For a detailed overview of switches, arguments and variables to each command -try -\begin{quote} -\begin{verbatim} -# xm help command -\end{verbatim} -\end{quote} - -\section{Xensv (web control interface)} -\label{s:xensv} - -Xensv is the experimental web control interface for managing a Xen -machine. It can be used to perform some (but not yet all) of the -management tasks that can be done using the xm tool. - -It can be started using: -\begin{quote} -\verb_# xensv start_ -\end{quote} -and stopped using: -\begin{quote} -\verb_# xensv stop_ -\end{quote} - -By default, Xensv will serve out the web interface on port 8080. This -can be changed by editing -\path{/usr/lib/python2.3/site-packages/xen/sv/params.py}. - -Once Xensv is running, the web interface can be used to create and -manage running domains. - - - - -\chapter{Domain Configuration} -\label{cha:config} - -The following contains the syntax of the domain configuration -files and description of how to further specify networking, -driver domain and general scheduling behaviour. - -\section{Configuration Files} -\label{s:cfiles} - -Xen configuration files contain the following standard variables. -Unless otherwise stated, configuration items should be enclosed in -quotes: see \path{/etc/xen/xmexample1} and \path{/etc/xen/xmexample2} -for concrete examples of the syntax. - -\begin{description} -\item[kernel] Path to the kernel image -\item[ramdisk] Path to a ramdisk image (optional). -% \item[builder] The name of the domain build function (e.g. {\tt'linux'} or {\tt'netbsd'}. -\item[memory] Memory size in megabytes. -\item[cpu] CPU to run this domain on, or {\tt -1} for - auto-allocation. 
-\item[console] Port to export the domain console on (default 9600 + domain ID). -\item[nics] Number of virtual network interfaces. -\item[vif] List of MAC addresses (random addresses are assigned if not - given) and bridges to use for the domain's network interfaces, e.g. -\begin{verbatim} -vif = [ 'mac=aa:00:00:00:00:11, bridge=xen-br0', - 'bridge=xen-br1' ] -\end{verbatim} - to assign a MAC address and bridge to the first interface and assign - a different bridge to the second interface, leaving \xend to choose - the MAC address. -\item[disk] List of block devices to export to the domain, e.g. \\ - \verb_disk = [ 'phy:hda1,sda1,r' ]_ \\ - exports physical device \path{/dev/hda1} to the domain - as \path{/dev/sda1} with read-only access. Exporting a disk read-write - which is currently mounted is dangerous -- if you are \emph{certain} - you wish to do this, you can specify \path{w!} as the mode. -\item[dhcp] Set to {\tt 'dhcp'} if you want to use DHCP to configure - networking. -\item[netmask] Manually configured IP netmask. -\item[gateway] Manually configured IP gateway. -\item[hostname] Set the hostname for the virtual machine. -\item[root] Specify the root device parameter on the kernel command - line. -\item[nfs\_server] IP address for the NFS server (if any). -\item[nfs\_root] Path of the root filesystem on the NFS server (if any). -\item[extra] Extra string to append to the kernel command line (if - any) -\item[restart] Three possible options: - \begin{description} - \item[always] Always restart the domain, no matter what - its exit code is. - \item[never] Never restart the domain. - \item[onreboot] Restart the domain iff it requests reboot. - \end{description} -\end{description} - -For additional flexibility, it is also possible to include Python -scripting commands in configuration files. An example of this is the -\path{xmexample2} file, which uses Python code to handle the -\path{vmid} variable. 
- - -%\part{Advanced Topics} - -\section{Network Configuration} - -For many users, the default installation should work `out of the box'. -More complicated network setups, for instance with multiple ethernet -interfaces and/or existing bridging setups will require some -special configuration. - -The purpose of this section is to describe the mechanisms provided by -\xend to allow a flexible configuration for Xen's virtual networking. - -\subsection{Xen virtual network topology} - -Each domain network interface is connected to a virtual network -interface in dom0 by a point to point link (effectively a `virtual -crossover cable'). These devices are named {\tt -vif$<$domid$>$.$<$vifid$>$} (e.g. {\tt vif1.0} for the first interface -in domain 1, {\tt vif3.1} for the second interface in domain 3). - -Traffic on these virtual interfaces is handled in domain 0 using -standard Linux mechanisms for bridging, routing, rate limiting, etc. -Xend calls on two shell scripts to perform initial configuration of -the network and configuration of new virtual interfaces. By default, -these scripts configure a single bridge for all the virtual -interfaces. Arbitrary routing / bridging configurations can be -configured by customising the scripts, as described in the following -section. - -\subsection{Xen networking scripts} - -Xen's virtual networking is configured by two shell scripts (by -default \path{network} and \path{vif-bridge}). These are -called automatically by \xend when certain events occur, with -arguments to the scripts providing further contextual information. -These scripts are found by default in \path{/etc/xen/scripts}. The -names and locations of the scripts can be configured in -\path{/etc/xen/xend-config.sxp}. - -\begin{description} - -\item[network:] This script is called whenever \xend is started or -stopped to respectively initialise or tear down the Xen virtual -network. 
In the default configuration initialisation creates the -bridge `xen-br0' and moves eth0 onto that bridge, modifying the -routing accordingly. When \xend exits, it deletes the Xen bridge and -removes eth0, restoring the normal IP and routing configuration. - -%% In configurations where the bridge already exists, this script could -%% be replaced with a link to \path{/bin/true} (for instance). - -\item[vif-bridge:] This script is called for every domain virtual -interface and can configure firewalling rules and add the vif -to the appropriate bridge. By default, this adds and removes -VIFs on the default Xen bridge. - -\end{description} - -For more complex network setups (e.g. where routing is required or -integrate with existing bridges) these scripts may be replaced with -customised variants for your site's preferred configuration. - -%% There are two possible types of privileges: IO privileges and -%% administration privileges. - -\section{Driver Domain Configuration} - -I/O privileges can be assigned to allow a domain to directly access -PCI devices itself. This is used to support driver domains. - -Setting backend privileges is currently only supported in SXP format -config files. To allow a domain to function as a backend for others, -somewhere within the {\tt vm} element of its configuration file must -be a {\tt backend} element of the form {\tt (backend ({\em type}))} -where {\tt \em type} may be either {\tt netif} or {\tt blkif}, -according to the type of virtual device this domain will service. -%% After this domain has been built, \xend will connect all new and -%% existing {\em virtual} devices (of the appropriate type) to that -%% backend. - -Note that a block backend cannot currently import virtual block -devices from other domains, and a network backend cannot import -virtual network devices from other domains. 
Thus (particularly in the -case of block backends, which cannot import a virtual block device as -their root filesystem), you may need to boot a backend domain from a -ramdisk or a network device. - -Access to PCI devices may be configured on a per-device basis. Xen -will assign the minimal set of hardware privileges to a domain that -are required to control its devices. This can be configured in either -format of configuration file: - -\begin{itemize} -\item SXP Format: Include device elements of the form: \\ -\centerline{ {\tt (device (pci (bus {\em x}) (dev {\em y}) (func {\em z})))}} \\ - inside the top-level {\tt vm} element. Each one specifies the address - of a device this domain is allowed to access --- - the numbers {\em x},{\em y} and {\em z} may be in either decimal or - hexadecimal format. -\item Flat Format: Include a list of PCI device addresses of the - format: \\ -\centerline{{\tt pci = ['x,y,z', ...]}} \\ -where each element in the - list is a string specifying the components of the PCI device - address, separated by commas. The components ({\tt \em x}, {\tt \em - y} and {\tt \em z}) of the list may be formatted as either decimal - or hexadecimal. -\end{itemize} - -%% \section{Administration Domains} - -%% Administration privileges allow a domain to use the `dom0 -%% operations' (so called because they are usually available only to -%% domain 0). A privileged domain can build other domains, set scheduling -%% parameters, etc. - -% Support for other administrative domains is not yet available... perhaps -% we should plumb it in some time - - - - - -\section{Scheduler Configuration} -\label{s:sched} - - -Xen offers a boot time choice between multiple schedulers. To select -a scheduler, pass the boot parameter {\em sched=sched\_name} to Xen, -substituting the appropriate scheduler name. Details of the schedulers -and their parameters are included below; future versions of the tools -will provide a higher-level interface to these tools. 
- -It is expected that system administrators configure their system to -use the scheduler most appropriate to their needs. Currently, the BVT -scheduler is the recommended choice. - -\subsection{Borrowed Virtual Time} - -{\tt sched=bvt} (the default) \\ - -BVT provides proportional fair shares of the CPU time. It has been -observed to penalise domains that block frequently (e.g. I/O intensive -domains), but this can be compensated for by using warping. - -\subsubsection{Global Parameters} - -\begin{description} -\item[ctx\_allow] - the context switch allowance is similar to the `quantum' - in traditional schedulers. It is the minimum time that - a scheduled domain will be allowed to run before being - pre-empted. -\end{description} - -\subsubsection{Per-domain parameters} - -\begin{description} -\item[mcuadv] - the MCU (Minimum Charging Unit) advance determines the - proportional share of the CPU that a domain receives. It - is set inversely proportionally to a domain's sharing weight. -\item[warp] - the amount of `virtual time' the domain is allowed to warp - backwards -\item[warpl] - the warp limit is the maximum time a domain can run warped for -\item[warpu] - the unwarp requirement is the minimum time a domain must - run unwarped for before it can warp again -\end{description} - -\subsection{Atropos} - -{\tt sched=atropos} \\ - -Atropos is a soft real time scheduler. It provides guarantees about -absolute shares of the CPU, with a facility for sharing -slack CPU time on a best-effort basis. It can provide timeliness -guarantees for latency-sensitive domains. - -Every domain has an associated period and slice. The domain should -receive `slice' nanoseconds every `period' nanoseconds. This allows -the administrator to configure both the absolute share of the CPU a -domain receives and the frequency with which it is scheduled. 
- -%% When -%% domains unblock, their period is reduced to the value of the latency -%% hint (the slice is scaled accordingly so that they still get the same -%% proportion of the CPU). For each subsequent period, the slice and -%% period times are doubled until they reach their original values. - -Note: don't overcommit the CPU when using Atropos (i.e. don't reserve -more CPU than is available --- the utilisation should be kept to -slightly less than 100\% in order to ensure predictable behaviour). - -\subsubsection{Per-domain parameters} - -\begin{description} -\item[period] The regular time interval during which a domain is - guaranteed to receive its allocation of CPU time. -\item[slice] - The length of time per period that a domain is guaranteed to run - for (in the absence of voluntary yielding of the CPU). -\item[latency] - The latency hint is used to control how soon after - waking up a domain it should be scheduled. -\item[xtratime] This is a boolean flag that specifies whether a domain - should be allowed a share of the system slack time. -\end{description} - -\subsection{Round Robin} - -{\tt sched=rrobin} \\ - -The round robin scheduler is included as a simple demonstration of -Xen's internal scheduler API. It is not intended for production use. - -\subsubsection{Global Parameters} - -\begin{description} -\item[rr\_slice] - The maximum time each domain runs before the next - scheduling decision is made. -\end{description} - - - - - - - - - - - - -\chapter{Build, Boot and Debug options} - -This chapter describes the build- and boot-time options -which may be used to tailor your Xen system. - -\section{Xen Build Options} - -Xen provides a number of build-time options which should be -set as environment variables or passed on make's command-line. - -\begin{description} -\item[verbose=y] Enable debugging messages when Xen detects an unexpected condition. -Also enables console output from all domains. -\item[debug=y] -Enable debug assertions. 
Implies {\bf verbose=y}. -(Primarily useful for tracing bugs in Xen). -\item[debugger=y] -Enable the in-Xen debugger. This can be used to debug -Xen, guest OSes, and applications. -\item[perfc=y] -Enable performance counters for significant events -within Xen. The counts can be reset or displayed -on Xen's console via console control keys. -\item[trace=y] -Enable per-cpu trace buffers which log a range of -events within Xen for collection by control -software. -\end{description} - -\section{Xen Boot Options} -\label{s:xboot} - -These options are used to configure Xen's behaviour at runtime. They -should be appended to Xen's command line, either manually or by -editing \path{grub.conf}. - -\begin{description} -\item [noreboot ] - Don't reboot the machine automatically on errors. This is - useful to catch debug output if you aren't catching console messages - via the serial line. - -\item [nosmp ] - Disable SMP support. - This option is implied by `ignorebiostables'. - -\item [watchdog ] - Enable NMI watchdog which can report certain failures. - -\item [noirqbalance ] - Disable software IRQ balancing and affinity. This can be used on - systems such as Dell 1850/2850 that have workarounds in hardware for - IRQ-routing issues. - -\item [badpage=$<$page number$>$,$<$page number$>$, \ldots ] - Specify a list of pages not to be allocated for use - because they contain bad bytes. For example, if your - memory tester says that byte 0x12345678 is bad, you would - place `badpage=0x12345' on Xen's command line. - -\item [com1=$<$baud$>$,DPS,$<$io\_base$>$,$<$irq$>$ - com2=$<$baud$>$,DPS,$<$io\_base$>$,$<$irq$>$ ] \mbox{}\\ - Xen supports up to two 16550-compatible serial ports. - For example: `com1=9600, 8n1, 0x408, 5' maps COM1 to a - 9600-baud port, 8 data bits, no parity, 1 stop bit, - I/O port base 0x408, IRQ 5. - If some configuration options are standard (e.g., I/O base and IRQ), - then only a prefix of the full configuration string need be - specified. 
If the baud rate is pre-configured (e.g., by the - bootloader) then you can specify `auto' in place of a numeric baud - rate. - -\item [console=$<$specifier list$>$ ] - Specify the destination for Xen console I/O. - This is a comma-separated list of, for example: -\begin{description} - \item[vga] use VGA console and allow keyboard input - \item[com1] use serial port com1 - \item[com2H] use serial port com2. Transmitted chars will - have the MSB set. Received chars must have - MSB set. - \item[com2L] use serial port com2. Transmitted chars will - have the MSB cleared. Received chars must - have MSB cleared. -\end{description} - The latter two examples allow a single port to be - shared by two subsystems (e.g. console and - debugger). Sharing is controlled by MSB of each - transmitted/received character. - [NB. Default for this option is `com1,vga'] - -\item [sync\_console ] - Force synchronous console output. This is useful if you system fails - unexpectedly before it has sent all available output to the - console. In most cases Xen will automatically enter synchronous mode - when an exceptional event occurs, but this option provides a manual - fallback. - -\item [conswitch=$<$switch-char$><$auto-switch-char$>$ ] - Specify how to switch serial-console input between - Xen and DOM0. The required sequence is CTRL-$<$switch-char$>$ - pressed three times. Specifying the backtick character - disables switching. - The $<$auto-switch-char$>$ specifies whether Xen should - auto-switch input to DOM0 when it boots --- if it is `x' - then auto-switching is disabled. Any other value, or - omitting the character, enables auto-switching. - [NB. default switch-char is `a'] - -\item [nmi=xxx ] - Specify what to do with an NMI parity or I/O error. \\ - `nmi=fatal': Xen prints a diagnostic and then hangs. \\ - `nmi=dom0': Inform DOM0 of the NMI. \\ - `nmi=ignore': Ignore the NMI. - -\item [mem=xxx ] - Set the physical RAM address limit. 
Any RAM appearing beyond this - physical address in the memory map will be ignored. This parameter - may be specified with a B, K, M or G suffix, representing bytes, - kilobytes, megabytes and gigabytes respectively. The - default unit, if no suffix is specified, is kilobytes. - -\item [dom0\_mem=xxx ] - Set the amount of memory to be allocated to domain0. In Xen 3.x the parameter - may be specified with a B, K, M or G suffix, representing bytes, - kilobytes, megabytes and gigabytes respectively; if no suffix is specified, - the parameter defaults to kilobytes. In previous versions of Xen, suffixes - were not supported and the value is always interpreted as kilobytes. - -\item [tbuf\_size=xxx ] - Set the size of the per-cpu trace buffers, in pages - (default 1). Note that the trace buffers are only - enabled in debug builds. Most users can ignore - this feature completely. - -\item [sched=xxx ] - Select the CPU scheduler Xen should use. The current - possibilities are `bvt' (default), `atropos' and `rrobin'. - For more information see Section~\ref{s:sched}. - -\item [apic\_verbosity=debug,verbose ] - Print more detailed information about local APIC and IOAPIC configuration. - -\item [lapic ] - Force use of local APIC even when left disabled by uniprocessor BIOS. - -\item [nolapic ] - Ignore local APIC in a uniprocessor system, even if enabled by the BIOS. - -\item [apic=bigsmp,default,es7000,summit ] - Specify NUMA platform. This can usually be probed automatically. - -\end{description} - -In addition, the following options may be specified on the Xen command -line. Since domain 0 shares responsibility for booting the platform, -Xen will automatically propagate these options to its command -line. These options are taken from Linux's command-line syntax with -unchanged semantics. - -\begin{description} -\item [acpi=off,force,strict,ht,noirq,\ldots ] - Modify how Xen (and domain 0) parses the BIOS ACPI tables. 
- -\item [acpi\_skip\_timer\_override ] - Instruct Xen (and domain 0) to ignore timer-interrupt override - instructions specified by the BIOS ACPI tables. - -\item [noapic ] - Instruct Xen (and domain 0) to ignore any IOAPICs that are present in - the system, and instead continue to use the legacy PIC. - -\end{description} - -\section{XenLinux Boot Options} - -In addition to the standard Linux kernel boot options, we support: -\begin{description} -\item[xencons=xxx ] Specify the device node to which the Xen virtual -console driver is attached. The following options are supported: -\begin{center} -\begin{tabular}{l} -`xencons=off': disable virtual console \\ -`xencons=tty': attach console to /dev/tty1 (tty0 at boot-time) \\ -`xencons=ttyS': attach console to /dev/ttyS0 -\end{tabular} -\end{center} -The default is ttyS for dom0 and tty for all other domains. -\end{description} - - - -\section{Debugging} -\label{s:keys} - -Xen has a set of debugging features that can be useful to try and -figure out what's going on. Hit 'h' on the serial line (if you -specified a baud rate on the Xen command line) or ScrollLock-h on the -keyboard to get a list of supported commands. - -If you have a crash you'll likely get a crash dump containing an EIP -(PC) which, along with an \path{objdump -d image}, can be useful in -figuring out what's happened. Debug a Xenlinux image just as you -would any other Linux kernel. - -%% We supply a handy debug terminal program which you can find in -%% \path{/usr/local/src/xen-2.0.bk/tools/misc/miniterm/} -%% This should be built and executed on another machine that is connected -%% via a null modem cable. Documentation is included. -%% Alternatively, if the Xen machine is connected to a serial-port server -%% then we supply a dumb TCP terminal client, {\tt xencons}. 
- - +%% Chapter Control Software moved to control_software.tex +\include{src/user/control_software} + +%% Chapter Domain Configuration moved to domain_configuration.tex +\include{src/user/domain_configuration} + +%% Chapter Build, Boot and Debug Options moved to build.tex +\include{src/user/build} \chapter{Further Support} @@ -1875,6 +108,7 @@ %Various HOWTOs are available in \path{docs/HOWTOS} but this content is %being integrated into this manual. + \section{Online References} The official Xen web site is found at: @@ -1884,6 +118,7 @@ This contains links to the latest versions of all on-line documentation (including the lateset version of the FAQ). + \section{Mailing Lists} @@ -1905,326 +140,18 @@ \end{description} + \appendix - -\chapter{Installing Xen / XenLinux on Debian} - -The Debian project provides a tool called \path{debootstrap} which -allows a base Debian system to be installed into a filesystem without -requiring the host system to have any Debian-specific software (such -as \path{apt}. - -Here's some info how to install Debian 3.1 (Sarge) for an unprivileged -Xen domain: - -\begin{enumerate} -\item Set up Xen 2.0 and test that it's working, as described earlier in - this manual. - -\item Create disk images for root-fs and swap (alternatively, you - might create dedicated partitions, LVM logical volumes, etc. if - that suits your setup). -\begin{small}\begin{verbatim} -dd if=/dev/zero of=/path/diskimage bs=1024k count=size_in_mbytes -dd if=/dev/zero of=/path/swapimage bs=1024k count=size_in_mbytes -\end{verbatim}\end{small} - If you're going to use this filesystem / disk image only as a - `template' for other vm disk images, something like 300 MB should - be enough.. 
(of course it depends what kind of packages you are - planning to install to the template) - -\item Create the filesystem and initialise the swap image -\begin{small}\begin{verbatim} -mkfs.ext3 /path/diskimage -mkswap /path/swapimage -\end{verbatim}\end{small} - -\item Mount the disk image for installation -\begin{small}\begin{verbatim} -mount -o loop /path/diskimage /mnt/disk -\end{verbatim}\end{small} - -\item Install \path{debootstrap} - -Make sure you have debootstrap installed on the host. If you are -running Debian sarge (3.1 / testing) or unstable you can install it by -running \path{apt-get install debootstrap}. Otherwise, it can be -downloaded from the Debian project website. - -\item Install Debian base to the disk image: -\begin{small}\begin{verbatim} -debootstrap --arch i386 sarge /mnt/disk \ - http://ftp.<countrycode>.debian.org/debian -\end{verbatim}\end{small} - -You can use any other Debian http/ftp mirror you want. - -\item When debootstrap completes successfully, modify settings: -\begin{small}\begin{verbatim} -chroot /mnt/disk /bin/bash -\end{verbatim}\end{small} - -Edit the following files using vi or nano and make needed changes: -\begin{small}\begin{verbatim} -/etc/hostname -/etc/hosts -/etc/resolv.conf -/etc/network/interfaces -/etc/networks -\end{verbatim}\end{small} - -Set up access to the services, edit: -\begin{small}\begin{verbatim} -/etc/hosts.deny -/etc/hosts.allow -/etc/inetd.conf -\end{verbatim}\end{small} - -Add Debian mirror to: -\begin{small}\begin{verbatim} -/etc/apt/sources.list -\end{verbatim}\end{small} - -Create fstab like this: -\begin{small}\begin{verbatim} -/dev/sda1 / ext3 errors=remount-ro 0 1 -/dev/sda2 none swap sw 0 0 -proc /proc proc defaults 0 0 -\end{verbatim}\end{small} - -Logout - -\item Unmount the disk image -\begin{small}\begin{verbatim} -umount /mnt/disk -\end{verbatim}\end{small} - -\item Create Xen 2.0 configuration file for the new domain. 
You can - use the example-configurations coming with Xen as a template. - - Make sure you have the following set up: -\begin{small}\begin{verbatim} -disk = [ 'file:/path/diskimage,sda1,w', 'file:/path/swapimage,sda2,w' ] -root = "/dev/sda1 ro" -\end{verbatim}\end{small} - -\item Start the new domain -\begin{small}\begin{verbatim} -xm create -f domain_config_file -\end{verbatim}\end{small} - -Check that the new domain is running: -\begin{small}\begin{verbatim} -xm list -\end{verbatim}\end{small} - -\item Attach to the console of the new domain. - You should see something like this when starting the new domain: - -\begin{small}\begin{verbatim} -Started domain testdomain2, console on port 9626 -\end{verbatim}\end{small} - - There you can see the ID of the console: 26. You can also list - the consoles with \path{xm consoles} (ID is the last two - digits of the port number.) - - Attach to the console: - -\begin{small}\begin{verbatim} -xm console 26 -\end{verbatim}\end{small} - - or by telnetting to the port 9626 of localhost (the xm console - program works better). - -\item Log in and run base-config - - As a default there's no password for the root. - - Check that everything looks OK, and the system started without - errors. Check that the swap is active, and the network settings are - correct. - - Run \path{/usr/sbin/base-config} to set up the Debian settings. - - Set up the password for root using passwd. - -\item Done. You can exit the console by pressing \path{Ctrl + ]} - -\end{enumerate} - -If you need to create new domains, you can just copy the contents of -the `template'-image to the new disk images, either by mounting the -template and the new image, and using \path{cp -a} or \path{tar} or by -simply copying the image file. Once this is done, modify the -image-specific settings (hostname, network settings, etc). 
- -\chapter{Installing Xen / XenLinux on Redhat or Fedora Core} - -When using Xen / XenLinux on a standard Linux distribution there are -a couple of things to watch out for: - -Note that, because domains>0 don't have any privileged access at all, -certain commands in the default boot sequence will fail e.g. attempts -to update the hwclock, change the console font, update the keytable -map, start apmd (power management), or gpm (mouse cursor). Either -ignore the errors (they should be harmless), or remove them from the -startup scripts. Deleting the following links are a good start: -{\path{S24pcmcia}}, {\path{S09isdn}}, -{\path{S17keytable}}, {\path{S26apmd}}, -{\path{S85gpm}}. - -If you want to use a single root file system that works cleanly for -both domain 0 and unprivileged domains, a useful trick is to use -different 'init' run levels. For example, use -run level 3 for domain 0, and run level 4 for other domains. This -enables different startup scripts to be run in depending on the run -level number passed on the kernel command line. - -If using NFS root files systems mounted either from an -external server or from domain0 there are a couple of other gotchas. -The default {\path{/etc/sysconfig/iptables}} rules block NFS, so part -way through the boot sequence things will suddenly go dead. - -If you're planning on having a separate NFS {\path{/usr}} partition, the -RH9 boot scripts don't make life easy - they attempt to mount NFS file -systems way to late in the boot process. 
The easiest way I found to do -this was to have a {\path{/linuxrc}} script run ahead of -{\path{/sbin/init}} that mounts {\path{/usr}}: - -\begin{quote} -\begin{small}\begin{verbatim} - #!/bin/bash - /sbin/ipconfig lo 127.0.0.1 - /sbin/portmap - /bin/mount /usr - exec /sbin/init "$@" <>/dev/console 2>&1 -\end{verbatim}\end{small} -\end{quote} - -%$ XXX SMH: font lock fix :-) - -The one slight complication with the above is that -{\path{/sbin/portmap}} is dynamically linked against -{\path{/usr/lib/libwrap.so.0}} Since this is in -{\path{/usr}}, it won't work. This can be solved by copying the -file (and link) below the /usr mount point, and just let the file be -'covered' when the mount happens. - -In some installations, where a shared read-only {\path{/usr}} is -being used, it may be desirable to move other large directories over -into the read-only {\path{/usr}}. For example, you might replace -{\path{/bin}}, {\path{/lib}} and {\path{/sbin}} with -links into {\path{/usr/root/bin}}, {\path{/usr/root/lib}} -and {\path{/usr/root/sbin}} respectively. This creates other -problems for running the {\path{/linuxrc}} script, requiring -bash, portmap, mount, ifconfig, and a handful of other shared -libraries to be copied below the mount point --- a simple -statically-linked C program would solve this problem. - - - - -\chapter{Glossary of Terms} - -\begin{description} -\item[Atropos] One of the CPU schedulers provided by Xen. - Atropos provides domains with absolute shares - of the CPU, with timeliness guarantees and a - mechanism for sharing out `slack time'. - -\item[BVT] The BVT scheduler is used to give proportional - fair shares of the CPU to domains. - -\item[Exokernel] A minimal piece of privileged code, similar to - a {\bf microkernel} but providing a more - `hardware-like' interface to the tasks it - manages. 
This is similar to a paravirtualising - VMM like {\bf Xen} but was designed as a new - operating system structure, rather than - specifically to run multiple conventional OSs. - -\item[Domain] A domain is the execution context that - contains a running {\bf virtual machine}. - The relationship between virtual machines - and domains on Xen is similar to that between - programs and processes in an operating - system: a virtual machine is a persistent - entity that resides on disk (somewhat like - a program). When it is loaded for execution, - it runs in a domain. Each domain has a - {\bf domain ID}. - -\item[Domain 0] The first domain to be started on a Xen - machine. Domain 0 is responsible for managing - the system. - -\item[Domain ID] A unique identifier for a {\bf domain}, - analogous to a process ID in an operating - system. - -\item[Full virtualisation] An approach to virtualisation which - requires no modifications to the hosted - operating system, providing the illusion of - a complete system of real hardware devices. - -\item[Hypervisor] An alternative term for {\bf VMM}, used - because it means `beyond supervisor', - since it is responsible for managing multiple - `supervisor' kernels. - -\item[Live migration] A technique for moving a running virtual - machine to another physical host, without - stopping it or the services running on it. - -\item[Microkernel] A small base of code running at the highest - hardware privilege level. A microkernel is - responsible for sharing CPU and memory (and - sometimes other devices) between less - privileged tasks running on the system. - This is similar to a VMM, particularly a - {\bf paravirtualising} VMM but typically - addressing a different problem space and - providing different kind of interface. - -\item[NetBSD/Xen] A port of NetBSD to the Xen architecture. - -\item[Paravirtualisation] An approach to virtualisation which requires - modifications to the operating system in - order to run in a virtual machine. 
Xen - uses paravirtualisation but preserves - binary compatibility for user space - applications. - -\item[Shadow pagetables] A technique for hiding the layout of machine - memory from a virtual machine's operating - system. Used in some {\bf VMMs} to provide - the illusion of contiguous physical memory, - in Xen this is used during - {\bf live migration}. - -\item[Virtual Machine] The environment in which a hosted operating - system runs, providing the abstraction of a - dedicated machine. A virtual machine may - be identical to the underlying hardware (as - in {\bf full virtualisation}, or it may - differ, as in {\bf paravirtualisation}. - -\item[VMM] Virtual Machine Monitor - the software that - allows multiple virtual machines to be - multiplexed on a single physical machine. - -\item[Xen] Xen is a paravirtualising virtual machine - monitor, developed primarily by the - Systems Research Group at the University - of Cambridge Computer Laboratory. - -\item[XenLinux] Official name for the port of the Linux kernel - that runs on Xen. - -\end{description} +%% Chapter Installing Xen / XenLinux on Debian moved to debian.tex +\include{src/user/debian} + +%% Chapter Installing Xen on Red Hat moved to redhat.tex +\include{src/user/redhat} + + +%% Chapter Glossary of Terms moved to glossary.tex +\include{src/user/glossary} \end{document} diff -r a6b72464a042 -r f8e7af29daa1 tools/firmware/acpi/acpi_madt.c --- a/tools/firmware/acpi/acpi_madt.c Tue Sep 20 09:43:29 2005 +++ b/tools/firmware/acpi/acpi_madt.c Tue Sep 20 09:43:46 2005 @@ -37,44 +37,7 @@ ACPI_LOCAL_APIC_ADDRESS, ACPI_MULTIPLE_APIC_FLAGS, }, - // - // LOCAL APIC Entries for 4 processors. 
- // - { - { - ACPI_PROCESSOR_LOCAL_APIC, - sizeof (ACPI_LOCAL_APIC_STRUCTURE), - 0x00, - 0x00, - 0x00000001, - }, - - { - ACPI_PROCESSOR_LOCAL_APIC, - sizeof (ACPI_LOCAL_APIC_STRUCTURE), - 0x01, - 0x00, - 0x00000000 - }, - - { - ACPI_PROCESSOR_LOCAL_APIC, - sizeof (ACPI_LOCAL_APIC_STRUCTURE), - 0x02, - 0x00, - 0x00000000 - }, - - { - ACPI_PROCESSOR_LOCAL_APIC, - sizeof (ACPI_LOCAL_APIC_STRUCTURE), - 0x03, - 0x00, - 0x00000000 - } - } - , - + // // IO APIC // @@ -87,5 +50,19 @@ ACPI_IO_APIC_ADDRESS_1, 0x0000 } + }, + + // + // LOCAL APIC Entries for up to 32 processors. + // + { + { + ACPI_PROCESSOR_LOCAL_APIC, + sizeof (ACPI_LOCAL_APIC_STRUCTURE), + 0x00, + 0x00, + 0x00000001, + } + } }; diff -r a6b72464a042 -r f8e7af29daa1 tools/firmware/acpi/acpi_madt.h --- a/tools/firmware/acpi/acpi_madt.h Tue Sep 20 09:43:29 2005 +++ b/tools/firmware/acpi/acpi_madt.h Tue Sep 20 09:43:46 2005 @@ -35,9 +35,9 @@ // #pragma pack (1) typedef struct { - ACPI_2_0_MADT Header; - ACPI_LOCAL_APIC_STRUCTURE LocalApic[4]; - ACPI_IO_APIC_STRUCTURE IoApic[1]; + ACPI_2_0_MADT Header; + ACPI_IO_APIC_STRUCTURE IoApic[1]; + ACPI_LOCAL_APIC_STRUCTURE LocalApic[32]; } ACPI_MULTIPLE_APIC_DESCRIPTION_TABLE; #pragma pack () diff -r a6b72464a042 -r f8e7af29daa1 tools/firmware/vmxassist/Makefile --- a/tools/firmware/vmxassist/Makefile Tue Sep 20 09:43:29 2005 +++ b/tools/firmware/vmxassist/Makefile Tue Sep 20 09:43:46 2005 @@ -41,9 +41,9 @@ all: vmxloader -vmxloader: roms.h vmxloader.c acpi.h - ${CC} ${CFLAGS} ${DEFINES} -c vmxloader.c - $(CC) -o vmxloader.tmp -m32 -nostdlib -Wl,-N -Wl,-Ttext -Wl,0x100000 vmxloader.o +vmxloader: roms.h vmxloader.c acpi.h acpi_madt.c + ${CC} ${CFLAGS} ${DEFINES} -c vmxloader.c -c acpi_madt.c + $(CC) -o vmxloader.tmp -m32 -nostdlib -Wl,-N -Wl,-Ttext -Wl,0x100000 vmxloader.o acpi_madt.o objcopy --change-addresses=0xC0000000 vmxloader.tmp vmxloader rm -f vmxloader.tmp diff -r a6b72464a042 -r f8e7af29daa1 tools/firmware/vmxassist/vmxloader.c --- 
a/tools/firmware/vmxassist/vmxloader.c Tue Sep 20 09:43:29 2005 +++ b/tools/firmware/vmxassist/vmxloader.c Tue Sep 20 09:43:46 2005 @@ -27,6 +27,7 @@ #ifdef _ACPI_ #include "acpi.h" #include "../acpi/acpi2_0.h" // for ACPI_PHYSICAL_ADDRESS +int acpi_madt_update(unsigned char* acpi_start); #endif @@ -110,7 +111,10 @@ } #ifdef _ACPI_ puts("Loading ACPI ...\n"); - if (ACPI_PHYSICAL_ADDRESS+sizeof(acpi) <= 0xF0000 ){ + + acpi_madt_update(acpi); + + if (ACPI_PHYSICAL_ADDRESS+sizeof(acpi) <= 0xF0000) { /* make sure acpi table does not overlap rombios * currently acpi less than 8K will be OK. */ diff -r a6b72464a042 -r f8e7af29daa1 tools/libxc/xc_vmx_build.c --- a/tools/libxc/xc_vmx_build.c Tue Sep 20 09:43:29 2005 +++ b/tools/libxc/xc_vmx_build.c Tue Sep 20 09:43:46 2005 @@ -105,6 +105,33 @@ nr_map++; mem_mapp->nr_map = nr_map; +} + +/* + * Use E820 reserved memory 0x9F800 to pass number of vcpus to vmxloader + * vmxloader will use it to config ACPI MADT table + */ +#define VCPU_MAGIC 0x76637075 /* "vcpu" */ +static int +set_nr_vcpus(int xc_handle, u32 dom, unsigned long *pfn_list, + struct domain_setup_info *dsi, unsigned long vcpus) +{ + char *va_map; + unsigned long *va_vcpus; + + va_map = xc_map_foreign_range( + xc_handle, dom, PAGE_SIZE, PROT_READ|PROT_WRITE, + pfn_list[(0x9F000 - dsi->v_start) >> PAGE_SHIFT]); + if ( va_map == NULL ) + return -1; + + va_vcpus = (unsigned long *)(va_map + 0x800); + *va_vcpus++ = VCPU_MAGIC; + *va_vcpus++ = vcpus; + + munmap(va_map, PAGE_SIZE); + + return 0; } #ifdef __i386__ @@ -496,7 +523,8 @@ MMU_MACHPHYS_UPDATE, count) ) goto error_out; } - + + set_nr_vcpus(xc_handle, dom, page_array, &dsi, vcpus); if ((boot_paramsp = xc_map_foreign_range( xc_handle, dom, PAGE_SIZE, PROT_READ|PROT_WRITE, diff -r a6b72464a042 -r f8e7af29daa1 tools/vtpm/Makefile --- a/tools/vtpm/Makefile Tue Sep 20 09:43:29 2005 +++ b/tools/vtpm/Makefile Tue Sep 20 09:43:46 2005 @@ -4,7 +4,7 @@ include $(XEN_ROOT)/tools/vtpm/Rules.mk # Dir name for emulator (as 
dom0 tpm driver) -TPM_EMULATOR_DIR = tpm_emulator-0.2 +TPM_EMULATOR_DIR = tpm_emulator # Dir name for vtpm instance VTPM_DIR = vtpm @@ -13,7 +13,7 @@ all: build -build: $(TPM_EMULATOR_TARFILE) extract patch build_sub +build: $(TPM_EMULATOR_DIR) $(VTPM_DIR) build_sub install: build $(MAKE) -C $(TPM_EMULATOR_DIR) $@ @@ -26,36 +26,32 @@ if [ -d $(VTPM_DIR) ]; \ then $(MAKE) -C $(VTPM_DIR) clean; \ fi + +mrproper: + rm -f $(TPM_EMULATOR_TARFILE) rm -rf $(TPM_EMULATOR_DIR) rm -rf $(VTPM_DIR) - -mrproper: clean - rm -f $(TPM_EMULATOR_TARFILE) # Download Swiss emulator $(TPM_EMULATOR_TARFILE): wget http://download.berlios.de/tpm-emulator/$(TPM_EMULATOR_TARFILE) # Create vtpm and TPM emulator dirs -extract: $(TPM_EMULATOR_DIR)/README $(VTPM_DIR)/README - -$(TPM_EMULATOR_DIR)/README: - -rm -rf $(TPM_EMULATOR_DIR) - tar -xzf $(TPM_EMULATOR_TARFILE) - -$(VTPM_DIR)/README: - -rm -rf $(VTPM_DIR) - cp -r --preserve $(TPM_EMULATOR_DIR) $(VTPM_DIR) - # apply patches for 1) used as dom0 tpm driver 2) used as vtpm device instance -patch: $(TPM_EMULATOR_DIR)/Makefile $(VTPM_DIR)/Makefile - -$(TPM_EMULATOR_DIR)/Makefile: tpm_emulator.patch +$(TPM_EMULATOR_DIR): $(TPM_EMULATOR_TARFILE) + tar -xzf $(TPM_EMULATOR_TARFILE); + mv tpm_emulator-0.2 $(TPM_EMULATOR_DIR); + -cd $(TPM_EMULATOR_DIR); \ + patch -p1 < ../tpm_emulator-0.2b-x86_64.patch; \ patch -p1 <../tpm_emulator.patch -$(VTPM_DIR)/Makefile: vtpm.patch +$(VTPM_DIR): $(TPM_EMULATOR_TARFILE) + tar -xzf $(TPM_EMULATOR_TARFILE); + mv tpm_emulator-0.2 $(VTPM_DIR); + -cd $(VTPM_DIR); \ + patch -p1 < ../tpm_emulator-0.2b-x86_64.patch; \ patch -p1 <../vtpm.patch build_sub: diff -r a6b72464a042 -r f8e7af29daa1 tools/vtpm/README --- a/tools/vtpm/README Tue Sep 20 09:43:29 2005 +++ b/tools/vtpm/README Tue Sep 20 09:43:46 2005 @@ -23,6 +23,7 @@ - xen-unstable - IBM frontend/backend vtpm driver patch - vtpm_managerd +- GNU MP Big number library (GMP) vtpmd Flow (for vtpm_manager. 
vtpmd never run by default) ============================ diff -r a6b72464a042 -r f8e7af29daa1 tools/vtpm/tpm_emulator.patch --- a/tools/vtpm/tpm_emulator.patch Tue Sep 20 09:43:29 2005 +++ b/tools/vtpm/tpm_emulator.patch Tue Sep 20 09:43:46 2005 @@ -1,12 +1,12 @@ -diff -uprN orig/tpm_emulator-0.2/AUTHORS tpm_emulator-0.2/AUTHORS ---- orig/tpm_emulator-0.2/AUTHORS 2005-08-17 10:58:36.000000000 -0700 -+++ tpm_emulator-0.2/AUTHORS 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/AUTHORS tpm_emulator/AUTHORS +--- orig/tpm_emulator-0.2-x86_64/AUTHORS 2005-08-15 00:58:57.000000000 -0700 ++++ tpm_emulator/AUTHORS 2005-09-14 20:27:22.000000000 -0700 @@ -1 +1,2 @@ Mario Strasser <mast@xxxxxxx> +INTEL Corp <> -diff -uprN orig/tpm_emulator-0.2/ChangeLog tpm_emulator-0.2/ChangeLog ---- orig/tpm_emulator-0.2/ChangeLog 2005-08-17 10:58:36.000000000 -0700 -+++ tpm_emulator-0.2/ChangeLog 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/ChangeLog tpm_emulator/ChangeLog +--- orig/tpm_emulator-0.2-x86_64/ChangeLog 2005-08-15 00:58:57.000000000 -0700 ++++ tpm_emulator/ChangeLog 2005-09-14 20:27:22.000000000 -0700 @@ -1,3 +1,7 @@ +2005-08-16: INTEL Corp + * Set default permissions to PCRs @@ -15,10 +15,29 @@ 2005-08-15 Mario Strasser <mast@xxxxxxx> * all: some typos corrected * tpm_integrity.c: bug in TPM_Extend fixed -diff -uprN orig/tpm_emulator-0.2/Makefile tpm_emulator-0.2/Makefile ---- orig/tpm_emulator-0.2/Makefile 2005-08-17 10:58:36.000000000 -0700 -+++ tpm_emulator-0.2/Makefile 2005-08-17 10:55:52.000000000 -0700 -@@ -1,15 +1,19 @@ +diff -uprN orig/tpm_emulator-0.2-x86_64/linux_module.h tpm_emulator/linux_module.h +--- orig/tpm_emulator-0.2-x86_64/linux_module.h 2005-09-15 19:21:14.844078720 -0700 ++++ tpm_emulator/linux_module.h 2005-09-14 20:27:22.000000000 -0700 +@@ -1,5 +1,6 @@ + /* Software-Based Trusted Platform Module (TPM) Emulator for Linux + * Copyright (C) 2004 Mario Strasser <mast@xxxxxxx>, ++ * Copyright (C) 
2005 INTEL Corp. + * + * This module is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published +@@ -35,7 +36,7 @@ + #include "tpm_version.h" + + #define TPM_DEVICE_MINOR 224 +-#define TPM_DEVICE_NAME "tpm" ++#define TPM_DEVICE_NAME "tpm0" + #define TPM_MODULE_NAME "tpm_emulator" + + /* debug and log output functions */ +diff -uprN orig/tpm_emulator-0.2-x86_64/Makefile tpm_emulator/Makefile +--- orig/tpm_emulator-0.2-x86_64/Makefile 2005-09-15 19:21:14.845078568 -0700 ++++ tpm_emulator/Makefile 2005-09-14 20:27:22.000000000 -0700 +@@ -1,16 +1,20 @@ # Software-Based Trusted Platform Module (TPM) Emulator for Linux # Copyright (C) 2004 Mario Strasser <mast@xxxxxxx> +# Copyright (C) 2005 INTEL Corp. @@ -33,6 +52,7 @@ -KERNEL_BUILD := /lib/modules/$(KERNEL_RELEASE)/build +KERNEL_BUILD := $(XEN_ROOT)/linux-2.6.12-xen0 MOD_SUBDIR := misc + COMPILE_ARCH ?= $(shell uname -m | sed -e s/i.86/x86_32/) # module settings -MODULE_NAME := tpm_emulator @@ -40,7 +60,7 @@ VERSION_MAJOR := 0 VERSION_MINOR := 2 VERSION_BUILD := $(shell date +"%s") -@@ -27,11 +30,9 @@ DIRS := . crypto tpm +@@ -34,11 +38,9 @@ DIRS := . 
crypto tpm SRCS := $(foreach dir, $(DIRS), $(wildcard $(src)/$(dir)/*.c)) OBJS := $(patsubst %.c, %.o, $(SRCS)) SRCS += $(foreach dir, $(DIRS), $(wildcard $(src)/$(dir)/*.h)) @@ -54,7 +74,7 @@ EXTRA_CFLAGS += -I$(src) -I$(src)/crypto -I$(src)/tpm -@@ -42,23 +43,17 @@ all: $(src)/crypto/gmp.h $(src)/crypto/l +@@ -49,23 +51,17 @@ all: $(src)/crypto/gmp.h $(src)/crypto/l @$(MAKE) -C $(KERNEL_BUILD) M=$(CURDIR) modules install: @@ -84,9 +104,9 @@ $(src)/crypto/libgmp.a: test -f $(src)/crypto/libgmp.a || ln -s $(GMP_LIB) $(src)/crypto/libgmp.a -diff -uprN orig/tpm_emulator-0.2/README tpm_emulator-0.2/README ---- orig/tpm_emulator-0.2/README 2005-08-17 10:58:36.000000000 -0700 -+++ tpm_emulator-0.2/README 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/README tpm_emulator/README +--- orig/tpm_emulator-0.2-x86_64/README 2005-08-15 00:58:57.000000000 -0700 ++++ tpm_emulator/README 2005-09-14 20:27:22.000000000 -0700 @@ -13,7 +13,8 @@ $Id: README 8 2005-01-25 21:11:45Z jmoli Copyright -------------------------------------------------------------------------- @@ -97,28 +117,9 @@ This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by -diff -uprN orig/tpm_emulator-0.2/linux_module.h tpm_emulator-0.2/linux_module.h ---- orig/tpm_emulator-0.2/linux_module.h 2005-08-17 10:58:36.000000000 -0700 -+++ tpm_emulator-0.2/linux_module.h 2005-08-17 10:55:52.000000000 -0700 -@@ -1,5 +1,6 @@ - /* Software-Based Trusted Platform Module (TPM) Emulator for Linux - * Copyright (C) 2004 Mario Strasser <mast@xxxxxxx>, -+ * Copyright (C) 2005 INTEL Corp. 
- * - * This module is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published -@@ -33,7 +34,7 @@ - #include "tpm_version.h" - - #define TPM_DEVICE_MINOR 224 --#define TPM_DEVICE_NAME "tpm" -+#define TPM_DEVICE_NAME "tpm0" - #define TPM_MODULE_NAME "tpm_emulator" - - /* debug and log output functions */ -diff -uprN orig/tpm_emulator-0.2/tpm/tpm_data.c tpm_emulator-0.2/tpm/tpm_data.c ---- orig/tpm_emulator-0.2/tpm/tpm_data.c 2005-08-17 10:58:36.000000000 -0700 -+++ tpm_emulator-0.2/tpm/tpm_data.c 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/tpm/tpm_data.c tpm_emulator/tpm/tpm_data.c +--- orig/tpm_emulator-0.2-x86_64/tpm/tpm_data.c 2005-09-15 19:21:14.847078264 -0700 ++++ tpm_emulator/tpm/tpm_data.c 2005-09-14 20:27:22.000000000 -0700 @@ -1,6 +1,7 @@ /* Software-Based Trusted Platform Module (TPM) Emulator for Linux * Copyright (C) 2004 Mario Strasser <mast@xxxxxxx>, @@ -139,13 +140,3 @@ tpmData.permanent.data.pcrAttrib[i].pcrReset = TRUE; } /* set tick type */ -diff -uprN orig/tpm_emulator-0.2/tpm_version.h tpm_emulator-0.2/tpm_version.h ---- orig/tpm_emulator-0.2/tpm_version.h 2005-08-17 10:58:36.000000000 -0700 -+++ tpm_emulator-0.2/tpm_version.h 2005-08-17 10:55:53.000000000 -0700 -@@ -2,5 +2,5 @@ - #define _TPM_VERSION_H_ - #define VERSION_MAJOR 0 - #define VERSION_MINOR 2 --#define VERSION_BUILD 1123950310 -+#define VERSION_BUILD 1124301353 - #endif /* _TPM_VERSION_H_ */ diff -r a6b72464a042 -r f8e7af29daa1 tools/vtpm/vtpm.patch --- a/tools/vtpm/vtpm.patch Tue Sep 20 09:43:29 2005 +++ b/tools/vtpm/vtpm.patch Tue Sep 20 09:43:46 2005 @@ -1,12 +1,12 @@ -diff -uprN orig/tpm_emulator-0.2/AUTHORS vtpm/AUTHORS ---- orig/tpm_emulator-0.2/AUTHORS 2005-08-17 10:58:36.000000000 -0700 -+++ vtpm/AUTHORS 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/AUTHORS vtpm/AUTHORS +--- orig/tpm_emulator-0.2-x86_64/AUTHORS 2005-08-15 00:58:57.000000000 -0700 
++++ vtpm/AUTHORS 2005-09-14 20:27:22.000000000 -0700 @@ -1 +1,2 @@ Mario Strasser <mast@xxxxxxx> +INTEL Corp <> -diff -uprN orig/tpm_emulator-0.2/ChangeLog vtpm/ChangeLog ---- orig/tpm_emulator-0.2/ChangeLog 2005-08-17 10:58:36.000000000 -0700 -+++ vtpm/ChangeLog 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/ChangeLog vtpm/ChangeLog +--- orig/tpm_emulator-0.2-x86_64/ChangeLog 2005-08-15 00:58:57.000000000 -0700 ++++ vtpm/ChangeLog 2005-09-14 20:27:22.000000000 -0700 @@ -1,3 +1,7 @@ +2005-08-16 Intel Corp + Moved module out of kernel to run as a ring 3 app @@ -15,115 +15,9 @@ 2005-08-15 Mario Strasser <mast@xxxxxxx> * all: some typos corrected * tpm_integrity.c: bug in TPM_Extend fixed -diff -uprN orig/tpm_emulator-0.2/Makefile vtpm/Makefile ---- orig/tpm_emulator-0.2/Makefile 2005-08-17 10:58:36.000000000 -0700 -+++ vtpm/Makefile 2005-08-17 10:55:52.000000000 -0700 -@@ -1,21 +1,29 @@ - # Software-Based Trusted Platform Module (TPM) Emulator for Linux - # Copyright (C) 2004 Mario Strasser <mast@xxxxxxx> -+# Copyright (C) 2005 INTEL Corp. - # - # $Id: Makefile 10 2005-04-26 20:59:50Z mast $ - --# kernel settings --KERNEL_RELEASE := $(shell uname -r) --KERNEL_BUILD := /lib/modules/$(KERNEL_RELEASE)/build --MOD_SUBDIR := misc -- - # module settings --MODULE_NAME := tpm_emulator -+BIN := vtpmd - VERSION_MAJOR := 0 - VERSION_MINOR := 2 - VERSION_BUILD := $(shell date +"%s") - --# enable/disable DEBUG messages --EXTRA_CFLAGS += -DDEBUG -g -+# Installation program and options -+INSTALL = install -+INSTALL_PROG = $(INSTALL) -m0755 -+INSTALL_DIR = $(INSTALL) -d -m0755 -+ -+# Xen tools installation directory -+TOOLS_INSTALL_DIR = $(DESTDIR)/usr/bin -+ -+CC := gcc -+CFLAGS += -g -Wall $(INCLUDE) -DDEBUG -+CFLAGS += -I. -Itpm -+ -+# Is the simulator running in it's own vm? -+#CFLAGS += -DVTPM_MULTI_VM - - # GNU MP configuration - GMP_LIB := /usr/lib/libgmp.a -@@ -27,38 +35,31 @@ DIRS := . 
crypto tpm - SRCS := $(foreach dir, $(DIRS), $(wildcard $(src)/$(dir)/*.c)) - OBJS := $(patsubst %.c, %.o, $(SRCS)) - SRCS += $(foreach dir, $(DIRS), $(wildcard $(src)/$(dir)/*.h)) --DISTSRC := ./README ./AUTHORS ./ChangeLog ./Makefile $(SRCS) --DISTDIR := tpm_emulator-$(VERSION_MAJOR).$(VERSION_MINOR) - --obj-m := $(MODULE_NAME).o --$(MODULE_NAME)-objs := $(patsubst $(src)/%.o, %.o, $(OBJS)) crypto/libgmp.a -+obj-m := $(BIN) -+$(BIN)-objs := $(patsubst $(src)/%.o, %.o, $(OBJS)) crypto/libgmp.a - - EXTRA_CFLAGS += -I$(src) -I$(src)/crypto -I$(src)/tpm - - # do not print "Entering directory ..." - MAKEFLAGS += --no-print-directory - --all: $(src)/crypto/gmp.h $(src)/crypto/libgmp.a version -- @$(MAKE) -C $(KERNEL_BUILD) M=$(CURDIR) modules -+all: $(BIN) -+ -+$(BIN): $(src)/crypto/gmp.h $(src)/crypto/libgmp.a version $(SRCS) $(OBJS) -+ $(CC) $(CFLAGS) $(OBJS) $(src)/crypto/libgmp.a -o $(BIN) -+ -+%.o: %.c -+ $(CC) $(CFLAGS) -c $< -o $@ - - install: -- @$(MAKE) -C $(KERNEL_BUILD) M=$(CURDIR) modules_install -- test -d /var/tpm || mkdir /var/tpm -- test -c /dev/tpm || mknod /dev/tpm c 10 224 -- chmod 666 /dev/tpm -- depmod -a -+ $(INSTALL_PROG) $(BIN) $(TOOLS_INSTALL_DIR) - - clean: -- @$(MAKE) -C $(KERNEL_BUILD) M=$(CURDIR) clean -- rm -f $(src)/crypto/gmp.h $(src)/crypto/libgmp.a -+ rm -f $(src)/crypto/gmp.h $(src)/crypto/libgmp.a $(OBJS) - --dist: $(DISTSRC) -- rm -rf $(DISTDIR) -- mkdir $(DISTDIR) -- cp --parents $(DISTSRC) $(DISTDIR)/ -- rm -f $(DISTDIR)/crypto/gmp.h -- tar -chzf $(DISTDIR).tar.gz $(DISTDIR) -- rm -rf $(DISTDIR) -+mrproper: clean -+ rm -f $(BIN) - - $(src)/crypto/libgmp.a: - test -f $(src)/crypto/libgmp.a || ln -s $(GMP_LIB) $(src)/crypto/libgmp.a -diff -uprN orig/tpm_emulator-0.2/README vtpm/README ---- orig/tpm_emulator-0.2/README 2005-08-17 10:58:36.000000000 -0700 -+++ vtpm/README 2005-08-17 10:55:52.000000000 -0700 -@@ -13,7 +13,8 @@ $Id: README 8 2005-01-25 21:11:45Z jmoli - Copyright - 
-------------------------------------------------------------------------- - Copyright (C) 2004 Mario Strasser <mast@xxxxxxx> and Swiss Federal --Institute of Technology (ETH) Zurich. -+ Institute of Technology (ETH) Zurich. -+Copyright (C) 2005 INTEL Corp - - This program is free software; you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by -diff -uprN orig/tpm_emulator-0.2/crypto/gmp_kernel_wrapper.c vtpm/crypto/gmp_kernel_wrapper.c ---- orig/tpm_emulator-0.2/crypto/gmp_kernel_wrapper.c 2005-08-17 10:58:36.000000000 -0700 -+++ vtpm/crypto/gmp_kernel_wrapper.c 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/crypto/gmp_kernel_wrapper.c vtpm/crypto/gmp_kernel_wrapper.c +--- orig/tpm_emulator-0.2-x86_64/crypto/gmp_kernel_wrapper.c 2005-09-15 19:21:42.508873032 -0700 ++++ vtpm/crypto/gmp_kernel_wrapper.c 2005-09-15 19:25:37.319176440 -0700 @@ -1,5 +1,6 @@ /* Software-Based Trusted Platform Module (TPM) Emulator for Linux * Copyright (C) 2004 Mario Strasser <mast@xxxxxxx>, @@ -154,9 +48,9 @@ { - void *ret = (void*)kmalloc(size, GFP_KERNEL); - if (!ret) panic(KERN_CRIT TPM_MODULE_NAME -- "GMP: cannot allocate memory (size=%u)\n", size); +- "GMP: cannot allocate memory (size=%Zu)\n", size); + void *ret = (void*)malloc(size); -+ if (!ret) error("GMP: cannot allocate memory (size=%u)\n", size); ++ if (!ret) error("GMP: cannot allocate memory (size=%Zu)\n", size); return ret; } @@ -165,9 +59,10 @@ { - void *ret = (void*)kmalloc(new_size, GFP_KERNEL); - if (!ret) panic(KERN_CRIT TPM_MODULE_NAME "GMP: Cannot reallocate memory " +- "(old_size=%Zu new_size=%Zu)\n", old_size, new_size); + void *ret = (void*)malloc(new_size); + if (!ret) error("GMP: Cannot reallocate memory " - "(old_size=%u new_size=%u)\n", old_size, new_size); ++ "(old_size=%Zu new_size=%Zu)\n", old_size, new_size); memcpy(ret, oldptr, old_size); - kfree(oldptr); + free(oldptr); @@ -183,9 +78,9 @@ } } -diff -uprN 
orig/tpm_emulator-0.2/crypto/rsa.c vtpm/crypto/rsa.c ---- orig/tpm_emulator-0.2/crypto/rsa.c 2005-08-17 10:58:36.000000000 -0700 -+++ vtpm/crypto/rsa.c 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/crypto/rsa.c vtpm/crypto/rsa.c +--- orig/tpm_emulator-0.2-x86_64/crypto/rsa.c 2005-08-15 00:58:57.000000000 -0700 ++++ vtpm/crypto/rsa.c 2005-09-14 20:27:22.000000000 -0700 @@ -1,5 +1,6 @@ /* Software-Based Trusted Platform Module (TPM) Emulator for Linux * Copyright (C) 2004 Mario Strasser <mast@xxxxxxx>, @@ -211,8 +106,8 @@ sha1_final(&ctx, &msg[1]); if (memcmp(&msg[1], &msg[1 + SHA1_DIGEST_LENGTH], SHA1_DIGEST_LENGTH) != 0) return -1; -diff -uprN orig/tpm_emulator-0.2/linux_module.c vtpm/linux_module.c ---- orig/tpm_emulator-0.2/linux_module.c 2005-08-17 10:58:36.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/linux_module.c vtpm/linux_module.c +--- orig/tpm_emulator-0.2-x86_64/linux_module.c 2005-09-15 19:22:40.343080896 -0700 +++ vtpm/linux_module.c 1969-12-31 16:00:00.000000000 -0800 @@ -1,163 +0,0 @@ -/* Software-Based Trusted Platform Module (TPM) Emulator for Linux @@ -283,7 +178,7 @@ - -static ssize_t tpm_read(struct file *file, char *buf, size_t count, loff_t *ppos) -{ -- debug("%s(%d)", __FUNCTION__, count); +- debug("%s(%Zu)", __FUNCTION__, count); - down(&tpm_mutex); - if (tpm_response.data != NULL) { - count = min(count, (size_t)tpm_response.size - (size_t)*ppos); @@ -298,7 +193,7 @@ - -static ssize_t tpm_write(struct file *file, const char *buf, size_t count, loff_t *ppos) -{ -- debug("%s(%d)", __FUNCTION__, count); +- debug("%s(%Zu)", __FUNCTION__, count); - down(&tpm_mutex); - *ppos = 0; - if (tpm_response.data != NULL) kfree(tpm_response.data); @@ -378,9 +273,9 @@ - return (ticks > 0) ? 
ticks : 1; -} - -diff -uprN orig/tpm_emulator-0.2/linux_module.h vtpm/linux_module.h ---- orig/tpm_emulator-0.2/linux_module.h 2005-08-17 10:58:36.000000000 -0700 -+++ vtpm/linux_module.h 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/linux_module.h vtpm/linux_module.h +--- orig/tpm_emulator-0.2-x86_64/linux_module.h 2005-09-15 19:21:14.844078720 -0700 ++++ vtpm/linux_module.h 2005-09-14 20:27:22.000000000 -0700 @@ -1,5 +1,6 @@ /* Software-Based Trusted Platform Module (TPM) Emulator for Linux * Copyright (C) 2004 Mario Strasser <mast@xxxxxxx>, @@ -416,17 +311,20 @@ +/* module settings */ +#define min(A,B) ((A)<(B)?(A):(B)) + #ifndef STR #define STR(s) __STR__(s) #define __STR__(s) #s - #include "tpm_version.h" -@@ -39,32 +45,35 @@ +@@ -39,34 +45,38 @@ + #define TPM_MODULE_NAME "tpm_emulator" + /* debug and log output functions */ ++extern int dmi_id; #ifdef DEBUG -#define debug(fmt, ...) printk(KERN_DEBUG "%s %s:%d: Debug: " fmt "\n", \ - TPM_MODULE_NAME, __FILE__, __LINE__, ## __VA_ARGS__) -+#define debug(fmt, ...) printf("%s:%d: Debug: " fmt "\n", \ -+ __FILE__, __LINE__, ## __VA_ARGS__) ++#define debug(fmt, ...) printf("TPMD[%d]: %s:%d: Debug: " fmt "\n", \ ++ dmi_id, __FILE__, __LINE__, ## __VA_ARGS__) #else #define debug(fmt, ...) #endif @@ -436,12 +334,12 @@ - TPM_MODULE_NAME, __FILE__, __LINE__, ## __VA_ARGS__) -#define alert(fmt, ...) printk(KERN_ALERT "%s %s:%d: Alert: " fmt "\n", \ - TPM_MODULE_NAME, __FILE__, __LINE__, ## __VA_ARGS__) -+#define info(fmt, ...) printf("%s:%d: Info: " fmt "\n", \ -+ __FILE__, __LINE__, ## __VA_ARGS__) -+#define error(fmt, ...) printf("%s:%d: Error: " fmt "\n", \ -+ __FILE__, __LINE__, ## __VA_ARGS__) -+#define alert(fmt, ...) printf("%s:%d: Alert: " fmt "\n", \ -+ __FILE__, __LINE__, ## __VA_ARGS__) ++#define info(fmt, ...) printf("TPMD[%d]: %s:%d: Info: " fmt "\n", \ ++ dmi_id, __FILE__, __LINE__, ## __VA_ARGS__) ++#define error(fmt, ...) 
printf("TPMD[%d]: %s:%d: Error: " fmt "\n", \ ++ dmi_id, __FILE__, __LINE__, ## __VA_ARGS__) ++#define alert(fmt, ...) printf("TPMD[%d]: %s:%d: Alert: " fmt "\n", \ ++ dmi_id, __FILE__, __LINE__, ## __VA_ARGS__) /* memory allocation */ @@ -465,7 +363,7 @@ static inline void tpm_get_random_bytes(void *buf, int nbytes) { get_random_bytes(buf, nbytes); -@@ -84,9 +93,9 @@ uint64_t tpm_get_ticks(void); +@@ -86,9 +96,9 @@ uint64_t tpm_get_ticks(void); #define CPU_TO_LE16(x) __cpu_to_le16(x) #define BE64_TO_CPU(x) __be64_to_cpu(x) @@ -477,9 +375,116 @@ #define BE16_TO_CPU(x) __be16_to_cpu(x) #define LE16_TO_CPU(x) __le16_to_cpu(x) -diff -uprN orig/tpm_emulator-0.2/tpm/tpm_audit.c vtpm/tpm/tpm_audit.c ---- orig/tpm_emulator-0.2/tpm/tpm_audit.c 2005-08-17 10:58:36.000000000 -0700 -+++ vtpm/tpm/tpm_audit.c 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/Makefile vtpm/Makefile +--- orig/tpm_emulator-0.2-x86_64/Makefile 2005-09-15 19:21:14.845078568 -0700 ++++ vtpm/Makefile 2005-09-14 20:27:22.000000000 -0700 +@@ -1,22 +1,31 @@ + # Software-Based Trusted Platform Module (TPM) Emulator for Linux + # Copyright (C) 2004 Mario Strasser <mast@xxxxxxx> ++# Copyright (C) 2005 INTEL Corp. + # + # $Id: Makefile 10 2005-04-26 20:59:50Z mast $ + +-# kernel settings +-KERNEL_RELEASE := $(shell uname -r) +-KERNEL_BUILD := /lib/modules/$(KERNEL_RELEASE)/build +-MOD_SUBDIR := misc + COMPILE_ARCH ?= $(shell uname -m | sed -e s/i.86/x86_32/) + + # module settings +-MODULE_NAME := tpm_emulator ++BIN := vtpmd + VERSION_MAJOR := 0 + VERSION_MINOR := 2 + VERSION_BUILD := $(shell date +"%s") + +-# enable/disable DEBUG messages +-EXTRA_CFLAGS += -DDEBUG -g ++# Installation program and options ++INSTALL = install ++INSTALL_PROG = $(INSTALL) -m0755 ++INSTALL_DIR = $(INSTALL) -d -m0755 ++ ++# Xen tools installation directory ++TOOLS_INSTALL_DIR = $(DESTDIR)/usr/bin ++ ++CC := gcc ++CFLAGS += -g -Wall $(INCLUDE) -DDEBUG ++CFLAGS += -I. 
-Itpm ++ ++# Is the simulator running in it's own vm? ++#CFLAGS += -DVTPM_MULTI_VM + + ifeq ($(COMPILE_ARCH),x86_64) + LIBDIR = lib64 +@@ -34,38 +43,31 @@ DIRS := . crypto tpm + SRCS := $(foreach dir, $(DIRS), $(wildcard $(src)/$(dir)/*.c)) + OBJS := $(patsubst %.c, %.o, $(SRCS)) + SRCS += $(foreach dir, $(DIRS), $(wildcard $(src)/$(dir)/*.h)) +-DISTSRC := ./README ./AUTHORS ./ChangeLog ./Makefile $(SRCS) +-DISTDIR := tpm_emulator-$(VERSION_MAJOR).$(VERSION_MINOR) + +-obj-m := $(MODULE_NAME).o +-$(MODULE_NAME)-objs := $(patsubst $(src)/%.o, %.o, $(OBJS)) crypto/libgmp.a ++obj-m := $(BIN) ++$(BIN)-objs := $(patsubst $(src)/%.o, %.o, $(OBJS)) crypto/libgmp.a + + EXTRA_CFLAGS += -I$(src) -I$(src)/crypto -I$(src)/tpm + + # do not print "Entering directory ..." + MAKEFLAGS += --no-print-directory + +-all: $(src)/crypto/gmp.h $(src)/crypto/libgmp.a version +- @$(MAKE) -C $(KERNEL_BUILD) M=$(CURDIR) modules ++all: $(BIN) ++ ++$(BIN): $(src)/crypto/gmp.h $(src)/crypto/libgmp.a version $(SRCS) $(OBJS) ++ $(CC) $(CFLAGS) $(OBJS) $(src)/crypto/libgmp.a -o $(BIN) ++ ++%.o: %.c ++ $(CC) $(CFLAGS) -c $< -o $@ + + install: +- @$(MAKE) -C $(KERNEL_BUILD) M=$(CURDIR) modules_install +- test -d /var/tpm || mkdir /var/tpm +- test -c /dev/tpm || mknod /dev/tpm c 10 224 +- chmod 666 /dev/tpm +- depmod -a ++ $(INSTALL_PROG) $(BIN) $(TOOLS_INSTALL_DIR) + + clean: +- @$(MAKE) -C $(KERNEL_BUILD) M=$(CURDIR) clean +- rm -f $(src)/crypto/gmp.h $(src)/crypto/libgmp.a ++ rm -f $(src)/crypto/gmp.h $(src)/crypto/libgmp.a $(OBJS) + +-dist: $(DISTSRC) +- rm -rf $(DISTDIR) +- mkdir $(DISTDIR) +- cp --parents $(DISTSRC) $(DISTDIR)/ +- rm -f $(DISTDIR)/crypto/gmp.h +- tar -chzf $(DISTDIR).tar.gz $(DISTDIR) +- rm -rf $(DISTDIR) ++mrproper: clean ++ rm -f $(BIN) tpm_version.h + + $(src)/crypto/libgmp.a: + test -f $(src)/crypto/libgmp.a || ln -s $(GMP_LIB) $(src)/crypto/libgmp.a +diff -uprN orig/tpm_emulator-0.2-x86_64/README vtpm/README +--- orig/tpm_emulator-0.2-x86_64/README 2005-08-15 
00:58:57.000000000 -0700 ++++ vtpm/README 2005-09-14 20:27:22.000000000 -0700 +@@ -13,7 +13,8 @@ $Id: README 8 2005-01-25 21:11:45Z jmoli + Copyright + -------------------------------------------------------------------------- + Copyright (C) 2004 Mario Strasser <mast@xxxxxxx> and Swiss Federal +-Institute of Technology (ETH) Zurich. ++ Institute of Technology (ETH) Zurich. ++Copyright (C) 2005 INTEL Corp + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by +diff -uprN orig/tpm_emulator-0.2-x86_64/tpm/tpm_audit.c vtpm/tpm/tpm_audit.c +--- orig/tpm_emulator-0.2-x86_64/tpm/tpm_audit.c 2005-08-15 00:58:57.000000000 -0700 ++++ vtpm/tpm/tpm_audit.c 2005-09-14 20:27:22.000000000 -0700 @@ -1,6 +1,7 @@ /* Software-Based Trusted Platform Module (TPM) Emulator for Linux * Copyright (C) 2004 Mario Strasser <mast@xxxxxxx>, @@ -542,9 +547,9 @@ return TPM_SUCCESS; } - -diff -uprN orig/tpm_emulator-0.2/tpm/tpm_authorization.c vtpm/tpm/tpm_authorization.c ---- orig/tpm_emulator-0.2/tpm/tpm_authorization.c 2005-08-17 10:58:36.000000000 -0700 -+++ vtpm/tpm/tpm_authorization.c 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/tpm/tpm_authorization.c vtpm/tpm/tpm_authorization.c +--- orig/tpm_emulator-0.2-x86_64/tpm/tpm_authorization.c 2005-08-15 00:58:57.000000000 -0700 ++++ vtpm/tpm/tpm_authorization.c 2005-09-14 20:27:22.000000000 -0700 @@ -1,6 +1,7 @@ /* Software-Based Trusted Platform Module (TPM) Emulator for Linux * Copyright (C) 2004 Mario Strasser <mast@xxxxxxx>, @@ -568,9 +573,9 @@ } - - -diff -uprN orig/tpm_emulator-0.2/tpm/tpm_capability.c vtpm/tpm/tpm_capability.c ---- orig/tpm_emulator-0.2/tpm/tpm_capability.c 2005-08-17 10:58:36.000000000 -0700 -+++ vtpm/tpm/tpm_capability.c 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/tpm/tpm_capability.c vtpm/tpm/tpm_capability.c +--- orig/tpm_emulator-0.2-x86_64/tpm/tpm_capability.c 
2005-08-15 00:58:57.000000000 -0700 ++++ vtpm/tpm/tpm_capability.c 2005-09-14 20:27:22.000000000 -0700 @@ -1,6 +1,7 @@ /* Software-Based Trusted Platform Module (TPM) Emulator for Linux * Copyright (C) 2004 Mario Strasser <mast@xxxxxxx>, @@ -593,9 +598,9 @@ } } - -diff -uprN orig/tpm_emulator-0.2/tpm/tpm_cmd_handler.c vtpm/tpm/tpm_cmd_handler.c ---- orig/tpm_emulator-0.2/tpm/tpm_cmd_handler.c 2005-08-17 10:58:36.000000000 -0700 -+++ vtpm/tpm/tpm_cmd_handler.c 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/tpm/tpm_cmd_handler.c vtpm/tpm/tpm_cmd_handler.c +--- orig/tpm_emulator-0.2-x86_64/tpm/tpm_cmd_handler.c 2005-08-15 00:58:57.000000000 -0700 ++++ vtpm/tpm/tpm_cmd_handler.c 2005-09-14 20:27:22.000000000 -0700 @@ -1,6 +1,7 @@ /* Software-Based Trusted Platform Module (TPM) Emulator for Linux * Copyright (C) 2004 Mario Strasser <mast@xxxxxxx>, @@ -658,9 +663,9 @@ return 0; } - -diff -uprN orig/tpm_emulator-0.2/tpm/tpm_crypto.c vtpm/tpm/tpm_crypto.c ---- orig/tpm_emulator-0.2/tpm/tpm_crypto.c 2005-08-17 10:58:36.000000000 -0700 -+++ vtpm/tpm/tpm_crypto.c 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/tpm/tpm_crypto.c vtpm/tpm/tpm_crypto.c +--- orig/tpm_emulator-0.2-x86_64/tpm/tpm_crypto.c 2005-09-15 19:21:14.846078416 -0700 ++++ vtpm/tpm/tpm_crypto.c 2005-09-14 20:27:22.000000000 -0700 @@ -1,6 +1,7 @@ /* Software-Based Trusted Platform Module (TPM) Emulator for Linux * Copyright (C) 2004 Mario Strasser <mast@xxxxxxx>, @@ -678,14 +683,14 @@ memcpy(&buf[30], areaToSign, areaToSignSize); if (rsa_sign(&key->key, RSA_SSA_PKCS1_SHA1, buf, areaToSignSize + 30, *sig)) { -@@ -379,4 +380,3 @@ TPM_RESULT TPM_CertifyKey2(TPM_KEY_HANDL +@@ -383,4 +384,3 @@ TPM_RESULT TPM_CertifyKey2(TPM_KEY_HANDL } return TPM_SUCCESS; } - -diff -uprN orig/tpm_emulator-0.2/tpm/tpm_data.c vtpm/tpm/tpm_data.c ---- orig/tpm_emulator-0.2/tpm/tpm_data.c 2005-08-17 10:58:36.000000000 -0700 -+++ vtpm/tpm/tpm_data.c 2005-08-17 10:55:52.000000000 
-0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/tpm/tpm_data.c vtpm/tpm/tpm_data.c +--- orig/tpm_emulator-0.2-x86_64/tpm/tpm_data.c 2005-09-15 19:21:14.847078264 -0700 ++++ vtpm/tpm/tpm_data.c 2005-09-14 20:27:22.000000000 -0700 @@ -1,6 +1,7 @@ /* Software-Based Trusted Platform Module (TPM) Emulator for Linux * Copyright (C) 2004 Mario Strasser <mast@xxxxxxx>, @@ -1005,7 +1010,7 @@ } #else -@@ -231,7 +431,6 @@ int tpm_restore_permanent_data(void) +@@ -232,7 +432,6 @@ int tpm_restore_permanent_data(void) int tpm_erase_permanent_data(void) { @@ -1014,9 +1019,9 @@ return res; } - -diff -uprN orig/tpm_emulator-0.2/tpm/tpm_deprecated.c vtpm/tpm/tpm_deprecated.c ---- orig/tpm_emulator-0.2/tpm/tpm_deprecated.c 2005-08-17 10:58:36.000000000 -0700 -+++ vtpm/tpm/tpm_deprecated.c 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/tpm/tpm_deprecated.c vtpm/tpm/tpm_deprecated.c +--- orig/tpm_emulator-0.2-x86_64/tpm/tpm_deprecated.c 2005-08-15 00:58:57.000000000 -0700 ++++ vtpm/tpm/tpm_deprecated.c 2005-09-14 20:27:22.000000000 -0700 @@ -1,6 +1,7 @@ /* Software-Based Trusted Platform Module (TPM) Emulator for Linux * Copyright (C) 2004 Mario Strasser <mast@xxxxxxx>, @@ -1043,9 +1048,9 @@ authContextSize, &contextBlob); if (res != TPM_SUCCESS) return res; len = *authContextSize; -diff -uprN orig/tpm_emulator-0.2/tpm/tpm_emulator.h vtpm/tpm/tpm_emulator.h ---- orig/tpm_emulator-0.2/tpm/tpm_emulator.h 2005-08-17 10:58:36.000000000 -0700 -+++ vtpm/tpm/tpm_emulator.h 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/tpm/tpm_emulator.h vtpm/tpm/tpm_emulator.h +--- orig/tpm_emulator-0.2-x86_64/tpm/tpm_emulator.h 2005-08-15 00:58:57.000000000 -0700 ++++ vtpm/tpm/tpm_emulator.h 2005-09-14 20:27:22.000000000 -0700 @@ -1,5 +1,6 @@ /* Software-Based Trusted Platform Module (TPM) Emulator for Linux * Copyright (C) 2004 Mario Strasser <mast@xxxxxxx>, @@ -1063,9 +1068,9 @@ /** * tpm_emulator_init - initialises and starts the TPM emulator 
-diff -uprN orig/tpm_emulator-0.2/tpm/tpm_integrity.c vtpm/tpm/tpm_integrity.c ---- orig/tpm_emulator-0.2/tpm/tpm_integrity.c 2005-08-17 10:58:36.000000000 -0700 -+++ vtpm/tpm/tpm_integrity.c 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/tpm/tpm_integrity.c vtpm/tpm/tpm_integrity.c +--- orig/tpm_emulator-0.2-x86_64/tpm/tpm_integrity.c 2005-08-15 00:58:57.000000000 -0700 ++++ vtpm/tpm/tpm_integrity.c 2005-09-14 20:27:22.000000000 -0700 @@ -1,6 +1,7 @@ /* Software-Based Trusted Platform Module (TPM) Emulator for Linux * Copyright (C) 2004 Mario Strasser <mast@xxxxxxx>, @@ -1079,9 +1084,9 @@ return TPM_SUCCESS; } - -diff -uprN orig/tpm_emulator-0.2/tpm/tpm_structures.h vtpm/tpm/tpm_structures.h ---- orig/tpm_emulator-0.2/tpm/tpm_structures.h 2005-08-17 10:58:36.000000000 -0700 -+++ vtpm/tpm/tpm_structures.h 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/tpm/tpm_structures.h vtpm/tpm/tpm_structures.h +--- orig/tpm_emulator-0.2-x86_64/tpm/tpm_structures.h 2005-08-15 00:58:57.000000000 -0700 ++++ vtpm/tpm/tpm_structures.h 2005-09-14 20:27:22.000000000 -0700 @@ -1,6 +1,7 @@ /* Software-Based Trusted Platform Module (TPM) Emulator for Linux * Copyright (C) 2004 Mario Strasser <mast@xxxxxxx>, @@ -1099,9 +1104,9 @@ #include "crypto/rsa.h" /* -diff -uprN orig/tpm_emulator-0.2/tpm/tpm_testing.c vtpm/tpm/tpm_testing.c ---- orig/tpm_emulator-0.2/tpm/tpm_testing.c 2005-08-17 10:58:36.000000000 -0700 -+++ vtpm/tpm/tpm_testing.c 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/tpm/tpm_testing.c vtpm/tpm/tpm_testing.c +--- orig/tpm_emulator-0.2-x86_64/tpm/tpm_testing.c 2005-08-15 00:58:57.000000000 -0700 ++++ vtpm/tpm/tpm_testing.c 2005-09-14 20:27:22.000000000 -0700 @@ -1,6 +1,7 @@ /* Software-Based Trusted Platform Module (TPM) Emulator for Linux * Copyright (C) 2004 Mario Strasser <mast@xxxxxxx>, @@ -1217,9 +1222,9 @@ rsa_private_key_t priv_key; rsa_public_key_t pub_key; -diff -uprN 
orig/tpm_emulator-0.2/tpm/tpm_ticks.c vtpm/tpm/tpm_ticks.c ---- orig/tpm_emulator-0.2/tpm/tpm_ticks.c 2005-08-17 10:58:36.000000000 -0700 -+++ vtpm/tpm/tpm_ticks.c 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/tpm/tpm_ticks.c vtpm/tpm/tpm_ticks.c +--- orig/tpm_emulator-0.2-x86_64/tpm/tpm_ticks.c 2005-08-15 00:58:57.000000000 -0700 ++++ vtpm/tpm/tpm_ticks.c 2005-09-14 20:27:22.000000000 -0700 @@ -1,6 +1,7 @@ /* Software-Based Trusted Platform Module (TPM) Emulator for Linux * Copyright (C) 2004 Mario Strasser <mast@xxxxxxx>, @@ -1302,9 +1307,9 @@ } -diff -uprN orig/tpm_emulator-0.2/tpm/vtpm_manager.h vtpm/tpm/vtpm_manager.h ---- orig/tpm_emulator-0.2/tpm/vtpm_manager.h 1969-12-31 16:00:00.000000000 -0800 -+++ vtpm/tpm/vtpm_manager.h 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/tpm/vtpm_manager.h vtpm/tpm/vtpm_manager.h +--- orig/tpm_emulator-0.2-x86_64/tpm/vtpm_manager.h 1969-12-31 16:00:00.000000000 -0800 ++++ vtpm/tpm/vtpm_manager.h 2005-09-14 20:27:22.000000000 -0700 @@ -0,0 +1,126 @@ +// =================================================================== +// @@ -1432,9 +1437,9 @@ +*********************************************************************/ + +#endif //_VTPM_MANAGER_H_ -diff -uprN orig/tpm_emulator-0.2/tpmd.c vtpm/tpmd.c ---- orig/tpm_emulator-0.2/tpmd.c 1969-12-31 16:00:00.000000000 -0800 -+++ vtpm/tpmd.c 2005-08-17 10:55:52.000000000 -0700 +diff -uprN orig/tpm_emulator-0.2-x86_64/tpmd.c vtpm/tpmd.c +--- orig/tpm_emulator-0.2-x86_64/tpmd.c 1969-12-31 16:00:00.000000000 -0800 ++++ vtpm/tpmd.c 2005-09-15 19:28:55.783005352 -0700 @@ -0,0 +1,207 @@ +/* Software-Based Trusted Platform Module (TPM) Emulator for Linux + * Copyright (C) 2005 INTEL Corp @@ -1468,9 +1473,9 @@ +#else + #define GUEST_RX_FIFO_D "/var/vtpm/fifos/guest-to-%d.fifo" + #define GUEST_TX_FIFO "/var/vtpm/fifos/guest-from-all.fifo" ++#endif + + int dmi_id; -+#endif + +#define BUFFER_SIZE 2048 + @@ -1506,7 +1511,7 @@ +{ + 
uint8_t in[BUFFER_SIZE], *out, *addressed_out; + uint32_t out_size; -+ int in_size, written ; ++ int in_size, written; + int i, guest_id=-1; + + int vtpm_tx_fh=-1, vtpm_rx_fh=-1; @@ -1602,7 +1607,7 @@ + written = write(vtpm_tx_fh, ctrl_msg, sizeof(ctrl_msg)); + + if (written != sizeof(ctrl_msg)) { -+ printf("ERROR: Part of response not written %d/%d.\n", written, sizeof(ctrl_msg)); ++ printf("ERROR: Part of response not written %d/%Zu.\n", written, sizeof(ctrl_msg)); + } else { + printf("Send Ctrl Message confermation\n"); + } @@ -1623,7 +1628,7 @@ + printf("%x ", addressed_out[i]); + printf("\n"); + } else { -+ printf("Sent[%d]: ", out_size + sizeof(uint32_t)); ++ printf("Sent[%Zu]: ", out_size + sizeof(uint32_t)); + for (i=0; i< out_size+ sizeof(uint32_t); i++) + printf("%x ", addressed_out[i]); + printf("\n"); diff -r a6b72464a042 -r f8e7af29daa1 tools/vtpm_manager/README --- a/tools/vtpm_manager/README Tue Sep 20 09:43:29 2005 +++ b/tools/vtpm_manager/README Tue Sep 20 09:43:46 2005 @@ -51,14 +51,24 @@ DUMMY_BACKEND -> vtpm_manager listens on /tmp/in.fifo and /tmp/out.fifo rather than backend -MANUAL_DM_LAUNCH -> User must manually launch & kill VTPMs +MANUAL_DM_LAUNCH -> Must manually launch & kill VTPMs -USE_FIXED_SRK_AUTH -> Do not randomly generate a random SRK & Owner auth +WELL_KNOWN_SRK_AUTH -> Rather than randomly generating the password for the SRK, + use a well known value. This is necessary for sharing use + of the SRK across applications. Such as VTPM and Dom0 + measurement software. + +WELL_KNOWN_OWNER_AUTH -> Rather than randomly generating the password for the owner, + use a well known value. This is useful for debugging and for + poor bios which do not support clearing TPM if OwnerAuth is + lost. 
However this has no protection from malicious app
+ issuing a TPM_OwnerClear to wipe the TPM
 
 Requirements
 ============
 - xen-unstable
-- IBM frontend/backend vtpm driver patch
+- vtpm frontend/backend driver patch
+- OpenSSL Library
 
 Single-VM Flow
 ============================
diff -r a6b72464a042 -r f8e7af29daa1 tools/vtpm_manager/Rules.mk
--- a/tools/vtpm_manager/Rules.mk Tue Sep 20 09:43:29 2005
+++ b/tools/vtpm_manager/Rules.mk Tue Sep 20 09:43:46 2005
@@ -57,7 +57,8 @@
 #CFLAGS += -DMANUAL_DM_LAUNCH
 
 # Fixed SRK
-CFLAGS += -DUSE_FIXED_SRK_AUTH
+CFLAGS += -DWELL_KNOWN_SRK_AUTH
+#CFLAGS += -DWELL_KNOWN_OWNER_AUTH
 
 # TPM Hardware Device or TPM Simulator
 #CFLAGS += -DTPM_HWDEV
diff -r a6b72464a042 -r f8e7af29daa1 tools/vtpm_manager/crypto/Makefile
--- a/tools/vtpm_manager/crypto/Makefile Tue Sep 20 09:43:29 2005
+++ b/tools/vtpm_manager/crypto/Makefile Tue Sep 20 09:43:46 2005
@@ -13,6 +13,7 @@
 	rm -f *.a *.so *.o *.rpm $(DEP_FILES)
 
 mrproper: clean
+	rm -f *~
 
 $(BIN): $(OBJS)
 	$(AR) rcs $(BIN) $(OBJS)
diff -r a6b72464a042 -r f8e7af29daa1 tools/vtpm_manager/manager/Makefile
--- a/tools/vtpm_manager/manager/Makefile Tue Sep 20 09:43:29 2005
+++ b/tools/vtpm_manager/manager/Makefile Tue Sep 20 09:43:46 2005
@@ -17,7 +17,7 @@
 	rm -f *.a *.so *.o *.rpm $(DEP_FILES)
 
 mrproper: clean
-	rm -f $(BIN)
+	rm -f $(BIN) *~
 
 $(BIN): $(OBJS)
 	$(CC) $(LDFLAGS) $^ $(LIBS) -o $@
diff -r a6b72464a042 -r f8e7af29daa1 tools/vtpm_manager/manager/dmictl.c
--- a/tools/vtpm_manager/manager/dmictl.c Tue Sep 20 09:43:29 2005
+++ b/tools/vtpm_manager/manager/dmictl.c Tue Sep 20 09:43:46 2005
@@ -1,339 +1,344 @@
-// ===================================================================
-//
-// Copyright (c) 2005, Intel Corp.
-// All rights reserved.
-// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions -// are met: -// -// * Redistributions of source code must retain the above copyright -// notice, this list of conditions and the following disclaimer. -// * Redistributions in binary form must reproduce the above -// copyright notice, this list of conditions and the following -// disclaimer in the documentation and/or other materials provided -// with the distribution. -// * Neither the name of Intel Corporation nor the names of its -// contributors may be used to endorse or promote products derived -// from this software without specific prior written permission. -// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS -// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE -// COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, -// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES -// (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR -// SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) -// HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, -// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED -// OF THE POSSIBILITY OF SUCH DAMAGE. 
-// =================================================================== -// -// dmictl.c -// -// Functions for creating and destroying DMIs -// -// ================================================================== - -#include <stdio.h> -#include <unistd.h> -#include <string.h> - -#ifndef VTPM_MUTLI_VM - #include <sys/types.h> - #include <sys/stat.h> - #include <fcntl.h> - #include <signal.h> - #include <wait.h> -#endif - -#include "vtpmpriv.h" -#include "bsg.h" -#include "buffer.h" -#include "log.h" -#include "hashtable.h" -#include "hashtable_itr.h" - -#define TPM_EMULATOR_PATH "/usr/bin/vtpmd" - -TPM_RESULT close_dmi( VTPM_DMI_RESOURCE *dmi_res) { - TPM_RESULT status = TPM_FAIL; - - if (dmi_res == NULL) - return TPM_SUCCESS; - - status = TCS_CloseContext(dmi_res->TCSContext); - free ( dmi_res->NVMLocation ); - dmi_res->connected = FALSE; - -#ifndef VTPM_MULTI_VM - free(dmi_res->guest_tx_fname); - free(dmi_res->vtpm_tx_fname); - - close(dmi_res->guest_tx_fh); dmi_res->guest_tx_fh = -1; - close(dmi_res->vtpm_tx_fh); dmi_res->vtpm_tx_fh = -1; - - - #ifndef MANUAL_DM_LAUNCH - if (dmi_res->dmi_id != VTPM_CTL_DM) { - if (dmi_res->dmi_pid != 0) { - vtpmloginfo(VTPM_LOG_VTPM, "Killing dmi on pid %d.\n", dmi_res->dmi_pid); - if ((kill(dmi_res->dmi_pid, SIGKILL) !=0) || - (waitpid(dmi_res->dmi_pid, NULL, 0) != dmi_res->dmi_pid)){ - vtpmlogerror(VTPM_LOG_VTPM, "Could not kill dmi on pid %d.\n", dmi_res->dmi_pid); - status = TPM_FAIL; - } - } else - vtpmlogerror(VTPM_LOG_VTPM, "Could not kill dmi because it's pid was 0.\n"); - } - #endif -#endif - - return status; -} - -TPM_RESULT VTPM_Handle_New_DMI( const buffer_t *param_buf) { - - VTPM_DMI_RESOURCE *new_dmi=NULL; - TPM_RESULT status=TPM_FAIL; - BYTE type; - UINT32 dmi_id, domain_id, *dmi_id_key; - int fh; - -#ifndef VTPM_MUTLI_VM - char dmi_id_str[11]; // UINT32s are up to 10 digits + NULL - struct stat file_info; -#endif - - if (param_buf == NULL) { // Assume creation of Dom 0 control - type = 0; - domain_id = 
VTPM_CTL_DM; - dmi_id = VTPM_CTL_DM; - } else if (buffer_len(param_buf) != sizeof(BYTE) + sizeof(UINT32) *2) { - vtpmloginfo(VTPM_LOG_VTPM, "New DMI command wrong length: %d.\n", buffer_len(param_buf)); - status = TPM_BAD_PARAMETER; - goto abort_egress; - } else { - BSG_UnpackList( param_buf->bytes, 3, - BSG_TYPE_BYTE, &type, - BSG_TYPE_UINT32, &domain_id, - BSG_TYPE_UINT32, &dmi_id); - } - - new_dmi = (VTPM_DMI_RESOURCE *) hashtable_search(vtpm_globals->dmi_map, &dmi_id); - if (new_dmi == NULL) { - vtpmloginfo(VTPM_LOG_VTPM, "Creating new DMI instance %d attached on domain %d.\n", dmi_id, domain_id); - // Brand New DMI. Initialize the persistent pieces - if ((new_dmi = (VTPM_DMI_RESOURCE *) malloc (sizeof(VTPM_DMI_RESOURCE))) == NULL) { - status = TPM_RESOURCES; - goto abort_egress; - } - memset(new_dmi, 0, sizeof(VTPM_DMI_RESOURCE)); - new_dmi->dmi_id = dmi_id; - new_dmi->connected = FALSE; - - if ((dmi_id_key = (UINT32 *) malloc (sizeof(UINT32))) == NULL) { - status = TPM_RESOURCES; - goto abort_egress; - } - *dmi_id_key = new_dmi->dmi_id; - - // install into map - if (!hashtable_insert(vtpm_globals->dmi_map, dmi_id_key, new_dmi)){ - free(new_dmi); - free(dmi_id_key); - status = TPM_FAIL; - goto egress; - } - - } else - vtpmloginfo(VTPM_LOG_VTPM, "Re-attaching DMI instance %d on domain %d .\n", dmi_id, domain_id); - - if (new_dmi->connected) { - vtpmlogerror(VTPM_LOG_VTPM, "Attempt to re-attach, currently attached instance %d. Ignoring\n", dmi_id); - status = TPM_BAD_PARAMETER; - goto egress; - } - - // Initialize the Non-persistent pieces - new_dmi->dmi_domain_id = domain_id; - new_dmi->NVMLocation = NULL; - - new_dmi->TCSContext = 0; - TPMTRYRETURN( TCS_OpenContext(&new_dmi->TCSContext) ); - - new_dmi->NVMLocation = (char *) malloc(11 + strlen(DMI_NVM_FILE)); - sprintf(new_dmi->NVMLocation, DMI_NVM_FILE, (uint32_t) new_dmi->dmi_id); - - // Measure DMI - // FIXME: This will measure DMI. 
Until then use a fixed DMI_Measurement value - /* - fh = open(TPM_EMULATOR_PATH, O_RDONLY); - stat_ret = fstat(fh, &file_stat); - if (stat_ret == 0) - dmi_size = file_stat.st_size; - else { - vtpmlogerror(VTPM_LOG_VTPM, "Could not open tpm_emulator!!\n"); - status = TPM_IOERROR; - goto abort_egress; - } - dmi_buffer - */ - memset(&new_dmi->DMI_measurement, 0xcc, sizeof(TPM_DIGEST)); - -#ifndef VTPM_MULTI_VM - if (dmi_id != VTPM_CTL_DM) { - // Create a pair of fifo pipes - if( (new_dmi->guest_tx_fname = (char *) malloc(11 + strlen(GUEST_TX_FIFO))) == NULL){ - status = TPM_RESOURCES; - goto abort_egress; - } - sprintf(new_dmi->guest_tx_fname, GUEST_TX_FIFO, (uint32_t) dmi_id); - - if ((new_dmi->vtpm_tx_fname = (char *) malloc(11 + strlen(VTPM_TX_FIFO))) == NULL) { - status = TPM_RESOURCES; - goto abort_egress; - } - sprintf(new_dmi->vtpm_tx_fname, VTPM_TX_FIFO, (uint32_t) dmi_id); - - new_dmi->guest_tx_fh = -1; - new_dmi->vtpm_tx_fh= -1; - - if ( stat(new_dmi->guest_tx_fname, &file_info) == -1) { - if ( mkfifo(new_dmi->guest_tx_fname, S_IWUSR | S_IRUSR ) ){ - status = TPM_FAIL; - goto abort_egress; - } - } - - if ( (fh = open(new_dmi->vtpm_tx_fname, O_RDWR)) == -1) { - if ( mkfifo(new_dmi->vtpm_tx_fname, S_IWUSR | S_IRUSR ) ) { - status = TPM_FAIL; - goto abort_egress; - } - } - - // Launch DMI - sprintf(dmi_id_str, "%d", (int) dmi_id); -#ifdef MANUAL_DM_LAUNCH - vtpmlogerror(VTPM_LOG_VTPM, "FAKING starting vtpm with dmi=%s\n", dmi_id_str); - new_dmi->dmi_pid = 0; -#else - pid_t pid = fork(); - - if (pid == -1) { - vtpmlogerror(VTPM_LOG_VTPM, "Could not fork to launch vtpm\n"); - status = TPM_RESOURCES; - goto abort_egress; - } else if (pid == 0) { - if ( stat(new_dmi->NVMLocation, &file_info) == -1) - execl (TPM_EMULATOR_PATH, "vtmpd", "clear", dmi_id_str, NULL); - else - execl (TPM_EMULATOR_PATH, "vtpmd", "save", dmi_id_str, NULL); - - // Returning from these at all is an error. 
- vtpmlogerror(VTPM_LOG_VTPM, "Could not exec to launch vtpm\n"); - } else { - new_dmi->dmi_pid = pid; - vtpmloginfo(VTPM_LOG_VTPM, "Launching DMI on PID = %d\n", pid); - } -#endif // MANUAL_DM_LAUNCH - } -#else // VTPM_MUTLI_VM - // FIXME: Measure DMI through call to Measurement agent in platform. -#endif - - vtpm_globals->DMI_table_dirty = TRUE; - new_dmi->connected = TRUE; - status=TPM_SUCCESS; - goto egress; - - abort_egress: - close_dmi( new_dmi ); - - egress: - return status; -} - -TPM_RESULT VTPM_Handle_Close_DMI( const buffer_t *param_buf) { - - TPM_RESULT status=TPM_FAIL; - VTPM_DMI_RESOURCE *dmi_res=NULL; - UINT32 dmi_id; - - if ((param_buf == NULL) || (buffer_len(param_buf) != sizeof(UINT32)) ) { - vtpmlogerror(VTPM_LOG_VTPM, "Closing DMI has bad size."); - status = TPM_BAD_PARAMETER; - goto abort_egress; - } - - BSG_UnpackList( param_buf->bytes, 1, - BSG_TYPE_UINT32, &dmi_id); - - vtpmloginfo(VTPM_LOG_VTPM, "Closing DMI %d.\n", dmi_id); - - dmi_res = (VTPM_DMI_RESOURCE *) hashtable_search(vtpm_globals->dmi_map, &dmi_id); - if (dmi_res == NULL ) { - vtpmlogerror(VTPM_LOG_VTPM, "Trying to close nonexistent DMI.\n"); - status = TPM_BAD_PARAMETER; - goto abort_egress; - } - - if (!dmi_res->connected) { - vtpmlogerror(VTPM_LOG_VTPM, "Closing non-connected DMI.\n"); - status = TPM_BAD_PARAMETER; - goto abort_egress; - } - - // Close Dmi - TPMTRYRETURN(close_dmi( dmi_res )); - - status=TPM_SUCCESS; - goto egress; - - abort_egress: - egress: - - return status; -} - -TPM_RESULT VTPM_Handle_Delete_DMI( const buffer_t *param_buf) { - - TPM_RESULT status=TPM_FAIL; - VTPM_DMI_RESOURCE *dmi_res=NULL; - UINT32 dmi_id; - - if ((param_buf == NULL) || (buffer_len(param_buf) != sizeof(UINT32)) ) { - vtpmlogerror(VTPM_LOG_VTPM, "Closing DMI has bad size.\n"); - status = TPM_BAD_PARAMETER; - goto abort_egress; - } - - BSG_UnpackList( param_buf->bytes, 1, - BSG_TYPE_UINT32, &dmi_id); - - vtpmloginfo(VTPM_LOG_VTPM, "Deleting DMI %d.\n", dmi_id); - - dmi_res = 
(VTPM_DMI_RESOURCE *) hashtable_remove(vtpm_globals->dmi_map, &dmi_id); - if (dmi_res == NULL) { - vtpmlogerror(VTPM_LOG_VTPM, "Closing non-existent DMI.\n"); - status = TPM_BAD_PARAMETER; - goto abort_egress; - } - - //TODO: Automatically delete file dmi_res->NVMLocation - - // Close DMI first - TPMTRYRETURN(close_dmi( dmi_res )); - free ( dmi_res ); - - status=TPM_SUCCESS; - goto egress; - - abort_egress: - egress: - - return status; -} +// =================================================================== +// +// Copyright (c) 2005, Intel Corp. +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions +// are met: +// +// * Redistributions of source code must retain the above copyright +// notice, this list of conditions and the following disclaimer. +// * Redistributions in binary form must reproduce the above +// copyright notice, this list of conditions and the following +// disclaimer in the documentation and/or other materials provided +// with the distribution. +// * Neither the name of Intel Corporation nor the names of its +// contributors may be used to endorse or promote products derived +// from this software without specific prior written permission. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS +// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE +// COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, +// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +// (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +// SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +// HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, +// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED +// OF THE POSSIBILITY OF SUCH DAMAGE. +// =================================================================== +// +// dmictl.c +// +// Functions for creating and destroying DMIs +// +// ================================================================== + +#include <stdio.h> +#include <unistd.h> +#include <string.h> + +#ifndef VTPM_MUTLI_VM + #include <sys/types.h> + #include <sys/stat.h> + #include <fcntl.h> + #include <signal.h> + #include <wait.h> +#endif + +#include "vtpmpriv.h" +#include "bsg.h" +#include "buffer.h" +#include "log.h" +#include "hashtable.h" +#include "hashtable_itr.h" + +#define TPM_EMULATOR_PATH "/usr/bin/vtpmd" + +TPM_RESULT close_dmi( VTPM_DMI_RESOURCE *dmi_res) { + TPM_RESULT status = TPM_FAIL; + + if (dmi_res == NULL) + return TPM_SUCCESS; + + status = TCS_CloseContext(dmi_res->TCSContext); + free ( dmi_res->NVMLocation ); + dmi_res->connected = FALSE; + +#ifndef VTPM_MULTI_VM + free(dmi_res->guest_tx_fname); + free(dmi_res->vtpm_tx_fname); + + close(dmi_res->guest_tx_fh); dmi_res->guest_tx_fh = -1; + close(dmi_res->vtpm_tx_fh); dmi_res->vtpm_tx_fh = -1; + + #ifndef MANUAL_DM_LAUNCH + if (dmi_res->dmi_id != VTPM_CTL_DM) { + if (dmi_res->dmi_pid != 0) { + vtpmloginfo(VTPM_LOG_VTPM, "Killing dmi on pid %d.\n", dmi_res->dmi_pid); + if (kill(dmi_res->dmi_pid, SIGKILL) !=0) { + vtpmloginfo(VTPM_LOG_VTPM, "DMI on pid %d is already dead.\n", dmi_res->dmi_pid); + } else if (waitpid(dmi_res->dmi_pid, NULL, 0) != dmi_res->dmi_pid) { + 
vtpmlogerror(VTPM_LOG_VTPM, "DMI on pid %d failed to stop.\n", dmi_res->dmi_pid); + status = TPM_FAIL; + } + } else { + vtpmlogerror(VTPM_LOG_VTPM, "Could not kill dmi because it's pid was 0.\n"); + status = TPM_FAIL; + } + } + #endif +#endif + + return status; +} + +TPM_RESULT VTPM_Handle_New_DMI( const buffer_t *param_buf) { + + VTPM_DMI_RESOURCE *new_dmi=NULL; + TPM_RESULT status=TPM_FAIL; + BYTE type; + UINT32 dmi_id, domain_id, *dmi_id_key; + +#ifndef VTPM_MULTI_VM + int fh; + char dmi_id_str[11]; // UINT32s are up to 10 digits + NULL + struct stat file_info; +#endif + + if (param_buf == NULL) { // Assume creation of Dom 0 control + type = 0; + domain_id = VTPM_CTL_DM; + dmi_id = VTPM_CTL_DM; + } else if (buffer_len(param_buf) != sizeof(BYTE) + sizeof(UINT32) *2) { + vtpmloginfo(VTPM_LOG_VTPM, "New DMI command wrong length: %d.\n", buffer_len(param_buf)); + status = TPM_BAD_PARAMETER; + goto abort_egress; + } else { + BSG_UnpackList( param_buf->bytes, 3, + BSG_TYPE_BYTE, &type, + BSG_TYPE_UINT32, &domain_id, + BSG_TYPE_UINT32, &dmi_id); + } + + new_dmi = (VTPM_DMI_RESOURCE *) hashtable_search(vtpm_globals->dmi_map, &dmi_id); + if (new_dmi == NULL) { + vtpmloginfo(VTPM_LOG_VTPM, "Creating new DMI instance %d attached on domain %d.\n", dmi_id, domain_id); + // Brand New DMI. 
Initialize the persistent pieces + if ((new_dmi = (VTPM_DMI_RESOURCE *) malloc (sizeof(VTPM_DMI_RESOURCE))) == NULL) { + status = TPM_RESOURCES; + goto abort_egress; + } + memset(new_dmi, 0, sizeof(VTPM_DMI_RESOURCE)); + new_dmi->dmi_id = dmi_id; + new_dmi->connected = FALSE; + + if ((dmi_id_key = (UINT32 *) malloc (sizeof(UINT32))) == NULL) { + status = TPM_RESOURCES; + goto abort_egress; + } + *dmi_id_key = new_dmi->dmi_id; + + // install into map + if (!hashtable_insert(vtpm_globals->dmi_map, dmi_id_key, new_dmi)){ + free(new_dmi); + free(dmi_id_key); + status = TPM_FAIL; + goto egress; + } + + } else + vtpmloginfo(VTPM_LOG_VTPM, "Re-attaching DMI instance %d on domain %d .\n", dmi_id, domain_id); + + if (new_dmi->connected) { + vtpmlogerror(VTPM_LOG_VTPM, "Attempt to re-attach, currently attached instance %d. Ignoring\n", dmi_id); + status = TPM_BAD_PARAMETER; + goto egress; + } + + // Initialize the Non-persistent pieces + new_dmi->dmi_domain_id = domain_id; + new_dmi->NVMLocation = NULL; + + new_dmi->TCSContext = 0; + TPMTRYRETURN( TCS_OpenContext(&new_dmi->TCSContext) ); + + new_dmi->NVMLocation = (char *) malloc(11 + strlen(DMI_NVM_FILE)); + sprintf(new_dmi->NVMLocation, DMI_NVM_FILE, (uint32_t) new_dmi->dmi_id); + + // Measure DMI + // FIXME: This will measure DMI. 
Until then use a fixed DMI_Measurement value + /* + fh = open(TPM_EMULATOR_PATH, O_RDONLY); + stat_ret = fstat(fh, &file_stat); + if (stat_ret == 0) + dmi_size = file_stat.st_size; + else { + vtpmlogerror(VTPM_LOG_VTPM, "Could not open tpm_emulator!!\n"); + status = TPM_IOERROR; + goto abort_egress; + } + dmi_buffer + */ + memset(&new_dmi->DMI_measurement, 0xcc, sizeof(TPM_DIGEST)); + +#ifndef VTPM_MULTI_VM + if (dmi_id != VTPM_CTL_DM) { + // Create a pair of fifo pipes + if( (new_dmi->guest_tx_fname = (char *) malloc(11 + strlen(GUEST_TX_FIFO))) == NULL){ + status = TPM_RESOURCES; + goto abort_egress; + } + sprintf(new_dmi->guest_tx_fname, GUEST_TX_FIFO, (uint32_t) dmi_id); + + if ((new_dmi->vtpm_tx_fname = (char *) malloc(11 + strlen(VTPM_TX_FIFO))) == NULL) { + status = TPM_RESOURCES; + goto abort_egress; + } + sprintf(new_dmi->vtpm_tx_fname, VTPM_TX_FIFO, (uint32_t) dmi_id); + + new_dmi->guest_tx_fh = -1; + new_dmi->vtpm_tx_fh= -1; + + if ( stat(new_dmi->guest_tx_fname, &file_info) == -1) { + if ( mkfifo(new_dmi->guest_tx_fname, S_IWUSR | S_IRUSR ) ){ + vtpmlogerror(VTPM_LOG_VTPM, "Failed to create dmi fifo.\n"); + status = TPM_IOERROR; + goto abort_egress; + } + } + + if ( (fh = open(new_dmi->vtpm_tx_fname, O_RDWR)) == -1) { + if ( mkfifo(new_dmi->vtpm_tx_fname, S_IWUSR | S_IRUSR ) ) { + vtpmlogerror(VTPM_LOG_VTPM, "Failed to create dmi fifo.\n"); + status = TPM_IOERROR; + goto abort_egress; + } + } + + // Launch DMI + sprintf(dmi_id_str, "%d", (int) dmi_id); +#ifdef MANUAL_DM_LAUNCH + vtpmlogerror(VTPM_LOG_VTPM, "FAKING starting vtpm with dmi=%s\n", dmi_id_str); + new_dmi->dmi_pid = 0; +#else + pid_t pid = fork(); + + if (pid == -1) { + vtpmlogerror(VTPM_LOG_VTPM, "Could not fork to launch vtpm\n"); + status = TPM_RESOURCES; + goto abort_egress; + } else if (pid == 0) { + if ( stat(new_dmi->NVMLocation, &file_info) == -1) + execl (TPM_EMULATOR_PATH, "vtmpd", "clear", dmi_id_str, NULL); + else + execl (TPM_EMULATOR_PATH, "vtpmd", "save", dmi_id_str, NULL); + + 
// Returning from these at all is an error. + vtpmlogerror(VTPM_LOG_VTPM, "Could not exec to launch vtpm\n"); + } else { + new_dmi->dmi_pid = pid; + vtpmloginfo(VTPM_LOG_VTPM, "Launching DMI on PID = %d\n", pid); + } +#endif // MANUAL_DM_LAUNCH + } +#else // VTPM_MUTLI_VM + // FIXME: Measure DMI through call to Measurement agent in platform. +#endif + + vtpm_globals->DMI_table_dirty = TRUE; + new_dmi->connected = TRUE; + status=TPM_SUCCESS; + goto egress; + + abort_egress: + vtpmlogerror(VTPM_LOG_VTPM, "Failed to create DMI id=%d due to status=%s. Cleaning.\n", dmi_id, tpm_get_error_name(status)); + close_dmi( new_dmi ); + + egress: + return status; +} + +TPM_RESULT VTPM_Handle_Close_DMI( const buffer_t *param_buf) { + + TPM_RESULT status=TPM_FAIL; + VTPM_DMI_RESOURCE *dmi_res=NULL; + UINT32 dmi_id; + + if ((param_buf == NULL) || (buffer_len(param_buf) != sizeof(UINT32)) ) { + vtpmlogerror(VTPM_LOG_VTPM, "Closing DMI has bad size."); + status = TPM_BAD_PARAMETER; + goto abort_egress; + } + + BSG_UnpackList( param_buf->bytes, 1, + BSG_TYPE_UINT32, &dmi_id); + + vtpmloginfo(VTPM_LOG_VTPM, "Closing DMI %d.\n", dmi_id); + + dmi_res = (VTPM_DMI_RESOURCE *) hashtable_search(vtpm_globals->dmi_map, &dmi_id); + if (dmi_res == NULL ) { + vtpmlogerror(VTPM_LOG_VTPM, "Trying to close nonexistent DMI.\n"); + status = TPM_BAD_PARAMETER; + goto abort_egress; + } + + if (!dmi_res->connected) { + vtpmlogerror(VTPM_LOG_VTPM, "Closing non-connected DMI.\n"); + status = TPM_BAD_PARAMETER; + goto abort_egress; + } + + // Close Dmi + TPMTRYRETURN(close_dmi( dmi_res )); + + status=TPM_SUCCESS; + goto egress; + + abort_egress: + egress: + + return status; +} + +TPM_RESULT VTPM_Handle_Delete_DMI( const buffer_t *param_buf) { + + TPM_RESULT status=TPM_FAIL; + VTPM_DMI_RESOURCE *dmi_res=NULL; + UINT32 dmi_id; + + if ((param_buf == NULL) || (buffer_len(param_buf) != sizeof(UINT32)) ) { + vtpmlogerror(VTPM_LOG_VTPM, "Closing DMI has bad size.\n"); + status = TPM_BAD_PARAMETER; + goto 
abort_egress; + } + + BSG_UnpackList( param_buf->bytes, 1, + BSG_TYPE_UINT32, &dmi_id); + + vtpmloginfo(VTPM_LOG_VTPM, "Deleting DMI %d.\n", dmi_id); + + dmi_res = (VTPM_DMI_RESOURCE *) hashtable_remove(vtpm_globals->dmi_map, &dmi_id); + if (dmi_res == NULL) { + vtpmlogerror(VTPM_LOG_VTPM, "Closing non-existent DMI.\n"); + status = TPM_BAD_PARAMETER; + goto abort_egress; + } + + //TODO: Automatically delete file dmi_res->NVMLocation + + // Close DMI first + TPMTRYRETURN(close_dmi( dmi_res )); + free ( dmi_res ); + + status=TPM_SUCCESS; + goto egress; + + abort_egress: + egress: + + return status; +} diff -r a6b72464a042 -r f8e7af29daa1 tools/vtpm_manager/manager/securestorage.c --- a/tools/vtpm_manager/manager/securestorage.c Tue Sep 20 09:43:29 2005 +++ b/tools/vtpm_manager/manager/securestorage.c Tue Sep 20 09:43:46 2005 @@ -1,401 +1,401 @@ -// =================================================================== -// -// Copyright (c) 2005, Intel Corp. -// All rights reserved. -// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions -// are met: -// -// * Redistributions of source code must retain the above copyright -// notice, this list of conditions and the following disclaimer. -// * Redistributions in binary form must reproduce the above -// copyright notice, this list of conditions and the following -// disclaimer in the documentation and/or other materials provided -// with the distribution. -// * Neither the name of Intel Corporation nor the names of its -// contributors may be used to endorse or promote products derived -// from this software without specific prior written permission. -// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS -// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE -// COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, -// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES -// (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR -// SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) -// HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, -// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED -// OF THE POSSIBILITY OF SUCH DAMAGE. -// =================================================================== -// -// securestorage.c -// -// Functions regarding securely storing DMI secrets. -// -// ================================================================== - -#include <sys/types.h> -#include <sys/stat.h> -#include <fcntl.h> -#include <unistd.h> -#include <string.h> - -#include "tcg.h" -#include "vtpm_manager.h" -#include "vtpmpriv.h" -#include "vtsp.h" -#include "bsg.h" -#include "crypto.h" -#include "hashtable.h" -#include "hashtable_itr.h" -#include "buffer.h" -#include "log.h" - -TPM_RESULT VTPM_Handle_Save_NVM(VTPM_DMI_RESOURCE *myDMI, - const buffer_t *inbuf, - buffer_t *outbuf) { - - TPM_RESULT status = TPM_SUCCESS; - symkey_t symkey; - buffer_t state_cipher = NULL_BUF, - symkey_cipher = NULL_BUF; - int fh; - long bytes_written; - BYTE *sealed_NVM=NULL; - UINT32 sealed_NVM_size, i; - struct pack_constbuf_t symkey_cipher32, state_cipher32; - - vtpmloginfo(VTPM_LOG_VTPM_DEEP, "Save_NVMing[%d]: 0x", buffer_len(inbuf)); - for (i=0; i< buffer_len(inbuf); i++) - vtpmloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", inbuf->bytes[i]); - vtpmloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); - - // Generate a sym key and encrypt state with it - TPMTRY(TPM_ENCRYPT_ERROR, Crypto_symcrypto_genkey (&symkey) ); - TPMTRY(TPM_ENCRYPT_ERROR, Crypto_symcrypto_encrypt (&symkey, inbuf, &state_cipher) ); - - // Encrypt symmetric key - TPMTRYRETURN( VTSP_Bind( &vtpm_globals->storageKey, - 
&symkey.key, - &symkey_cipher) ); - - // Create output blob: symkey_size + symkey_cipher + state_cipher_size + state_cipher - - symkey_cipher32.size = buffer_len(&symkey_cipher); - symkey_cipher32.data = symkey_cipher.bytes; - - state_cipher32.size = buffer_len(&state_cipher); - state_cipher32.data = state_cipher.bytes; - - sealed_NVM = (BYTE *) malloc( 2 * sizeof(UINT32) + symkey_cipher32.size + state_cipher32.size); - - sealed_NVM_size = BSG_PackList(sealed_NVM, 2, - BSG_TPM_SIZE32_DATA, &symkey_cipher32, - BSG_TPM_SIZE32_DATA, &state_cipher32); - - // Mark DMI Table so new save state info will get pushed to disk on return. - vtpm_globals->DMI_table_dirty = TRUE; - - // Write sealed blob off disk from NVMLocation - // TODO: How to properly return from these. Do we care if we return failure - // after writing the file? We can't get the old one back. - // TODO: Backup old file and try and recover that way. - fh = open(myDMI->NVMLocation, O_WRONLY | O_CREAT, S_IREAD | S_IWRITE); - if ( (bytes_written = write(fh, sealed_NVM, sealed_NVM_size) ) != (long) sealed_NVM_size) { - vtpmlogerror(VTPM_LOG_VTPM, "We just overwrote a DMI_NVM and failed to finish. 
%ld/%ld bytes.\n", bytes_written, (long)sealed_NVM_size); - status = TPM_IOERROR; - goto abort_egress; - } - close(fh); - - Crypto_SHA1Full (sealed_NVM, sealed_NVM_size, (BYTE *) &myDMI->NVM_measurement); - - vtpmloginfo(VTPM_LOG_VTPM, "Saved %d bytes of E(symkey) + %d bytes of E(NVM)\n", buffer_len(&symkey_cipher), buffer_len(&state_cipher)); - goto egress; - - abort_egress: - vtpmlogerror(VTPM_LOG_VTPM, "Failed to load NVM\n."); - - egress: - - buffer_free ( &state_cipher); - buffer_free ( &symkey_cipher); - free(sealed_NVM); - Crypto_symcrypto_freekey (&symkey); - - return status; -} - - -/* inbuf = null outbuf = sealed blob size, sealed blob.*/ -TPM_RESULT VTPM_Handle_Load_NVM(VTPM_DMI_RESOURCE *myDMI, - const buffer_t *inbuf, - buffer_t *outbuf) { - - TPM_RESULT status = TPM_SUCCESS; - symkey_t symkey; - buffer_t state_cipher = NULL_BUF, - symkey_clear = NULL_BUF, - symkey_cipher = NULL_BUF; - struct pack_buf_t symkey_cipher32, state_cipher32; - - UINT32 sealed_NVM_size; - BYTE *sealed_NVM = NULL; - long fh_size; - int fh, stat_ret, i; - struct stat file_stat; - TPM_DIGEST sealedNVMHash; - - memset(&symkey, 0, sizeof(symkey_t)); - - if (myDMI->NVMLocation == NULL) { - vtpmlogerror(VTPM_LOG_VTPM, "Unable to load NVM because the file name NULL.\n"); - status = TPM_AUTHFAIL; - goto abort_egress; - } - - //Read sealed blob off disk from NVMLocation - fh = open(myDMI->NVMLocation, O_RDONLY); - stat_ret = fstat(fh, &file_stat); - if (stat_ret == 0) - fh_size = file_stat.st_size; - else { - status = TPM_IOERROR; - goto abort_egress; - } - - sealed_NVM = (BYTE *) malloc(fh_size); - if (read(fh, sealed_NVM, fh_size) != fh_size) { - status = TPM_IOERROR; - goto abort_egress; - } - close(fh); - - vtpmloginfo(VTPM_LOG_VTPM_DEEP, "Load_NVMing[%ld]: 0x", fh_size); - for (i=0; i< fh_size; i++) - vtpmloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", sealed_NVM[i]); - vtpmloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); - - sealed_NVM_size = BSG_UnpackList(sealed_NVM, 2, - BSG_TPM_SIZE32_DATA, 
&symkey_cipher32, - BSG_TPM_SIZE32_DATA, &state_cipher32); - - TPMTRYRETURN( buffer_init_convert (&symkey_cipher, - symkey_cipher32.size, - symkey_cipher32.data) ); - - TPMTRYRETURN( buffer_init_convert (&state_cipher, - state_cipher32.size, - state_cipher32.data) ); - - Crypto_SHA1Full(sealed_NVM, sealed_NVM_size, (BYTE *) &sealedNVMHash); - - // Verify measurement of sealed blob. - if (memcmp(&sealedNVMHash, &myDMI->NVM_measurement, sizeof(TPM_DIGEST)) ) { - vtpmlogerror(VTPM_LOG_VTPM, "VTPM LoadNVM NVM measurement check failed.\n"); - vtpmloginfo(VTPM_LOG_VTPM_DEEP, "Correct hash: "); - for (i=0; i< sizeof(TPM_DIGEST); i++) - vtpmloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", ((BYTE*)&myDMI->NVM_measurement)[i]); - vtpmloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); - - vtpmloginfo(VTPM_LOG_VTPM_DEEP, "Measured hash: "); - for (i=0; i< sizeof(TPM_DIGEST); i++) - vtpmloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", ((BYTE*)&sealedNVMHash)[i]); - vtpmloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); - - status = TPM_AUTHFAIL; - goto abort_egress; - } - - // Decrypt Symmetric Key - TPMTRYRETURN( VTSP_Unbind( myDMI->TCSContext, - vtpm_globals->storageKeyHandle, - &symkey_cipher, - (const TPM_AUTHDATA*)&vtpm_globals->storage_key_usage_auth, - &symkey_clear, - &(vtpm_globals->keyAuth) ) ); - - // create symmetric key using saved bits - Crypto_symcrypto_initkey (&symkey, &symkey_clear); - - // Decrypt State - TPMTRY(TPM_DECRYPT_ERROR, Crypto_symcrypto_decrypt (&symkey, &state_cipher, outbuf) ); - - goto egress; - - abort_egress: - vtpmlogerror(VTPM_LOG_VTPM, "Failed to load NVM\n."); - - egress: - - buffer_free ( &state_cipher); - buffer_free ( &symkey_clear); - buffer_free ( &symkey_cipher); - free( sealed_NVM ); - Crypto_symcrypto_freekey (&symkey); - - return status; -} - -TPM_RESULT VTPM_SaveService(void) { - TPM_RESULT status=TPM_SUCCESS; - int fh, dmis=-1; - - BYTE *flat_global; - int flat_global_size, bytes_written; - UINT32 storageKeySize = buffer_len(&vtpm_globals->storageKeyWrap); - struct 
pack_buf_t storage_key_pack = {storageKeySize, vtpm_globals->storageKeyWrap.bytes}; - - struct hashtable_itr *dmi_itr; - VTPM_DMI_RESOURCE *dmi_res; - - UINT32 flat_global_full_size; - - // Global Values needing to be saved - flat_global_full_size = 3*sizeof(TPM_DIGEST) + // Auths - sizeof(UINT32) + // storagekeysize - storageKeySize + // storage key - hashtable_count(vtpm_globals->dmi_map) * // num DMIS - (sizeof(UINT32) + 2*sizeof(TPM_DIGEST)); // Per DMI info - - - flat_global = (BYTE *) malloc( flat_global_full_size); - - flat_global_size = BSG_PackList(flat_global, 4, - BSG_TPM_AUTHDATA, &vtpm_globals->owner_usage_auth, - BSG_TPM_AUTHDATA, &vtpm_globals->srk_usage_auth, - BSG_TPM_SECRET, &vtpm_globals->storage_key_usage_auth, - BSG_TPM_SIZE32_DATA, &storage_key_pack); - - // Per DMI values to be saved - if (hashtable_count(vtpm_globals->dmi_map) > 0) { - - dmi_itr = hashtable_iterator(vtpm_globals->dmi_map); - do { - dmi_res = (VTPM_DMI_RESOURCE *) hashtable_iterator_value(dmi_itr); - dmis++; - - // No need to save dmi0. - if (dmi_res->dmi_id == 0) - continue; - - - flat_global_size += BSG_PackList( flat_global + flat_global_size, 3, - BSG_TYPE_UINT32, &dmi_res->dmi_id, - BSG_TPM_DIGEST, &dmi_res->NVM_measurement, - BSG_TPM_DIGEST, &dmi_res->DMI_measurement); - - } while (hashtable_iterator_advance(dmi_itr)); - } - - //FIXME: Once we have a way to protect a TPM key, we should use it to - // encrypt this blob. BUT, unless there is a way to ensure the key is - // not used by other apps, this encryption is useless. - fh = open(STATE_FILE, O_WRONLY | O_CREAT, S_IREAD | S_IWRITE); - if (fh == -1) { - vtpmlogerror(VTPM_LOG_VTPM, "Unable to open %s file for write.\n", STATE_FILE); - status = TPM_IOERROR; - goto abort_egress; - } - - if ( (bytes_written = write(fh, flat_global, flat_global_size)) != flat_global_size ) { - vtpmlogerror(VTPM_LOG_VTPM, "Failed to save service data. 
%d/%d bytes written.\n", bytes_written, flat_global_size); - status = TPM_IOERROR; - goto abort_egress; - } - vtpm_globals->DMI_table_dirty = FALSE; - - goto egress; - - abort_egress: - egress: - - free(flat_global); - close(fh); - - vtpmloginfo(VTPM_LOG_VTPM, "Saved VTPM Service state (status = %d, dmis = %d)\n", (int) status, dmis); - return status; -} - -TPM_RESULT VTPM_LoadService(void) { - - TPM_RESULT status=TPM_SUCCESS; - int fh, stat_ret, dmis=0; - long fh_size = 0, step_size; - BYTE *flat_global=NULL; - struct pack_buf_t storage_key_pack; - UINT32 *dmi_id_key; - - VTPM_DMI_RESOURCE *dmi_res; - struct stat file_stat; - - fh = open(STATE_FILE, O_RDONLY ); - stat_ret = fstat(fh, &file_stat); - if (stat_ret == 0) - fh_size = file_stat.st_size; - else { - status = TPM_IOERROR; - goto abort_egress; - } - - flat_global = (BYTE *) malloc(fh_size); - - if ((long) read(fh, flat_global, fh_size) != fh_size ) { - status = TPM_IOERROR; - goto abort_egress; - } - - // Global Values needing to be saved - step_size = BSG_UnpackList( flat_global, 4, - BSG_TPM_AUTHDATA, &vtpm_globals->owner_usage_auth, - BSG_TPM_AUTHDATA, &vtpm_globals->srk_usage_auth, - BSG_TPM_SECRET, &vtpm_globals->storage_key_usage_auth, - BSG_TPM_SIZE32_DATA, &storage_key_pack); - - TPMTRYRETURN(buffer_init(&vtpm_globals->storageKeyWrap, 0, 0) ); - TPMTRYRETURN(buffer_append_raw(&vtpm_globals->storageKeyWrap, storage_key_pack.size, storage_key_pack.data) ); - - // Per DMI values to be saved - while ( step_size < fh_size ){ - if (fh_size - step_size < (long) (sizeof(UINT32) + 2*sizeof(TPM_DIGEST))) { - vtpmlogerror(VTPM_LOG_VTPM, "Encountered %ld extra bytes at end of manager state.\n", fh_size-step_size); - step_size = fh_size; - } else { - dmi_res = (VTPM_DMI_RESOURCE *) malloc(sizeof(VTPM_DMI_RESOURCE)); - dmis++; - - dmi_res->connected = FALSE; - - step_size += BSG_UnpackList(flat_global + step_size, 3, - BSG_TYPE_UINT32, &dmi_res->dmi_id, - BSG_TPM_DIGEST, &dmi_res->NVM_measurement, - 
BSG_TPM_DIGEST, &dmi_res->DMI_measurement); - - // install into map - dmi_id_key = (UINT32 *) malloc (sizeof(UINT32)); - *dmi_id_key = dmi_res->dmi_id; - if (!hashtable_insert(vtpm_globals->dmi_map, dmi_id_key, dmi_res)) { - status = TPM_FAIL; - goto abort_egress; - } - - } - - } - - goto egress; - - abort_egress: - vtpmlogerror(VTPM_LOG_VTPM, "Failed to save service data\n"); - egress: - - if (flat_global) - free(flat_global); - close(fh); - - vtpmloginfo(VTPM_LOG_VTPM, "Previously saved state reloaded (status = %d, dmis = %d).\n", (int) status, dmis); - return status; -} +// =================================================================== +// +// Copyright (c) 2005, Intel Corp. +// All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions +// are met: +// +// * Redistributions of source code must retain the above copyright +// notice, this list of conditions and the following disclaimer. +// * Redistributions in binary form must reproduce the above +// copyright notice, this list of conditions and the following +// disclaimer in the documentation and/or other materials provided +// with the distribution. +// * Neither the name of Intel Corporation nor the names of its +// contributors may be used to endorse or promote products derived +// from this software without specific prior written permission. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS +// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE +// COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, +// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +// (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +// SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +// HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, +// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED +// OF THE POSSIBILITY OF SUCH DAMAGE. +// =================================================================== +// +// securestorage.c +// +// Functions regarding securely storing DMI secrets. +// +// ================================================================== + +#include <sys/types.h> +#include <sys/stat.h> +#include <fcntl.h> +#include <unistd.h> +#include <string.h> + +#include "tcg.h" +#include "vtpm_manager.h" +#include "vtpmpriv.h" +#include "vtsp.h" +#include "bsg.h" +#include "crypto.h" +#include "hashtable.h" +#include "hashtable_itr.h" +#include "buffer.h" +#include "log.h" + +TPM_RESULT VTPM_Handle_Save_NVM(VTPM_DMI_RESOURCE *myDMI, + const buffer_t *inbuf, + buffer_t *outbuf) { + + TPM_RESULT status = TPM_SUCCESS; + symkey_t symkey; + buffer_t state_cipher = NULL_BUF, + symkey_cipher = NULL_BUF; + int fh; + long bytes_written; + BYTE *sealed_NVM=NULL; + UINT32 sealed_NVM_size, i; + struct pack_constbuf_t symkey_cipher32, state_cipher32; + + vtpmloginfo(VTPM_LOG_VTPM_DEEP, "Save_NVMing[%d]: 0x", buffer_len(inbuf)); + for (i=0; i< buffer_len(inbuf); i++) + vtpmloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", inbuf->bytes[i]); + vtpmloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); + + // Generate a sym key and encrypt state with it + TPMTRY(TPM_ENCRYPT_ERROR, Crypto_symcrypto_genkey (&symkey) ); + TPMTRY(TPM_ENCRYPT_ERROR, Crypto_symcrypto_encrypt (&symkey, inbuf, &state_cipher) ); + + // Encrypt symmetric key + TPMTRYRETURN( VTSP_Bind( &vtpm_globals->storageKey, + 
&symkey.key, + &symkey_cipher) ); + + // Create output blob: symkey_size + symkey_cipher + state_cipher_size + state_cipher + + symkey_cipher32.size = buffer_len(&symkey_cipher); + symkey_cipher32.data = symkey_cipher.bytes; + + state_cipher32.size = buffer_len(&state_cipher); + state_cipher32.data = state_cipher.bytes; + + sealed_NVM = (BYTE *) malloc( 2 * sizeof(UINT32) + symkey_cipher32.size + state_cipher32.size); + + sealed_NVM_size = BSG_PackList(sealed_NVM, 2, + BSG_TPM_SIZE32_DATA, &symkey_cipher32, + BSG_TPM_SIZE32_DATA, &state_cipher32); + + // Mark DMI Table so new save state info will get pushed to disk on return. + vtpm_globals->DMI_table_dirty = TRUE; + + // Write sealed blob off disk from NVMLocation + // TODO: How to properly return from these. Do we care if we return failure + // after writing the file? We can't get the old one back. + // TODO: Backup old file and try and recover that way. + fh = open(myDMI->NVMLocation, O_WRONLY | O_CREAT, S_IREAD | S_IWRITE); + if ( (bytes_written = write(fh, sealed_NVM, sealed_NVM_size) ) != (long) sealed_NVM_size) { + vtpmlogerror(VTPM_LOG_VTPM, "We just overwrote a DMI_NVM and failed to finish. 
%ld/%ld bytes.\n", bytes_written, (long)sealed_NVM_size); + status = TPM_IOERROR; + goto abort_egress; + } + close(fh); + + Crypto_SHA1Full (sealed_NVM, sealed_NVM_size, (BYTE *) &myDMI->NVM_measurement); + + vtpmloginfo(VTPM_LOG_VTPM, "Saved %d bytes of E(symkey) + %d bytes of E(NVM)\n", buffer_len(&symkey_cipher), buffer_len(&state_cipher)); + goto egress; + + abort_egress: + vtpmlogerror(VTPM_LOG_VTPM, "Failed to load NVM\n."); + + egress: + + buffer_free ( &state_cipher); + buffer_free ( &symkey_cipher); + free(sealed_NVM); + Crypto_symcrypto_freekey (&symkey); + + return status; +} + + +/* inbuf = null outbuf = sealed blob size, sealed blob.*/ +TPM_RESULT VTPM_Handle_Load_NVM(VTPM_DMI_RESOURCE *myDMI, + const buffer_t *inbuf, + buffer_t *outbuf) { + + TPM_RESULT status = TPM_SUCCESS; + symkey_t symkey; + buffer_t state_cipher = NULL_BUF, + symkey_clear = NULL_BUF, + symkey_cipher = NULL_BUF; + struct pack_buf_t symkey_cipher32, state_cipher32; + + UINT32 sealed_NVM_size; + BYTE *sealed_NVM = NULL; + long fh_size; + int fh, stat_ret, i; + struct stat file_stat; + TPM_DIGEST sealedNVMHash; + + memset(&symkey, 0, sizeof(symkey_t)); + + if (myDMI->NVMLocation == NULL) { + vtpmlogerror(VTPM_LOG_VTPM, "Unable to load NVM because the file name NULL.\n"); + status = TPM_AUTHFAIL; + goto abort_egress; + } + + //Read sealed blob off disk from NVMLocation + fh = open(myDMI->NVMLocation, O_RDONLY); + stat_ret = fstat(fh, &file_stat); + if (stat_ret == 0) + fh_size = file_stat.st_size; + else { + status = TPM_IOERROR; + goto abort_egress; + } + + sealed_NVM = (BYTE *) malloc(fh_size); + if (read(fh, sealed_NVM, fh_size) != fh_size) { + status = TPM_IOERROR; + goto abort_egress; + } + close(fh); + + vtpmloginfo(VTPM_LOG_VTPM_DEEP, "Load_NVMing[%ld]: 0x", fh_size); + for (i=0; i< fh_size; i++) + vtpmloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", sealed_NVM[i]); + vtpmloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); + + sealed_NVM_size = BSG_UnpackList(sealed_NVM, 2, + BSG_TPM_SIZE32_DATA, 
&symkey_cipher32, + BSG_TPM_SIZE32_DATA, &state_cipher32); + + TPMTRYRETURN( buffer_init_convert (&symkey_cipher, + symkey_cipher32.size, + symkey_cipher32.data) ); + + TPMTRYRETURN( buffer_init_convert (&state_cipher, + state_cipher32.size, + state_cipher32.data) ); + + Crypto_SHA1Full(sealed_NVM, sealed_NVM_size, (BYTE *) &sealedNVMHash); + + // Verify measurement of sealed blob. + if (memcmp(&sealedNVMHash, &myDMI->NVM_measurement, sizeof(TPM_DIGEST)) ) { + vtpmlogerror(VTPM_LOG_VTPM, "VTPM LoadNVM NVM measurement check failed.\n"); + vtpmloginfo(VTPM_LOG_VTPM_DEEP, "Correct hash: "); + for (i=0; i< sizeof(TPM_DIGEST); i++) + vtpmloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", ((BYTE*)&myDMI->NVM_measurement)[i]); + vtpmloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); + + vtpmloginfo(VTPM_LOG_VTPM_DEEP, "Measured hash: "); + for (i=0; i< sizeof(TPM_DIGEST); i++) + vtpmloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", ((BYTE*)&sealedNVMHash)[i]); + vtpmloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); + + status = TPM_AUTHFAIL; + goto abort_egress; + } + + // Decrypt Symmetric Key + TPMTRYRETURN( VTSP_Unbind( myDMI->TCSContext, + vtpm_globals->storageKeyHandle, + &symkey_cipher, + (const TPM_AUTHDATA*)&vtpm_globals->storage_key_usage_auth, + &symkey_clear, + &(vtpm_globals->keyAuth) ) ); + + // create symmetric key using saved bits + Crypto_symcrypto_initkey (&symkey, &symkey_clear); + + // Decrypt State + TPMTRY(TPM_DECRYPT_ERROR, Crypto_symcrypto_decrypt (&symkey, &state_cipher, outbuf) ); + + goto egress; + + abort_egress: + vtpmlogerror(VTPM_LOG_VTPM, "Failed to load NVM\n."); + + egress: + + buffer_free ( &state_cipher); + buffer_free ( &symkey_clear); + buffer_free ( &symkey_cipher); + free( sealed_NVM ); + Crypto_symcrypto_freekey (&symkey); + + return status; +} + +TPM_RESULT VTPM_SaveService(void) { + TPM_RESULT status=TPM_SUCCESS; + int fh, dmis=-1; + + BYTE *flat_global; + int flat_global_size, bytes_written; + UINT32 storageKeySize = buffer_len(&vtpm_globals->storageKeyWrap); + struct 
pack_buf_t storage_key_pack = {storageKeySize, vtpm_globals->storageKeyWrap.bytes}; + + struct hashtable_itr *dmi_itr; + VTPM_DMI_RESOURCE *dmi_res; + + UINT32 flat_global_full_size; + + // Global Values needing to be saved + flat_global_full_size = 3*sizeof(TPM_DIGEST) + // Auths + sizeof(UINT32) + // storagekeysize + storageKeySize + // storage key + hashtable_count(vtpm_globals->dmi_map) * // num DMIS + (sizeof(UINT32) + 2*sizeof(TPM_DIGEST)); // Per DMI info + + + flat_global = (BYTE *) malloc( flat_global_full_size); + + flat_global_size = BSG_PackList(flat_global, 4, + BSG_TPM_AUTHDATA, &vtpm_globals->owner_usage_auth, + BSG_TPM_AUTHDATA, &vtpm_globals->srk_usage_auth, + BSG_TPM_SECRET, &vtpm_globals->storage_key_usage_auth, + BSG_TPM_SIZE32_DATA, &storage_key_pack); + + // Per DMI values to be saved + if (hashtable_count(vtpm_globals->dmi_map) > 0) { + + dmi_itr = hashtable_iterator(vtpm_globals->dmi_map); + do { + dmi_res = (VTPM_DMI_RESOURCE *) hashtable_iterator_value(dmi_itr); + dmis++; + + // No need to save dmi0. + if (dmi_res->dmi_id == 0) + continue; + + + flat_global_size += BSG_PackList( flat_global + flat_global_size, 3, + BSG_TYPE_UINT32, &dmi_res->dmi_id, + BSG_TPM_DIGEST, &dmi_res->NVM_measurement, + BSG_TPM_DIGEST, &dmi_res->DMI_measurement); + + } while (hashtable_iterator_advance(dmi_itr)); + } + + //FIXME: Once we have a way to protect a TPM key, we should use it to + // encrypt this blob. BUT, unless there is a way to ensure the key is + // not used by other apps, this encryption is useless. + fh = open(STATE_FILE, O_WRONLY | O_CREAT, S_IREAD | S_IWRITE); + if (fh == -1) { + vtpmlogerror(VTPM_LOG_VTPM, "Unable to open %s file for write.\n", STATE_FILE); + status = TPM_IOERROR; + goto abort_egress; + } + + if ( (bytes_written = write(fh, flat_global, flat_global_size)) != flat_global_size ) { + vtpmlogerror(VTPM_LOG_VTPM, "Failed to save service data. 
%d/%d bytes written.\n", bytes_written, flat_global_size); + status = TPM_IOERROR; + goto abort_egress; + } + vtpm_globals->DMI_table_dirty = FALSE; + + goto egress; + + abort_egress: + egress: + + free(flat_global); + close(fh); + + vtpmloginfo(VTPM_LOG_VTPM, "Saved VTPM Service state (status = %d, dmis = %d)\n", (int) status, dmis); + return status; +} + +TPM_RESULT VTPM_LoadService(void) { + + TPM_RESULT status=TPM_SUCCESS; + int fh, stat_ret, dmis=0; + long fh_size = 0, step_size; + BYTE *flat_global=NULL; + struct pack_buf_t storage_key_pack; + UINT32 *dmi_id_key; + + VTPM_DMI_RESOURCE *dmi_res; + struct stat file_stat; + + fh = open(STATE_FILE, O_RDONLY ); + stat_ret = fstat(fh, &file_stat); + if (stat_ret == 0) + fh_size = file_stat.st_size; + else { + status = TPM_IOERROR; + goto abort_egress; + } + + flat_global = (BYTE *) malloc(fh_size); + + if ((long) read(fh, flat_global, fh_size) != fh_size ) { + status = TPM_IOERROR; + goto abort_egress; + } + + // Global Values needing to be saved + step_size = BSG_UnpackList( flat_global, 4, + BSG_TPM_AUTHDATA, &vtpm_globals->owner_usage_auth, + BSG_TPM_AUTHDATA, &vtpm_globals->srk_usage_auth, + BSG_TPM_SECRET, &vtpm_globals->storage_key_usage_auth, + BSG_TPM_SIZE32_DATA, &storage_key_pack); + + TPMTRYRETURN(buffer_init(&vtpm_globals->storageKeyWrap, 0, 0) ); + TPMTRYRETURN(buffer_append_raw(&vtpm_globals->storageKeyWrap, storage_key_pack.size, storage_key_pack.data) ); + + // Per DMI values to be saved + while ( step_size < fh_size ){ + if (fh_size - step_size < (long) (sizeof(UINT32) + 2*sizeof(TPM_DIGEST))) { + vtpmlogerror(VTPM_LOG_VTPM, "Encountered %ld extra bytes at end of manager state.\n", fh_size-step_size); + step_size = fh_size; + } else { + dmi_res = (VTPM_DMI_RESOURCE *) malloc(sizeof(VTPM_DMI_RESOURCE)); + dmis++; + + dmi_res->connected = FALSE; + + step_size += BSG_UnpackList(flat_global + step_size, 3, + BSG_TYPE_UINT32, &dmi_res->dmi_id, + BSG_TPM_DIGEST, &dmi_res->NVM_measurement, + 
BSG_TPM_DIGEST, &dmi_res->DMI_measurement); + + // install into map + dmi_id_key = (UINT32 *) malloc (sizeof(UINT32)); + *dmi_id_key = dmi_res->dmi_id; + if (!hashtable_insert(vtpm_globals->dmi_map, dmi_id_key, dmi_res)) { + status = TPM_FAIL; + goto abort_egress; + } + + } + + } + + vtpmloginfo(VTPM_LOG_VTPM, "Loaded saved state (dmis = %d).\n", dmis); + goto egress; + + abort_egress: + vtpmlogerror(VTPM_LOG_VTPM, "Failed to load service data with error = %s\n", tpm_get_error_name(status)); + egress: + + if (flat_global) + free(flat_global); + close(fh); + + return status; +} diff -r a6b72464a042 -r f8e7af29daa1 tools/vtpm_manager/manager/vtpm_manager.c --- a/tools/vtpm_manager/manager/vtpm_manager.c Tue Sep 20 09:43:29 2005 +++ b/tools/vtpm_manager/manager/vtpm_manager.c Tue Sep 20 09:43:46 2005 @@ -1,735 +1,811 @@ -// =================================================================== -// -// Copyright (c) 2005, Intel Corp. -// All rights reserved. -// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions -// are met: -// -// * Redistributions of source code must retain the above copyright -// notice, this list of conditions and the following disclaimer. -// * Redistributions in binary form must reproduce the above -// copyright notice, this list of conditions and the following -// disclaimer in the documentation and/or other materials provided -// with the distribution. -// * Neither the name of Intel Corporation nor the names of its -// contributors may be used to endorse or promote products derived -// from this software without specific prior written permission. -// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS -// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE -// COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, -// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES -// (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR -// SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) -// HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, -// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED -// OF THE POSSIBILITY OF SUCH DAMAGE. -// =================================================================== -// -// vtpm_manager.c -// -// This file will house the main logic of the VTPM Manager -// -// ================================================================== - -#include <stdio.h> -#include <unistd.h> -#include <sys/types.h> -#include <sys/stat.h> -#include <fcntl.h> -#include <string.h> - -#ifndef VTPM_MULTI_VM -#include <pthread.h> -#include <errno.h> -#include <aio.h> -#include <time.h> -#endif - -#include "vtpm_manager.h" -#include "vtpmpriv.h" -#include "vtsp.h" -#include "bsg.h" -#include "hashtable.h" -#include "hashtable_itr.h" - -#include "log.h" -#include "buffer.h" - -VTPM_GLOBALS *vtpm_globals=NULL; - -#ifdef VTPM_MULTI_VM - #define vtpmhandlerloginfo(module,fmt,args...) vtpmloginfo (module, fmt, ##args ); - #define vtpmhandlerloginfomore(module,fmt,args...) vtpmloginfomore (module, fmt, ##args ); - #define vtpmhandlerlogerror(module,fmt,args...) vtpmlogerror (module, fmt, ##args ); -#else - #define vtpmhandlerloginfo(module,fmt,args...) vtpmloginfo (module, "[%d]: " fmt, threadType, ##args ); - #define vtpmhandlerloginfomore(module,fmt,args...) vtpmloginfomore (module, fmt, ##args ); - #define vtpmhandlerlogerror(module,fmt,args...) 
vtpmlogerror (module, "[%d]: " fmt, threadType, ##args ); -#endif - -// --------------------------- Static Auths -------------------------- -#ifdef USE_FIXED_SRK_AUTH - -static BYTE FIXED_SRK_AUTH[20] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, - 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff}; - -static BYTE FIXED_EK_AUTH[20] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, - 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff}; - -#endif - -// -------------------------- Hash table functions -------------------- - -static unsigned int hashfunc32(void *ky) { - return (* (UINT32 *) ky); -} - -static int equals32(void *k1, void *k2) { - return (*(UINT32 *) k1 == *(UINT32 *) k2); -} - -// --------------------------- Functions ------------------------------ - -TPM_RESULT VTPM_Create_Service(){ - - TPM_RESULT status = TPM_SUCCESS; - - // Generate Auth's for SRK & Owner -#ifdef USE_FIXED_SRK_AUTH - memcpy(vtpm_globals->owner_usage_auth, FIXED_SRK_AUTH, sizeof(TPM_AUTHDATA)); - memcpy(vtpm_globals->srk_usage_auth, FIXED_EK_AUTH, sizeof(TPM_AUTHDATA)); -#else - Crypto_GetRandom(vtpm_globals->owner_usage_auth, sizeof(TPM_AUTHDATA) ); - Crypto_GetRandom(vtpm_globals->srk_usage_auth, sizeof(TPM_AUTHDATA) ); -#endif - - // Take Owership of TPM - CRYPTO_INFO ek_cryptoInfo; - - vtpmloginfo(VTPM_LOG_VTPM, "Attempting Pubek Read. NOTE: Failure is ok.\n"); - status = VTSP_ReadPubek(vtpm_globals->manager_tcs_handle, &ek_cryptoInfo); - - // If we can read PubEK then there is no owner and we should take it. 
- if (status == TPM_SUCCESS) { - TPMTRYRETURN(VTSP_TakeOwnership(vtpm_globals->manager_tcs_handle, - (const TPM_AUTHDATA*)&vtpm_globals->owner_usage_auth, - (const TPM_AUTHDATA*)&vtpm_globals->srk_usage_auth, - &ek_cryptoInfo, - &vtpm_globals->keyAuth)); - - TPMTRYRETURN(VTSP_DisablePubekRead(vtpm_globals->manager_tcs_handle, - (const TPM_AUTHDATA*)&vtpm_globals->owner_usage_auth, - &vtpm_globals->keyAuth)); - } - - // Generate storage key's auth - Crypto_GetRandom( &vtpm_globals->storage_key_usage_auth, - sizeof(TPM_AUTHDATA) ); - - TCS_AUTH osap; - TPM_AUTHDATA sharedsecret; - - TPMTRYRETURN( VTSP_OSAP(vtpm_globals->manager_tcs_handle, - TPM_ET_SRK, - 0, - (const TPM_AUTHDATA*)&vtpm_globals->srk_usage_auth, - &sharedsecret, - &osap) ); - - TPMTRYRETURN( VTSP_CreateWrapKey( vtpm_globals->manager_tcs_handle, - TPM_KEY_BIND, - (const TPM_AUTHDATA*)&vtpm_globals->storage_key_usage_auth, - TPM_SRK_KEYHANDLE, - (const TPM_AUTHDATA*)&sharedsecret, - &vtpm_globals->storageKeyWrap, - &osap) ); - - vtpm_globals->keyAuth.fContinueAuthSession = TRUE; - - goto egress; - - abort_egress: - exit(1); - - egress: - vtpmloginfo(VTPM_LOG_VTPM, "New VTPM Service initialized (Status = %d).\n", status); - return status; - -} - - -////////////////////////////////////////////////////////////////////////////// -#ifdef VTPM_MULTI_VM -int VTPM_Service_Handler(){ -#else -void *VTPM_Service_Handler(void *threadTypePtr){ -#endif - TPM_RESULT status = TPM_FAIL; // Should never return - UINT32 dmi, in_param_size, cmd_size, out_param_size, out_message_size, out_message_size_full, dmi_cmd_size; - BYTE *cmd_header, *in_param, *out_message, *dmi_cmd; - buffer_t *command_buf=NULL, *result_buf=NULL; - TPM_TAG tag; - TPM_COMMAND_CODE ord; - VTPM_DMI_RESOURCE *dmi_res; - int size_read, size_write, i; - -#ifndef VTPM_MULTI_VM - int threadType = *(int *) threadTypePtr; - - // async io structures - struct aiocb dmi_aio; - struct aiocb *dmi_aio_a[1]; - dmi_aio_a[0] = &dmi_aio; -#endif - -#ifdef 
DUMMY_BACKEND - int dummy_rx; -#endif - - // TODO: Reinsert ifdefs to enable support for MULTI-VM - - cmd_header = (BYTE *) malloc(VTPM_COMMAND_HEADER_SIZE_SRV); - command_buf = (buffer_t *) malloc(sizeof(buffer_t)); - result_buf = (buffer_t *) malloc(sizeof(buffer_t)); - -#ifndef VTPM_MULTI_VM - TPM_RESULT *ret_value = (TPM_RESULT *) malloc(sizeof(TPM_RESULT)); -#endif - - int *tx_fh, *rx_fh; - -#ifdef VTPM_MULTI_VM - rx_fh = &vtpm_globals->be_fh; -#else - if (threadType == BE_LISTENER_THREAD) { -#ifdef DUMMY_BACKEND - dummy_rx = -1; - rx_fh = &dummy_rx; -#else - rx_fh = &vtpm_globals->be_fh; -#endif - } else { // DMI_LISTENER_THREAD - rx_fh = &vtpm_globals->vtpm_rx_fh; - } -#endif - -#ifndef VTPM_MULTI_VM - int fh; - if (threadType == BE_LISTENER_THREAD) { - tx_fh = &vtpm_globals->be_fh; - if ( (fh = open(GUEST_RX_FIFO, O_RDWR)) == -1) { - if ( mkfifo(GUEST_RX_FIFO, S_IWUSR | S_IRUSR ) ){ - *ret_value = TPM_FAIL; - pthread_exit(ret_value); - } - } else - close(fh); - - } else { // else DMI_LISTENER_THREAD - // tx_fh will be set once the DMI is identified - // But we need to make sure the read pip is created. 
- if ( (fh = open(VTPM_RX_FIFO, O_RDWR)) == -1) { - if ( mkfifo(VTPM_RX_FIFO, S_IWUSR | S_IRUSR ) ){ - *ret_value = TPM_FAIL; - pthread_exit(ret_value); - } - } else - close(fh); - - } -#endif - - while(1) { - - if (threadType == BE_LISTENER_THREAD) { - vtpmhandlerloginfo(VTPM_LOG_VTPM, "Waiting for Guest requests & ctrl messages.\n"); - } else - vtpmhandlerloginfo(VTPM_LOG_VTPM, "Waiting for DMI messages.\n"); - - - if (*rx_fh < 0) { - if (threadType == BE_LISTENER_THREAD) -#ifdef DUMMY_BACKEND - *rx_fh = open("/tmp/in.fifo", O_RDWR); -#else - *rx_fh = open(VTPM_BE_DEV, O_RDWR); -#endif - else // DMI Listener - *rx_fh = open(VTPM_RX_FIFO, O_RDWR); - - } - - if (*rx_fh < 0) { - vtpmhandlerlogerror(VTPM_LOG_VTPM, "Can't open inbound fh.\n"); -#ifdef VTPM_MULTI_VM - return TPM_IOERROR; -#else - *ret_value = TPM_IOERROR; - pthread_exit(ret_value); -#endif - } - - size_read = read(*rx_fh, cmd_header, VTPM_COMMAND_HEADER_SIZE_SRV); - if (size_read > 0) { - vtpmhandlerloginfo(VTPM_LOG_VTPM_DEEP, "RECV[%d}: 0x", size_read); - for (i=0; i<size_read; i++) - vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", cmd_header[i]); - } else { - vtpmhandlerlogerror(VTPM_LOG_VTPM, "Can't read from BE. Aborting... \n"); - close(*rx_fh); - *rx_fh = -1; - goto abort_command; - } - - if (size_read < (int) VTPM_COMMAND_HEADER_SIZE_SRV) { - vtpmhandlerloginfo(VTPM_LOG_VTPM_DEEP, "\n"); - vtpmhandlerlogerror(VTPM_LOG_VTPM, "Command shorter than normal header (%d bytes). 
Aborting...\n", size_read); - goto abort_command; - } - - BSG_UnpackList(cmd_header, 4, - BSG_TYPE_UINT32, &dmi, - BSG_TPM_TAG, &tag, - BSG_TYPE_UINT32, &in_param_size, - BSG_TPM_COMMAND_CODE, &ord ); - - // Note that in_param_size is in the client's context - cmd_size = in_param_size - VTPM_COMMAND_HEADER_SIZE_CLT; - if (cmd_size > 0) { - in_param = (BYTE *) malloc(cmd_size); - size_read = read( *rx_fh, in_param, cmd_size); - if (size_read > 0) { - for (i=0; i<size_read; i++) - vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", in_param[i]); - - } else { - vtpmhandlerlogerror(VTPM_LOG_VTPM, "Error reading from BE. Aborting... \n"); - close(*rx_fh); - *rx_fh = -1; - goto abort_command; - } - vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); - - if (size_read < (int) cmd_size) { - vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); - vtpmhandlerlogerror(VTPM_LOG_VTPM, "Command read(%d) is shorter than header indicates(%d). Aborting...\n", size_read, cmd_size); - goto abort_command; - } - } else { - in_param = NULL; - vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); - } - - if ((threadType != BE_LISTENER_THREAD) && (dmi == 0)) { - vtpmhandlerlogerror(VTPM_LOG_VTPM, "Attempt to access dom0 commands from DMI interface. Aborting...\n"); - goto abort_command; - } - - dmi_res = (VTPM_DMI_RESOURCE *) hashtable_search(vtpm_globals->dmi_map, &dmi); - if (dmi_res == NULL) { - vtpmhandlerlogerror(VTPM_LOG_VTPM, "Attempted access to non-existent DMI in domain: %d. Aborting...\n", dmi); - goto abort_command; - } - if (!dmi_res->connected) { - vtpmhandlerlogerror(VTPM_LOG_VTPM, "Attempted access to disconnected DMI in domain: %d. Aborting...\n", dmi); - goto abort_command; - } - - if (threadType != BE_LISTENER_THREAD) - tx_fh = &dmi_res->vtpm_tx_fh; - // else we set this before the while loop since it doesn't change. 
- - if ( (buffer_init_convert(command_buf, cmd_size, in_param) != TPM_SUCCESS) || - (buffer_init(result_buf, 0, 0) != TPM_SUCCESS) ) { - vtpmhandlerlogerror(VTPM_LOG_VTPM, "Failed to setup buffers. Aborting...\n"); - goto abort_command; - } - - // Dispatch it as either control or user request. - if (tag == VTPM_TAG_REQ) { - if (dmi_res->dmi_id == VTPM_CTL_DM){ - switch (ord) { - case VTPM_ORD_OPEN: - status = VTPM_Handle_New_DMI(command_buf); - break; - - case VTPM_ORD_CLOSE: - status = VTPM_Handle_Close_DMI(command_buf); - break; - - case VTPM_ORD_DELETE: - status = VTPM_Handle_Delete_DMI(command_buf); - break; - default: - status = TPM_BAD_ORDINAL; - } // switch - } else { - - switch (ord) { - case VTPM_ORD_SAVENVM: - status= VTPM_Handle_Save_NVM(dmi_res, - command_buf, - result_buf); - break; - case VTPM_ORD_LOADNVM: - status= VTPM_Handle_Load_NVM(dmi_res, - command_buf, - result_buf); - break; - - case VTPM_ORD_TPMCOMMAND: - status= VTPM_Handle_TPM_Command(dmi_res, - command_buf, - result_buf); - break; - - default: - status = TPM_BAD_ORDINAL; - } // switch - } - } else { // This is not a VTPM Command at all - - if (threadType == BE_LISTENER_THREAD) { - if (dmi == 0) { - // This usually indicates a FE/BE driver. 
- vtpmhandlerlogerror(VTPM_LOG_VTPM, "Illegal use of TPM command from dom0\n"); - status = TPM_FAIL; - } else { - vtpmhandlerloginfo(VTPM_LOG_VTPM, "Forwarding command to DMI.\n"); - - if (dmi_res->guest_tx_fh < 0) - dmi_res->guest_tx_fh = open(dmi_res->guest_tx_fname, O_WRONLY | O_NONBLOCK); - - if (dmi_res->guest_tx_fh < 0){ - vtpmhandlerlogerror(VTPM_LOG_VTPM, "VTPM ERROR: Can't open outbound fh to dmi.\n"); - status = TPM_IOERROR; - goto abort_with_error; - } - - //Note: Send message + dmi_id - if (cmd_size) { - dmi_cmd = (BYTE *) malloc(VTPM_COMMAND_HEADER_SIZE_SRV + cmd_size); - dmi_cmd_size = VTPM_COMMAND_HEADER_SIZE_SRV + cmd_size; - memcpy(dmi_cmd, cmd_header, VTPM_COMMAND_HEADER_SIZE_SRV); - memcpy(dmi_cmd + VTPM_COMMAND_HEADER_SIZE_SRV, in_param, cmd_size); - size_write = write(dmi_res->guest_tx_fh, dmi_cmd, dmi_cmd_size); - - if (size_write > 0) { - vtpmhandlerloginfo(VTPM_LOG_VTPM_DEEP, "SENT (DMI): 0x"); - for (i=0; i<VTPM_COMMAND_HEADER_SIZE_SRV + cmd_size; i++) { - vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", dmi_cmd[i]); - } - vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); - } else { - vtpmhandlerlogerror(VTPM_LOG_VTPM, "Error writing to DMI. Aborting... \n"); - close(dmi_res->guest_tx_fh); - dmi_res->guest_tx_fh = -1; - status = TPM_IOERROR; - goto abort_with_error; - } - free(dmi_cmd); - } else { - dmi_cmd_size = VTPM_COMMAND_HEADER_SIZE_SRV; - size_write = write(dmi_res->guest_tx_fh, cmd_header, VTPM_COMMAND_HEADER_SIZE_SRV ); - if (size_write > 0) { - for (i=0; i<VTPM_COMMAND_HEADER_SIZE_SRV; i++) - vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", cmd_header[i]); - - vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); - } else { - vtpmhandlerlogerror(VTPM_LOG_VTPM, "Error writing to DMI. Aborting... 
\n"); - close(dmi_res->guest_tx_fh); - dmi_res->guest_tx_fh = -1; - status = TPM_IOERROR; - goto abort_with_error; - } - } - - if (size_write != (int) dmi_cmd_size) - vtpmhandlerlogerror(VTPM_LOG_VTPM, "Could not write entire command to DMI (%d/%d)\n", size_write, dmi_cmd_size); - buffer_free(command_buf); - - if (vtpm_globals->guest_rx_fh < 0) - vtpm_globals->guest_rx_fh = open(GUEST_RX_FIFO, O_RDONLY); - - if (vtpm_globals->guest_rx_fh < 0){ - vtpmhandlerlogerror(VTPM_LOG_VTPM, "Can't open inbound fh to dmi.\n"); - status = TPM_IOERROR; - goto abort_with_error; - } - - size_read = read( vtpm_globals->guest_rx_fh, cmd_header, VTPM_COMMAND_HEADER_SIZE_SRV); - if (size_read > 0) { - vtpmhandlerloginfo(VTPM_LOG_VTPM_DEEP, "RECV (DMI): 0x"); - for (i=0; i<size_read; i++) - vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", cmd_header[i]); - - } else { - vtpmhandlerlogerror(VTPM_LOG_VTPM, "Error reading from DMI. Aborting... \n"); - close(vtpm_globals->guest_rx_fh); - vtpm_globals->guest_rx_fh = -1; - status = TPM_IOERROR; - goto abort_with_error; - } - - if (size_read < (int) VTPM_COMMAND_HEADER_SIZE_SRV) { - //vtpmdeepsublog("\n"); - vtpmhandlerlogerror(VTPM_LOG_VTPM, "Command from DMI shorter than normal header. Aborting...\n"); - status = TPM_IOERROR; - goto abort_with_error; - } - - BSG_UnpackList(cmd_header, 4, - BSG_TYPE_UINT32, &dmi, - BSG_TPM_TAG, &tag, - BSG_TYPE_UINT32, &in_param_size, - BSG_TPM_COMMAND_CODE, &status ); - - // Note that in_param_size is in the client's context - cmd_size = in_param_size - VTPM_COMMAND_HEADER_SIZE_CLT; - if (cmd_size > 0) { - in_param = (BYTE *) malloc(cmd_size); - size_read = read( vtpm_globals->guest_rx_fh, in_param, cmd_size); - if (size_read > 0) { - for (i=0; i<size_read; i++) - vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", in_param[i]); - - } else { - vtpmhandlerlogerror(VTPM_LOG_VTPM, "Error reading from BE. Aborting... 
\n"); - close(vtpm_globals->guest_rx_fh); - vtpm_globals->guest_rx_fh = -1; - status = TPM_IOERROR; - goto abort_with_error; - } - vtpmhandlerloginfomore(VTPM_LOG_VTPM, "\n"); - - if (size_read < (int)cmd_size) { - vtpmhandlerloginfomore(VTPM_LOG_VTPM, "\n"); - vtpmhandlerlogerror(VTPM_LOG_VTPM, "Command read(%d) from DMI is shorter than header indicates(%d). Aborting...\n", size_read, cmd_size); - status = TPM_IOERROR; - goto abort_with_error; - } - } else { - in_param = NULL; - vtpmhandlerloginfomore(VTPM_LOG_VTPM, "\n"); - } - - if (buffer_init_convert(result_buf, cmd_size, in_param) != TPM_SUCCESS) { - vtpmhandlerlogerror(VTPM_LOG_VTPM, "Failed to setup buffers. Aborting...\n"); - status = TPM_FAIL; - goto abort_with_error; - } - - vtpmhandlerloginfo(VTPM_LOG_VTPM, "Sending DMI's response to guest.\n"); - } // end else for if (dmi==0) - - } else { // This is a DMI lister thread. Thus this is from a DMI -#ifdef VTPM_MULTI_VM - vtpmhandlerlogerror(VTPM_LOG_VTPM, "Attempt to use unsupported direct access to TPM.\n"); - vtpmhandlerloginfo(VTPM_LOG_VTPM_DEEP, "Bad Command. 
dmi:%d, tag:%d, size:%d, ord:%d, Params: ", dmi, tag, in_param_size, ord); - for (UINT32 q=0; q<cmd_size; q++) - vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", in_param[q]); - - vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); - - status = TPM_FAIL; -#else - -#endif - } // end else for if BE Listener - } // end else for is VTPM Command - - // Send response to Backend - if (*tx_fh < 0) { - if (threadType == BE_LISTENER_THREAD) -#ifdef DUMMY_BACKEND - *tx_fh = open("/tmp/out.fifo", O_RDWR); -#else - *tx_fh = open(VTPM_BE_DEV, O_RDWR); -#endif - else // DMI Listener - *tx_fh = open(dmi_res->vtpm_tx_fname, O_WRONLY); - } - - if (*tx_fh < 0) { - vtpmhandlerlogerror(VTPM_LOG_VTPM, "VTPM ERROR: Can't open outbound fh.\n"); -#ifdef VTPM_MULTI_VM - return TPM_IOERROR; -#else - *ret_value = TPM_IOERROR; - pthread_exit(ret_value); -#endif - } - - abort_with_error: - // Prepend VTPM header with destination DM stamped - out_param_size = buffer_len(result_buf); - out_message_size = VTPM_COMMAND_HEADER_SIZE_CLT + out_param_size; - out_message_size_full = VTPM_COMMAND_HEADER_SIZE_SRV + out_param_size; - out_message = (BYTE *) malloc (out_message_size_full); - - BSG_PackList(out_message, 4, - BSG_TYPE_UINT32, (BYTE *) &dmi, - BSG_TPM_TAG, (BYTE *) &tag, - BSG_TYPE_UINT32, (BYTE *) &out_message_size, - BSG_TPM_RESULT, (BYTE *) &status); - - if (buffer_len(result_buf) > 0) - memcpy(out_message + VTPM_COMMAND_HEADER_SIZE_SRV, result_buf->bytes, out_param_size); - - - //Note: Send message + dmi_id - size_write = write(*tx_fh, out_message, out_message_size_full ); - if (size_write > 0) { - vtpmhandlerloginfo(VTPM_LOG_VTPM_DEEP, "SENT: 0x"); - for (i=0; i < out_message_size_full; i++) - vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", out_message[i]); - - vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); - } else { - vtpmhandlerlogerror(VTPM_LOG_VTPM, "Error writing to BE. Aborting... 
\n"); - close(*tx_fh); - *tx_fh = -1; - goto abort_command; - } - free(out_message); - - if (size_write < (int)out_message_size_full) { - vtpmhandlerlogerror(VTPM_LOG_VTPM, "Unable to write full command to BE (%d/%d)\n", size_write, out_message_size_full); - goto abort_command; - } - - abort_command: - //free buffers - bzero(cmd_header, VTPM_COMMAND_HEADER_SIZE_SRV); - //free(in_param); // This was converted to command_buf. No need to free - if (command_buf != result_buf) - buffer_free(result_buf); - - buffer_free(command_buf); - -#ifndef VTPM_MULTI_VM - if (threadType != BE_LISTENER_THREAD) { -#endif - if ( (vtpm_globals->DMI_table_dirty) && - (VTPM_SaveService() != TPM_SUCCESS) ) { - vtpmhandlerlogerror(VTPM_LOG_VTPM, "ERROR: Unable to save manager data.\n"); - } -#ifndef VTPM_MULTI_VM - } -#endif - - } // End while(1) - -} - - -/////////////////////////////////////////////////////////////////////////////// -TPM_RESULT VTPM_Init_Service() { - TPM_RESULT status = TPM_FAIL; - BYTE *randomsead; - UINT32 randomsize; - - if ((vtpm_globals = (VTPM_GLOBALS *) malloc(sizeof(VTPM_GLOBALS))) == NULL){ - status = TPM_FAIL; - goto abort_egress; - } - memset(vtpm_globals, 0, sizeof(VTPM_GLOBALS)); - vtpm_globals->be_fh = -1; - -#ifndef VTPM_MULTI_VM - vtpm_globals->vtpm_rx_fh = -1; - vtpm_globals->guest_rx_fh = -1; -#endif - if ((vtpm_globals->dmi_map = create_hashtable(10, hashfunc32, equals32)) == NULL){ - status = TPM_FAIL; - goto abort_egress; - } - - vtpm_globals->DMI_table_dirty = FALSE; - - // Create new TCS Object - vtpm_globals->manager_tcs_handle = 0; - - TPMTRYRETURN(TCS_create()); - - // Create TCS Context for service - TPMTRYRETURN( TCS_OpenContext(&vtpm_globals->manager_tcs_handle ) ); - - TPMTRYRETURN( TCSP_GetRandom(vtpm_globals->manager_tcs_handle, - &randomsize, - &randomsead)); - - Crypto_Init(randomsead, randomsize); - TPMTRYRETURN( TCS_FreeMemory (vtpm_globals->manager_tcs_handle, randomsead)); - - // Create OIAP session for service's authorized commands 
- TPMTRYRETURN( VTSP_OIAP( vtpm_globals->manager_tcs_handle, - &vtpm_globals->keyAuth) ); - vtpm_globals->keyAuth.fContinueAuthSession = TRUE; - - // If failed, create new Service. - if (VTPM_LoadService() != TPM_SUCCESS) - TPMTRYRETURN( VTPM_Create_Service() ); - - - //Load Storage Key - TPMTRYRETURN( VTSP_LoadKey( vtpm_globals->manager_tcs_handle, - TPM_SRK_KEYHANDLE, - &vtpm_globals->storageKeyWrap, - (const TPM_AUTHDATA*)&vtpm_globals->srk_usage_auth, - &vtpm_globals->storageKeyHandle, - &vtpm_globals->keyAuth, - &vtpm_globals->storageKey) ); - - // Create entry for Dom0 for control messages - TPMTRYRETURN( VTPM_Handle_New_DMI(NULL) ); - - // --------------------- Command handlers --------------------------- - - goto egress; - - abort_egress: - egress: - - return(status); -} - -void VTPM_Stop_Service() { - VTPM_DMI_RESOURCE *dmi_res; - struct hashtable_itr *dmi_itr; - - // Close all the TCS contexts. TCS should evict keys based on this - if (hashtable_count(vtpm_globals->dmi_map) > 0) { - dmi_itr = hashtable_iterator(vtpm_globals->dmi_map); - do { - dmi_res = (VTPM_DMI_RESOURCE *) hashtable_iterator_value(dmi_itr); - if (dmi_res->connected) - if (close_dmi( dmi_res ) != TPM_SUCCESS) - vtpmlogerror(VTPM_LOG_VTPM, "Failed to close dmi %d properly.\n", dmi_res->dmi_id); - - } while (hashtable_iterator_advance(dmi_itr)); - free (dmi_itr); - } - - - TCS_CloseContext(vtpm_globals->manager_tcs_handle); - - if ( (vtpm_globals->DMI_table_dirty) && - (VTPM_SaveService() != TPM_SUCCESS) ) - vtpmlogerror(VTPM_LOG_VTPM, "Unable to save manager data.\n"); - - hashtable_destroy(vtpm_globals->dmi_map, 1); - free(vtpm_globals); - - close(vtpm_globals->be_fh); - Crypto_Exit(); - - vtpmloginfo(VTPM_LOG_VTPM, "VTPM Manager stopped.\n"); -} +// =================================================================== +// +// Copyright (c) 2005, Intel Corp. +// All rights reserved. 
+// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions +// are met: +// +// * Redistributions of source code must retain the above copyright +// notice, this list of conditions and the following disclaimer. +// * Redistributions in binary form must reproduce the above +// copyright notice, this list of conditions and the following +// disclaimer in the documentation and/or other materials provided +// with the distribution. +// * Neither the name of Intel Corporation nor the names of its +// contributors may be used to endorse or promote products derived +// from this software without specific prior written permission. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS +// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE +// COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, +// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +// (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +// SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +// HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, +// STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED +// OF THE POSSIBILITY OF SUCH DAMAGE. 
+// =================================================================== +// +// vtpm_manager.c +// +// This file will house the main logic of the VTPM Manager +// +// ================================================================== + +#include <stdio.h> +#include <unistd.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <fcntl.h> +#include <string.h> + +#ifndef VTPM_MULTI_VM +#include <pthread.h> +#include <errno.h> +#include <aio.h> +#include <time.h> +#endif + +#include "vtpm_manager.h" +#include "vtpmpriv.h" +#include "vtsp.h" +#include "bsg.h" +#include "hashtable.h" +#include "hashtable_itr.h" + +#include "log.h" +#include "buffer.h" + +VTPM_GLOBALS *vtpm_globals=NULL; + +#ifdef VTPM_MULTI_VM + #define vtpmhandlerloginfo(module,fmt,args...) vtpmloginfo (module, fmt, ##args ); + #define vtpmhandlerloginfomore(module,fmt,args...) vtpmloginfomore (module, fmt, ##args ); + #define vtpmhandlerlogerror(module,fmt,args...) vtpmlogerror (module, fmt, ##args ); +#else + #define vtpmhandlerloginfo(module,fmt,args...) vtpmloginfo (module, "[%d]: " fmt, threadType, ##args ); + #define vtpmhandlerloginfomore(module,fmt,args...) vtpmloginfomore (module, fmt, ##args ); + #define vtpmhandlerlogerror(module,fmt,args...) 
vtpmlogerror (module, "[%d]: " fmt, threadType, ##args ); +#endif + +// --------------------------- Well Known Auths -------------------------- +#ifdef WELL_KNOWN_SRK_AUTH +static BYTE FIXED_SRK_AUTH[20] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff}; +#endif + +#ifdef WELL_KNOWN_OWNER_AUTH +static BYTE FIXED_OWNER_AUTH[20] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff}; +#endif + +// -------------------------- Hash table functions -------------------- + +static unsigned int hashfunc32(void *ky) { + return (* (UINT32 *) ky); +} + +static int equals32(void *k1, void *k2) { + return (*(UINT32 *) k1 == *(UINT32 *) k2); +} + +// --------------------------- Functions ------------------------------ + +TPM_RESULT VTPM_Create_Service(){ + + TPM_RESULT status = TPM_SUCCESS; + + // Generate Auth's for SRK & Owner +#ifdef WELL_KNOWN_SRK_AUTH + memcpy(vtpm_globals->srk_usage_auth, FIXED_SRK_AUTH, sizeof(TPM_AUTHDATA)); +#else + Crypto_GetRandom(vtpm_globals->srk_usage_auth, sizeof(TPM_AUTHDATA) ); +#endif + +#ifdef WELL_KNOWN_OWNER_AUTH + memcpy(vtpm_globals->owner_usage_auth, FIXED_OWNER_AUTH, sizeof(TPM_AUTHDATA)); +#else + Crypto_GetRandom(vtpm_globals->owner_usage_auth, sizeof(TPM_AUTHDATA) ); +#endif + + // Take Owership of TPM + CRYPTO_INFO ek_cryptoInfo; + + vtpmloginfo(VTPM_LOG_VTPM, "Attempting Pubek Read. NOTE: Failure is ok.\n"); + status = VTSP_ReadPubek(vtpm_globals->manager_tcs_handle, &ek_cryptoInfo); + + // If we can read PubEK then there is no owner and we should take it. 
+ if (status == TPM_SUCCESS) { + TPMTRYRETURN(VTSP_TakeOwnership(vtpm_globals->manager_tcs_handle, + (const TPM_AUTHDATA*)&vtpm_globals->owner_usage_auth, + (const TPM_AUTHDATA*)&vtpm_globals->srk_usage_auth, + &ek_cryptoInfo, + &vtpm_globals->keyAuth)); + + TPMTRYRETURN(VTSP_DisablePubekRead(vtpm_globals->manager_tcs_handle, + (const TPM_AUTHDATA*)&vtpm_globals->owner_usage_auth, + &vtpm_globals->keyAuth)); + } + + // Generate storage key's auth + Crypto_GetRandom( &vtpm_globals->storage_key_usage_auth, + sizeof(TPM_AUTHDATA) ); + + TCS_AUTH osap; + TPM_AUTHDATA sharedsecret; + + TPMTRYRETURN( VTSP_OSAP(vtpm_globals->manager_tcs_handle, + TPM_ET_SRK, + 0, + (const TPM_AUTHDATA*)&vtpm_globals->srk_usage_auth, + &sharedsecret, + &osap) ); + + TPMTRYRETURN( VTSP_CreateWrapKey( vtpm_globals->manager_tcs_handle, + TPM_KEY_BIND, + (const TPM_AUTHDATA*)&vtpm_globals->storage_key_usage_auth, + TPM_SRK_KEYHANDLE, + (const TPM_AUTHDATA*)&sharedsecret, + &vtpm_globals->storageKeyWrap, + &osap) ); + + vtpm_globals->keyAuth.fContinueAuthSession = TRUE; + + goto egress; + + abort_egress: + exit(1); + + egress: + vtpmloginfo(VTPM_LOG_VTPM, "Finished initialized new VTPM service (Status = %d).\n", status); + return status; + +} + + +////////////////////////////////////////////////////////////////////////////// +#ifdef VTPM_MULTI_VM +int VTPM_Service_Handler(){ +#else +void *VTPM_Service_Handler(void *threadTypePtr){ +#endif + TPM_RESULT status = TPM_FAIL; // Should never return + UINT32 dmi, in_param_size, cmd_size, out_param_size, out_message_size, out_message_size_full; + BYTE *cmd_header, *in_param, *out_message; + buffer_t *command_buf=NULL, *result_buf=NULL; + TPM_TAG tag; + TPM_COMMAND_CODE ord; + VTPM_DMI_RESOURCE *dmi_res; + int size_read, size_write, i; + +#ifndef VTPM_MULTI_VM + UINT32 dmi_cmd_size; + BYTE *dmi_cmd; + int threadType = *(int *) threadTypePtr; + + // async io structures + struct aiocb dmi_aio; + struct aiocb *dmi_aio_a[1]; + dmi_aio_a[0] = &dmi_aio; 
+#endif + +#ifdef DUMMY_BACKEND + int dummy_rx; +#endif + + cmd_header = (BYTE *) malloc(VTPM_COMMAND_HEADER_SIZE_SRV); + command_buf = (buffer_t *) malloc(sizeof(buffer_t)); + result_buf = (buffer_t *) malloc(sizeof(buffer_t)); + +#ifndef VTPM_MULTI_VM + TPM_RESULT *ret_value = (TPM_RESULT *) malloc(sizeof(TPM_RESULT)); +#endif + + int *tx_fh, // Pointer to the filehandle this function will write to + *rx_fh; // Pointer to the filehandle this function will read from + // For a multi VM VTPM system, this function tx/rx with the BE + // via vtpm_globals->be_fh. + // For a single VM system, the BE_LISTENER_THREAD tx/rx with theBE + // via vtpm_globals->be_fh, and the DMI_LISTENER_THREAD rx from + // vtpm_globals->vtpm_rx_fh and tx to dmi_res->vtpm_tx_fh + + // Set rx_fh to point to the correct fh based on this mode. +#ifdef VTPM_MULTI_VM + rx_fh = &vtpm_globals->be_fh; +#else + if (threadType == BE_LISTENER_THREAD) { + #ifdef DUMMY_BACKEND + dummy_rx = -1; + rx_fh = &dummy_rx; + #else + rx_fh = &vtpm_globals->be_fh; + #endif + } else { // DMI_LISTENER_THREAD + rx_fh = &vtpm_globals->vtpm_rx_fh; + } +#endif + + // Set tx_fh to point to the correct fh based on this mode (If static) + // Create any fifos that these fh will use. +#ifndef VTPM_MULTI_VM + int fh; + if (threadType == BE_LISTENER_THREAD) { + tx_fh = &vtpm_globals->be_fh; + if ( (fh = open(GUEST_RX_FIFO, O_RDWR)) == -1) { + if ( mkfifo(GUEST_RX_FIFO, S_IWUSR | S_IRUSR ) ){ + vtpmlogerror(VTPM_LOG_VTPM, "Unable to create FIFO: %s.\n", GUEST_RX_FIFO); + *ret_value = TPM_FAIL; + pthread_exit(ret_value); + } + } else + close(fh); + + } else { // else DMI_LISTENER_THREAD + // tx_fh will be set once the DMI is identified + // But we need to make sure the read pip is created. 
+ if ( (fh = open(VTPM_RX_FIFO, O_RDWR)) == -1) { + if ( mkfifo(VTPM_RX_FIFO, S_IWUSR | S_IRUSR ) ){ + vtpmlogerror(VTPM_LOG_VTPM, "Unable to create FIFO: %s.\n", VTPM_RX_FIFO); + *ret_value = TPM_FAIL; + pthread_exit(ret_value); + } + } else + close(fh); + + } +#else + tx_fh = &vtpm_globals->be_fh; +#endif + + ////////////////////////// Main Loop ////////////////////////////////// + while(1) { + +#ifdef VTPM_MULTI_VM + vtpmhandlerloginfo(VTPM_LOG_VTPM, "Waiting for DMI messages.\n"); +#else + if (threadType == BE_LISTENER_THREAD) { + vtpmhandlerloginfo(VTPM_LOG_VTPM, "Waiting for Guest requests & ctrl messages.\n"); + } else + vtpmhandlerloginfo(VTPM_LOG_VTPM, "Waiting for DMI messages.\n"); +#endif + + // Check status of rx_fh. If necessary attempt to re-open it. + if (*rx_fh < 0) { +#ifdef VTPM_MULTI_VM + *rx_fh = open(VTPM_BE_DEV, O_RDWR); +#else + if (threadType == BE_LISTENER_THREAD) + #ifdef DUMMY_BACKEND + *rx_fh = open("/tmp/in.fifo", O_RDWR); + #else + *rx_fh = open(VTPM_BE_DEV, O_RDWR); + #endif + else // DMI Listener + *rx_fh = open(VTPM_RX_FIFO, O_RDWR); +#endif + } + + // Respond to failures to open rx_fh + if (*rx_fh < 0) { + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Can't open inbound fh.\n"); +#ifdef VTPM_MULTI_VM + return TPM_IOERROR; +#else + *ret_value = TPM_IOERROR; + pthread_exit(ret_value); +#endif + } + + // Read command header from rx_fh + size_read = read(*rx_fh, cmd_header, VTPM_COMMAND_HEADER_SIZE_SRV); + if (size_read > 0) { + vtpmhandlerloginfo(VTPM_LOG_VTPM_DEEP, "RECV[%d}: 0x", size_read); + for (i=0; i<size_read; i++) + vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", cmd_header[i]); + } else { + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Can't read from BE. Aborting... \n"); + close(*rx_fh); + *rx_fh = -1; + goto abort_command; + } + + if (size_read < (int) VTPM_COMMAND_HEADER_SIZE_SRV) { + vtpmhandlerloginfo(VTPM_LOG_VTPM_DEEP, "\n"); + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Command shorter than normal header (%d bytes). 
Aborting...\n", size_read); + goto abort_command; + } + + // Unpack header + BSG_UnpackList(cmd_header, 4, + BSG_TYPE_UINT32, &dmi, + BSG_TPM_TAG, &tag, + BSG_TYPE_UINT32, &in_param_size, + BSG_TPM_COMMAND_CODE, &ord ); + + // Using the header info, read from rx_fh the parameters of the command + // Note that in_param_size is in the client's context + cmd_size = in_param_size - VTPM_COMMAND_HEADER_SIZE_CLT; + if (cmd_size > 0) { + in_param = (BYTE *) malloc(cmd_size); + size_read = read( *rx_fh, in_param, cmd_size); + if (size_read > 0) { + for (i=0; i<size_read; i++) + vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", in_param[i]); + + } else { + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Error reading from cmd. Aborting... \n"); + close(*rx_fh); + *rx_fh = -1; + goto abort_command; + } + vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); + + if (size_read < (int) cmd_size) { + vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Command read(%d) is shorter than header indicates(%d). Aborting...\n", size_read, cmd_size); + goto abort_command; + } + } else { + in_param = NULL; + vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); + } + +#ifndef VTPM_MULTI_VM + // It's illegal to receive a Dom0 command from a DMI. + if ((threadType != BE_LISTENER_THREAD) && (dmi == 0)) { + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Attempt to access dom0 commands from DMI interface. Aborting...\n"); + goto abort_command; + } +#endif + + // Fetch infomation about the DMI issuing the request. + dmi_res = (VTPM_DMI_RESOURCE *) hashtable_search(vtpm_globals->dmi_map, &dmi); + if (dmi_res == NULL) { + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Attempted access to non-existent DMI in domain: %d. Aborting...\n", dmi); + goto abort_command; + } + if (!dmi_res->connected) { + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Attempted access to disconnected DMI in domain: %d. 
Aborting...\n", dmi); + goto abort_command; + } + +#ifndef VTPM_MULTI_VM + // Now that we know which DMI this is, we can set the tx_fh handle. + if (threadType != BE_LISTENER_THREAD) + tx_fh = &dmi_res->vtpm_tx_fh; + // else we set this before the while loop since it doesn't change. +#endif + + // Init the buffers used to handle the command and the response + if ( (buffer_init_convert(command_buf, cmd_size, in_param) != TPM_SUCCESS) || + (buffer_init(result_buf, 0, 0) != TPM_SUCCESS) ) { + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Failed to setup buffers. Aborting...\n"); + goto abort_command; + } + + // Dispatch it as either control or user request. + if (tag == VTPM_TAG_REQ) { + if (dmi_res->dmi_id == VTPM_CTL_DM){ + switch (ord) { + case VTPM_ORD_OPEN: + status = VTPM_Handle_New_DMI(command_buf); + break; + + case VTPM_ORD_CLOSE: + status = VTPM_Handle_Close_DMI(command_buf); + break; + + case VTPM_ORD_DELETE: + status = VTPM_Handle_Delete_DMI(command_buf); + break; + default: + status = TPM_BAD_ORDINAL; + } // switch + } else { + + switch (ord) { + case VTPM_ORD_SAVENVM: + status= VTPM_Handle_Save_NVM(dmi_res, + command_buf, + result_buf); + break; + case VTPM_ORD_LOADNVM: + status= VTPM_Handle_Load_NVM(dmi_res, + command_buf, + result_buf); + break; + + case VTPM_ORD_TPMCOMMAND: + status= VTPM_Handle_TPM_Command(dmi_res, + command_buf, + result_buf); + break; + + default: + status = TPM_BAD_ORDINAL; + } // switch + } + } else { // This is not a VTPM Command at all. + // This happens in two cases. + // MULTI_VM = A DMI illegally sent a raw TPM command to the manager + // Single VM: + // BE_LISTENER_THREAD: Guest issued a TPM command. + // Send this to DMI and wait for response + // DMI_LISTENER_THREAD: A DMI illegally sent a raw TPM command. + +#ifdef VTPM_MULTI_VM + // Raw TPM commands are not supported from the DMI + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Attempt to use unsupported direct access to TPM.\n"); + vtpmhandlerloginfo(VTPM_LOG_VTPM_DEEP, "Bad Command. 
dmi:%d, tag:%d, size:%d, ord:%d, Params: ", dmi, tag, in_param_size, ord); + for (i=0; i<cmd_size; i++) + vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", in_param[i]); + + vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); + status = TPM_FAIL; + +#else + // If BE_LISTENER_THREAD then this is a TPM command from a guest + if (threadType == BE_LISTENER_THREAD) { + // Dom0 can't talk to the BE, so this must be a broken FE/BE or badness + if (dmi == 0) { + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Illegal use of TPM command from dom0\n"); + status = TPM_FAIL; + } else { + vtpmhandlerloginfo(VTPM_LOG_VTPM, "Forwarding command to DMI.\n"); + + // open the dmi_res->guest_tx_fh to send command to DMI + if (dmi_res->guest_tx_fh < 0) + dmi_res->guest_tx_fh = open(dmi_res->guest_tx_fname, O_WRONLY | O_NONBLOCK); + + // handle failed opens dmi_res->guest_tx_fh + if (dmi_res->guest_tx_fh < 0){ + vtpmhandlerlogerror(VTPM_LOG_VTPM, "VTPM ERROR: Can't open outbound fh to dmi.\n"); + status = TPM_IOERROR; + goto abort_with_error; + } + + //Forward TPM CMD stamped with dmi_id to DMI for handling + if (cmd_size) { + dmi_cmd = (BYTE *) malloc(VTPM_COMMAND_HEADER_SIZE_SRV + cmd_size); + dmi_cmd_size = VTPM_COMMAND_HEADER_SIZE_SRV + cmd_size; + memcpy(dmi_cmd, cmd_header, VTPM_COMMAND_HEADER_SIZE_SRV); + memcpy(dmi_cmd + VTPM_COMMAND_HEADER_SIZE_SRV, in_param, cmd_size); + size_write = write(dmi_res->guest_tx_fh, dmi_cmd, dmi_cmd_size); + + if (size_write > 0) { + vtpmhandlerloginfo(VTPM_LOG_VTPM_DEEP, "SENT (DMI): 0x"); + for (i=0; i<VTPM_COMMAND_HEADER_SIZE_SRV + cmd_size; i++) { + vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", dmi_cmd[i]); + } + vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); + } else { + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Error writing to DMI. Aborting... 
\n"); + close(dmi_res->guest_tx_fh); + dmi_res->guest_tx_fh = -1; + status = TPM_IOERROR; + goto abort_with_error; + } + free(dmi_cmd); + } else { + dmi_cmd_size = VTPM_COMMAND_HEADER_SIZE_SRV; + size_write = write(dmi_res->guest_tx_fh, cmd_header, VTPM_COMMAND_HEADER_SIZE_SRV ); + if (size_write > 0) { + for (i=0; i<VTPM_COMMAND_HEADER_SIZE_SRV; i++) + vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", cmd_header[i]); + + vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); + } else { + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Error writing to DMI. Aborting... \n"); + close(dmi_res->guest_tx_fh); + dmi_res->guest_tx_fh = -1; + status = TPM_IOERROR; + goto abort_with_error; + } + } + + if (size_write != (int) dmi_cmd_size) + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Could not write entire command to DMI (%d/%d)\n", size_write, dmi_cmd_size); + buffer_free(command_buf); + + // Open vtpm_globals->guest_rx_fh to receive DMI response + if (vtpm_globals->guest_rx_fh < 0) + vtpm_globals->guest_rx_fh = open(GUEST_RX_FIFO, O_RDONLY); + + // Handle open failures + if (vtpm_globals->guest_rx_fh < 0){ + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Can't open inbound fh to dmi.\n"); + status = TPM_IOERROR; + goto abort_with_error; + } + + // Read header for response to TPM command from DMI + size_read = read( vtpm_globals->guest_rx_fh, cmd_header, VTPM_COMMAND_HEADER_SIZE_SRV); + if (size_read > 0) { + vtpmhandlerloginfo(VTPM_LOG_VTPM_DEEP, "RECV (DMI): 0x"); + for (i=0; i<size_read; i++) + vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", cmd_header[i]); + + } else { + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Error reading from DMI. Aborting... \n"); + close(vtpm_globals->guest_rx_fh); + vtpm_globals->guest_rx_fh = -1; + status = TPM_IOERROR; + goto abort_with_error; + } + + if (size_read < (int) VTPM_COMMAND_HEADER_SIZE_SRV) { + //vtpmdeepsublog("\n"); + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Command from DMI shorter than normal header. 
Aborting...\n"); + status = TPM_IOERROR; + goto abort_with_error; + } + + // Unpack response from DMI for TPM command + BSG_UnpackList(cmd_header, 4, + BSG_TYPE_UINT32, &dmi, + BSG_TPM_TAG, &tag, + BSG_TYPE_UINT32, &in_param_size, + BSG_TPM_COMMAND_CODE, &status ); + + // If response has parameters, read them. + // Note that in_param_size is in the client's context + cmd_size = in_param_size - VTPM_COMMAND_HEADER_SIZE_CLT; + if (cmd_size > 0) { + in_param = (BYTE *) malloc(cmd_size); + size_read = read( vtpm_globals->guest_rx_fh, in_param, cmd_size); + if (size_read > 0) { + for (i=0; i<size_read; i++) + vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", in_param[i]); + + } else { + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Error reading from BE. Aborting... \n"); + close(vtpm_globals->guest_rx_fh); + vtpm_globals->guest_rx_fh = -1; + status = TPM_IOERROR; + goto abort_with_error; + } + vtpmhandlerloginfomore(VTPM_LOG_VTPM, "\n"); + + if (size_read < (int)cmd_size) { + vtpmhandlerloginfomore(VTPM_LOG_VTPM, "\n"); + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Command read(%d) from DMI is shorter than header indicates(%d). Aborting...\n", size_read, cmd_size); + status = TPM_IOERROR; + goto abort_with_error; + } + } else { + in_param = NULL; + vtpmhandlerloginfomore(VTPM_LOG_VTPM, "\n"); + } + + if (buffer_init_convert(result_buf, cmd_size, in_param) != TPM_SUCCESS) { + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Failed to setup buffers. Aborting...\n"); + status = TPM_FAIL; + goto abort_with_error; + } + + vtpmhandlerloginfo(VTPM_LOG_VTPM, "Sending DMI's response to guest.\n"); + } // end else for if (dmi==0) + + } else { // This is a DMI lister thread. Thus this is from a DMI + // Raw TPM commands are not supported from the DMI + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Attempt to use unsupported direct access to TPM.\n"); + vtpmhandlerloginfo(VTPM_LOG_VTPM_DEEP, "Bad Command. 
dmi:%d, tag:%d, size:%d, ord:%d, Params: ", dmi, tag, in_param_size, ord); + for (i=0; i<cmd_size; i++) + vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", in_param[i]); + + vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); + + status = TPM_FAIL; + } // end else for if BE Listener +#endif + + } // end else for is VTPM Command + + // This marks the beginning of preparing response to be sent out. + // Errors while handling responses jump here to reply with error messages + // NOTE: Currently there are no recoverable errors in multi-VM mode. If one + // is added to the code, this ifdef should be removed. + // Also note this is NOT referring to errors in commands, but rather + // this is about I/O errors and such. +#ifndef VTPM_MULTI_VM + abort_with_error: +#endif + + // Open tx_fh in preperation to send reponse back + if (*tx_fh < 0) { +#ifdef VTPM_MULTI_VM + *tx_fh = open(VTPM_BE_DEV, O_RDWR); +#else + if (threadType == BE_LISTENER_THREAD) + #ifdef DUMMY_BACKEND + *tx_fh = open("/tmp/out.fifo", O_RDWR); + #else + *tx_fh = open(VTPM_BE_DEV, O_RDWR); + #endif + else // DMI Listener + *tx_fh = open(dmi_res->vtpm_tx_fname, O_WRONLY); +#endif + } + + + // Handle failed open + if (*tx_fh < 0) { + vtpmhandlerlogerror(VTPM_LOG_VTPM, "VTPM ERROR: Can't open outbound fh.\n"); +#ifdef VTPM_MULTI_VM + return TPM_IOERROR; +#else + *ret_value = TPM_IOERROR; + pthread_exit(ret_value); +#endif + } + + // Prepend VTPM header with destination DM stamped + out_param_size = buffer_len(result_buf); + out_message_size = VTPM_COMMAND_HEADER_SIZE_CLT + out_param_size; + out_message_size_full = VTPM_COMMAND_HEADER_SIZE_SRV + out_param_size; + out_message = (BYTE *) malloc (out_message_size_full); + + BSG_PackList(out_message, 4, + BSG_TYPE_UINT32, (BYTE *) &dmi, + BSG_TPM_TAG, (BYTE *) &tag, + BSG_TYPE_UINT32, (BYTE *) &out_message_size, + BSG_TPM_RESULT, (BYTE *) &status); + + if (buffer_len(result_buf) > 0) + memcpy(out_message + VTPM_COMMAND_HEADER_SIZE_SRV, result_buf->bytes, 
out_param_size); + + + //Note: Send message + dmi_id + size_write = write(*tx_fh, out_message, out_message_size_full ); + if (size_write > 0) { + vtpmhandlerloginfo(VTPM_LOG_VTPM_DEEP, "SENT: 0x"); + for (i=0; i < out_message_size_full; i++) + vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "%x ", out_message[i]); + + vtpmhandlerloginfomore(VTPM_LOG_VTPM_DEEP, "\n"); + } else { + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Error writing to BE. Aborting... \n"); + close(*tx_fh); + *tx_fh = -1; + goto abort_command; + } + free(out_message); + + if (size_write < (int)out_message_size_full) { + vtpmhandlerlogerror(VTPM_LOG_VTPM, "Unable to write full command to BE (%d/%d)\n", size_write, out_message_size_full); + goto abort_command; + } + + // On certain failures an error message cannot be sent. + // This marks the beginning of cleanup in preperation for the next command. + abort_command: + //free buffers + bzero(cmd_header, VTPM_COMMAND_HEADER_SIZE_SRV); + //free(in_param); // This was converted to command_buf. 
No need to free + if (command_buf != result_buf) + buffer_free(result_buf); + + buffer_free(command_buf); + +#ifndef VTPM_MULTI_VM + if (threadType != BE_LISTENER_THREAD) { +#endif + if ( (vtpm_globals->DMI_table_dirty) && + (VTPM_SaveService() != TPM_SUCCESS) ) { + vtpmhandlerlogerror(VTPM_LOG_VTPM, "ERROR: Unable to save manager data.\n"); + } +#ifndef VTPM_MULTI_VM + } +#endif + + } // End while(1) + +} + + +/////////////////////////////////////////////////////////////////////////////// +TPM_RESULT VTPM_Init_Service() { + TPM_RESULT status = TPM_FAIL; + BYTE *randomsead; + UINT32 randomsize; + + if ((vtpm_globals = (VTPM_GLOBALS *) malloc(sizeof(VTPM_GLOBALS))) == NULL){ + status = TPM_FAIL; + goto abort_egress; + } + memset(vtpm_globals, 0, sizeof(VTPM_GLOBALS)); + vtpm_globals->be_fh = -1; + +#ifndef VTPM_MULTI_VM + vtpm_globals->vtpm_rx_fh = -1; + vtpm_globals->guest_rx_fh = -1; +#endif + if ((vtpm_globals->dmi_map = create_hashtable(10, hashfunc32, equals32)) == NULL){ + status = TPM_FAIL; + goto abort_egress; + } + + vtpm_globals->DMI_table_dirty = FALSE; + + // Create new TCS Object + vtpm_globals->manager_tcs_handle = 0; + + TPMTRYRETURN(TCS_create()); + + // Create TCS Context for service + TPMTRYRETURN( TCS_OpenContext(&vtpm_globals->manager_tcs_handle ) ); + + TPMTRYRETURN( TCSP_GetRandom(vtpm_globals->manager_tcs_handle, + &randomsize, + &randomsead)); + + Crypto_Init(randomsead, randomsize); + TPMTRYRETURN( TCS_FreeMemory (vtpm_globals->manager_tcs_handle, randomsead)); + + // Create OIAP session for service's authorized commands + TPMTRYRETURN( VTSP_OIAP( vtpm_globals->manager_tcs_handle, + &vtpm_globals->keyAuth) ); + vtpm_globals->keyAuth.fContinueAuthSession = TRUE; + + // If failed, create new Service. 
+ if (VTPM_LoadService() != TPM_SUCCESS) + TPMTRYRETURN( VTPM_Create_Service() ); + + //Load Storage Key + TPMTRYRETURN( VTSP_LoadKey( vtpm_globals->manager_tcs_handle, + TPM_SRK_KEYHANDLE, + &vtpm_globals->storageKeyWrap, + (const TPM_AUTHDATA*)&vtpm_globals->srk_usage_auth, + &vtpm_globals->storageKeyHandle, + &vtpm_globals->keyAuth, + &vtpm_globals->storageKey) ); + + // Create entry for Dom0 for control messages + TPMTRYRETURN( VTPM_Handle_New_DMI(NULL) ); + + // --------------------- Command handlers --------------------------- + + goto egress; + + abort_egress: + egress: + + return(status); +} + +void VTPM_Stop_Service() { + VTPM_DMI_RESOURCE *dmi_res; + struct hashtable_itr *dmi_itr; + + // Close all the TCS contexts. TCS should evict keys based on this + if (hashtable_count(vtpm_globals->dmi_map) > 0) { + dmi_itr = hashtable_iterator(vtpm_globals->dmi_map); + do { + dmi_res = (VTPM_DMI_RESOURCE *) hashtable_iterator_value(dmi_itr); + if (dmi_res->connected) + close_dmi( dmi_res ); // Not really interested in return code + + } while (hashtable_iterator_advance(dmi_itr)); + free (dmi_itr); + } + + + TCS_CloseContext(vtpm_globals->manager_tcs_handle); + + if ( (vtpm_globals->DMI_table_dirty) && + (VTPM_SaveService() != TPM_SUCCESS) ) + vtpmlogerror(VTPM_LOG_VTPM, "Unable to save manager data.\n"); + + hashtable_destroy(vtpm_globals->dmi_map, 1); + free(vtpm_globals); + + close(vtpm_globals->be_fh); + Crypto_Exit(); + + vtpmloginfo(VTPM_LOG_VTPM, "VTPM Manager stopped.\n"); +} diff -r a6b72464a042 -r f8e7af29daa1 tools/vtpm_manager/manager/vtpmpriv.h --- a/tools/vtpm_manager/manager/vtpmpriv.h Tue Sep 20 09:43:29 2005 +++ b/tools/vtpm_manager/manager/vtpmpriv.h Tue Sep 20 09:43:46 2005 @@ -47,8 +47,8 @@ #define STATE_FILE "/var/vtpm/VTPM" #define DMI_NVM_FILE "/var/vtpm/vtpm_dm_%d.data" -#define VTPM_BE_DEV "/dev/vtpm" -#define VTPM_CTL_DM 0 +#define VTPM_BE_DEV "/dev/vtpm0" +#define VTPM_CTL_DM 0 #ifndef VTPM_MUTLI_VM #include <sys/types.h> diff -r 
a6b72464a042 -r f8e7af29daa1 tools/vtpm_manager/tcs/Makefile --- a/tools/vtpm_manager/tcs/Makefile Tue Sep 20 09:43:29 2005 +++ b/tools/vtpm_manager/tcs/Makefile Tue Sep 20 09:43:46 2005 @@ -13,6 +13,7 @@ rm -f *.a *.so *.o *.rpm $(DEP_FILES) mrproper: clean + rm -f *~ $(BIN): $(OBJS) $(AR) rcs $(BIN) $(OBJS) diff -r a6b72464a042 -r f8e7af29daa1 tools/vtpm_manager/tcs/contextmgr.c --- a/tools/vtpm_manager/tcs/contextmgr.c Tue Sep 20 09:43:29 2005 +++ b/tools/vtpm_manager/tcs/contextmgr.c Tue Sep 20 09:43:46 2005 @@ -43,6 +43,7 @@ #include "tcs.h" #include "contextmgr.h" #include "log.h" +#include "hashtable.h" BYTE* AddMemBlock(CONTEXT_HANDLE* pContextHandle, // in int BlockSize) { // in @@ -131,12 +132,14 @@ return bFound; } -BOOL AddHandleToList(CONTEXT_HANDLE* pContextHandle, // in +BOOL AddHandleToList(TCS_CONTEXT_HANDLE hContext, // in TPM_RESOURCE_TYPE type, // in TPM_HANDLE handle) { // in HANDLE_LIST* pNewHandle = NULL; - + vtpmloginfo(VTPM_LOG_TCS_DEEP, "Adding Handle to list\n"); + CONTEXT_HANDLE* pContextHandle = LookupContext(hContext); + if (pContextHandle == NULL) return 0; @@ -154,11 +157,13 @@ return 1; } -BOOL DeleteHandleFromList( CONTEXT_HANDLE* pContextHandle, // in +BOOL DeleteHandleFromList( TCS_CONTEXT_HANDLE hContext, // in TPM_HANDLE handle) { // in + CONTEXT_HANDLE* pContextHandle = LookupContext(hContext); + HANDLE_LIST *pCurrentHandle = pContextHandle->pHandleList, - *pLastHandle = pCurrentHandle; + *pLastHandle = pCurrentHandle; vtpmloginfo(VTPM_LOG_TCS_DEEP, "Deleting Handle from list\n"); @@ -202,10 +207,10 @@ switch (pCurrentHandle->type) { case TPM_RT_KEY: - returncode = returncode && !TCSP_EvictKey((TCS_CONTEXT_HANDLE) pContextHandle, pCurrentHandle->handle); + returncode = returncode && !TCSP_EvictKey(pContextHandle->handle, pCurrentHandle->handle); break; case TPM_RT_AUTH: - returncode = returncode && !TCSP_TerminateHandle((TCS_CONTEXT_HANDLE) pContextHandle, pCurrentHandle->handle); + returncode = returncode && 
!TCSP_TerminateHandle(pContextHandle->handle, pCurrentHandle->handle); break; default: returncode = FALSE; diff -r a6b72464a042 -r f8e7af29daa1 tools/vtpm_manager/tcs/contextmgr.h --- a/tools/vtpm_manager/tcs/contextmgr.h Tue Sep 20 09:43:29 2005 +++ b/tools/vtpm_manager/tcs/contextmgr.h Tue Sep 20 09:43:46 2005 @@ -57,6 +57,7 @@ } HANDLE_LIST; typedef struct context_handle { + TCS_CONTEXT_HANDLE handle; int nBlockCount; BLOCK* pTopBlock; HANDLE_LIST* pHandleList; @@ -69,11 +70,11 @@ BYTE* pTCPA_BYTEs); // in -BOOL AddHandleToList( CONTEXT_HANDLE* pContextHandle, // in +BOOL AddHandleToList( TCS_CONTEXT_HANDLE hContext, // in TPM_RESOURCE_TYPE type, // in TPM_HANDLE handle); // in -BOOL DeleteHandleFromList( CONTEXT_HANDLE* pContextHandle, // in +BOOL DeleteHandleFromList( TCS_CONTEXT_HANDLE hContext, // in TPM_HANDLE handle); // in BOOL FreeHandleList( CONTEXT_HANDLE* pContextHandle); // in diff -r a6b72464a042 -r f8e7af29daa1 tools/vtpm_manager/tcs/tcs.c --- a/tools/vtpm_manager/tcs/tcs.c Tue Sep 20 09:43:29 2005 +++ b/tools/vtpm_manager/tcs/tcs.c Tue Sep 20 09:43:46 2005 @@ -47,9 +47,10 @@ #include "contextmgr.h" #include "tpmddl.h" #include "log.h" +#include "hashtable.h" +#include "hashtable_itr.h" // Static Global Vars for the TCS -static BOOL TCS_m_bConnected; static int TCS_m_nCount = 0; #define TCPA_MAX_BUFFER_LENGTH 0x2000 @@ -57,6 +58,21 @@ static BYTE InBuf [TCPA_MAX_BUFFER_LENGTH]; static BYTE OutBuf[TCPA_MAX_BUFFER_LENGTH]; +struct hashtable *context_ht; + +// -------------------------- Hash table functions -------------------- + +static unsigned int hashfunc32(void *ky) { + return (* (UINT32 *) ky); +} + +static int equals32(void *k1, void *k2) { + return (*(UINT32 *) k1 == *(UINT32 *) k2); +} + +CONTEXT_HANDLE *LookupContext( TCS_CONTEXT_HANDLE hContext) { + return( (CONTEXT_HANDLE *) hashtable_search(context_ht, &hContext) ); +} // --------------------------------------------------------------------------------- // Initialization/Uninitialization 
SubComponent API @@ -64,34 +80,50 @@ TPM_RESULT TCS_create() { TDDL_RESULT hRes = TDDL_E_FAIL; TPM_RESULT result = TPM_FAIL; - TCS_m_bConnected = FALSE; if (TCS_m_nCount == 0) { vtpmloginfo(VTPM_LOG_TCS, "Constructing new TCS:\n"); hRes = TDDL_Open(); - - if (hRes == TDDL_SUCCESS) { - TCS_m_bConnected = TRUE; + + context_ht = create_hashtable(10, hashfunc32, equals32); + + if ((hRes == TDDL_SUCCESS) && (context_ht != NULL)) { result = TPM_SUCCESS; + TCS_m_nCount++; + } else { + result = TPM_IOERROR; + hashtable_destroy(context_ht, 1); } } else - TCS_m_bConnected = TRUE; - - TCS_m_nCount++; - + TCS_m_nCount++; + return(result); } void TCS_destroy() { - // FIXME: Should iterate through all open contexts and close them. TCS_m_nCount--; - if (TCS_m_bConnected == TRUE && TCS_m_nCount == 0) { + if (TCS_m_nCount == 0) { vtpmloginfo(VTPM_LOG_TCS, "Destructing TCS:\n"); TDDL_Close(); - TCS_m_bConnected = FALSE; + + struct hashtable_itr *context_itr; + TCS_CONTEXT_HANDLE *hContext; + + // Close all the TCS contexts. 
TCS should evict keys based on this + if (hashtable_count(context_ht) > 0) { + context_itr = hashtable_iterator(context_ht); + do { + hContext = (TCS_CONTEXT_HANDLE *) hashtable_iterator_key(context_itr); + if (TCS_CloseContext(*hContext) != TPM_SUCCESS) + vtpmlogerror(VTPM_LOG_TCS, "Failed to close context %d properly.\n", *hContext); + + } while (hashtable_iterator_advance(context_itr)); + free(context_itr); + } + hashtable_destroy(context_ht, 1); } } @@ -101,7 +133,7 @@ BYTE** ppMemPtr) {// out TPM_RESULT returnCode = TPM_FAIL; - CONTEXT_HANDLE* pContextHandle = (CONTEXT_HANDLE*)hContext; + CONTEXT_HANDLE* pContextHandle = LookupContext(hContext); if (pContextHandle != NULL && ppMemPtr != NULL) { *ppMemPtr = (BYTE *)AddMemBlock(pContextHandle, MemSize); @@ -114,7 +146,7 @@ TPM_RESULT TCS_FreeMemory( TCS_CONTEXT_HANDLE hContext, // in BYTE* pMemory) { // in TPM_RESULT returnCode = TPM_FAIL; - CONTEXT_HANDLE* pContextHandle = (CONTEXT_HANDLE*)hContext; + CONTEXT_HANDLE* pContextHandle = LookupContext(hContext); if ( (pContextHandle != NULL && pMemory != NULL) && (DeleteMemBlock(pContextHandle, pMemory) == TRUE) ) @@ -126,15 +158,15 @@ TPM_RESULT TCS_OpenContext(TCS_CONTEXT_HANDLE* hContext) { // out TPM_RESULT returnCode = TPM_FAIL; + TCS_CONTEXT_HANDLE *newContext; vtpmloginfo(VTPM_LOG_TCS, "Calling TCS_OpenContext:\n"); // hContext must point to a null memory context handle if(*hContext == HANDLE_NULL) { - CONTEXT_HANDLE* pContextHandle = (CONTEXT_HANDLE *)malloc(sizeof(CONTEXT_HANDLE)); + CONTEXT_HANDLE* pContextHandle = (CONTEXT_HANDLE *) malloc(sizeof(CONTEXT_HANDLE)); if (pContextHandle == NULL) return TPM_SIZE; - // initialize to 0 pContextHandle->nBlockCount = 0; @@ -144,19 +176,32 @@ // Create New Block AddMemBlock(pContextHandle, BLOCK_SIZE); - *hContext = (TCS_CONTEXT_HANDLE)pContextHandle; - returnCode = TPM_SUCCESS; + newContext = (TCS_CONTEXT_HANDLE *) malloc(sizeof(TCS_CONTEXT_HANDLE)); + *newContext = (TCS_CONTEXT_HANDLE) (((uintptr_t) 
pContextHandle >> 2) & 0xffffffff); + + if (hashtable_search(context_ht, &newContext) !=NULL) + *newContext += 1; + + pContextHandle->handle = *newContext; + if (!hashtable_insert(context_ht, newContext, pContextHandle)) { + free(newContext); + free(pContextHandle); + returnCode = TPM_FAIL; + } else { + *hContext = *newContext; + returnCode = TPM_SUCCESS; + } } return(returnCode); } TPM_RESULT TCS_CloseContext(TCS_CONTEXT_HANDLE hContext) {// in - //FIXME: TCS SHOULD Track track failed auths and make sure + //FIXME: TCS SHOULD Track failed auths and make sure //we don't try and re-free them here. TPM_RESULT returnCode = TPM_FAIL; - CONTEXT_HANDLE* pContextHandle = (CONTEXT_HANDLE*)hContext; + CONTEXT_HANDLE* pContextHandle = LookupContext(hContext); if(pContextHandle != NULL) { // Print test info @@ -171,6 +216,9 @@ vtpmlogerror(VTPM_LOG_TCS, "Not all handles evicted from TPM.\n"); // Release the TPM's resources + if (hashtable_remove(context_ht, &hContext) == NULL) + vtpmlogerror(VTPM_LOG_TCS, "Not all handles evicted from TPM.\n"); + free(pContextHandle); returnCode = TPM_SUCCESS; } @@ -255,7 +303,7 @@ BSG_TYPE_UINT32, authHandle, BSG_TPM_NONCE, nonce0); - if (!AddHandleToList((CONTEXT_HANDLE *)hContext, TPM_RT_AUTH, *authHandle)) + if (!AddHandleToList(hContext, TPM_RT_AUTH, *authHandle)) vtpmlogerror(VTPM_LOG_TCS, "New AuthHandle not recorded\n"); vtpmloginfo(VTPM_LOG_TCS_DEEP, "Received paramSize : %d\n", paramSize); @@ -321,7 +369,7 @@ BSG_TPM_NONCE, nonceEven, BSG_TPM_NONCE, nonceEvenOSAP); - if (!AddHandleToList((CONTEXT_HANDLE *)hContext, TPM_RT_AUTH, *authHandle)) { + if (!AddHandleToList(hContext, TPM_RT_AUTH, *authHandle)) { vtpmlogerror(VTPM_LOG_TCS, "New AuthHandle not recorded\n"); } @@ -498,7 +546,7 @@ BSG_TYPE_UINT32, &paramSize, BSG_TPM_COMMAND_CODE, &returnCode); - if (!DeleteHandleFromList((CONTEXT_HANDLE *)hContext, handle)) + if (!DeleteHandleFromList(hContext, handle)) vtpmlogerror(VTPM_LOG_TCS, "KeyHandle not removed from list\n"); @@ -897,7 
+945,7 @@ phKeyTCSI); unpackAuth(pAuth, OutBuf+i); - if (!AddHandleToList((CONTEXT_HANDLE *)hContext, TPM_RT_KEY, *phKeyTCSI)) { + if (!AddHandleToList(hContext, TPM_RT_KEY, *phKeyTCSI)) { vtpmlogerror(VTPM_LOG_TCS, "New KeyHandle not recorded\n"); } @@ -942,7 +990,7 @@ BSG_TYPE_UINT32, &paramSize, BSG_TPM_COMMAND_CODE, &returnCode); - if (!DeleteHandleFromList((CONTEXT_HANDLE *)hContext, hKey)) { + if (!DeleteHandleFromList(hContext, hKey)) { vtpmlogerror(VTPM_LOG_TCS, "KeyHandle not removed from list\n"); } diff -r a6b72464a042 -r f8e7af29daa1 tools/vtpm_manager/tcs/tcs.h --- a/tools/vtpm_manager/tcs/tcs.h Tue Sep 20 09:43:29 2005 +++ b/tools/vtpm_manager/tcs/tcs.h Tue Sep 20 09:43:46 2005 @@ -41,6 +41,7 @@ #define __TCS_H__ #include "tcg.h" +#include "contextmgr.h" #include "buffer.h" #define HANDLE_NULL 0 @@ -235,4 +236,7 @@ UINT32 *outDataSize,// in/out BYTE *outData); // out +///////////// Private Functions //////////////////// +CONTEXT_HANDLE* LookupContext( TCS_CONTEXT_HANDLE hContext); + #endif //TCS_H diff -r a6b72464a042 -r f8e7af29daa1 tools/vtpm_manager/tcs/transmit.c --- a/tools/vtpm_manager/tcs/transmit.c Tue Sep 20 09:43:29 2005 +++ b/tools/vtpm_manager/tcs/transmit.c Tue Sep 20 09:43:46 2005 @@ -69,7 +69,7 @@ ERRORDIE (TPM_IOERROR); } else if ((TDDL_UINT32) size < insize) { - vtpmlogerror(VTPM_LOG_TXDATA, "Wrote %d instead of %d bytes!\n", (int) size, insize); // ... ? 
} diff -r a6b72464a042 -r f8e7af29daa1 tools/vtpm_manager/util/Makefile --- a/tools/vtpm_manager/util/Makefile Tue Sep 20 09:43:29 2005 +++ b/tools/vtpm_manager/util/Makefile Tue Sep 20 09:43:46 2005 @@ -13,6 +13,7 @@ rm -f *.a *.so *.o *.rpm $(DEP_FILES) mrproper: clean + rm -f *~ $(BIN): $(OBJS) $(AR) rcs $(BIN) $(OBJS) diff -r a6b72464a042 -r f8e7af29daa1 tools/vtpm_manager/util/tcg.h --- a/tools/vtpm_manager/util/tcg.h Tue Sep 20 09:43:29 2005 +++ b/tools/vtpm_manager/util/tcg.h Tue Sep 20 09:43:46 2005 @@ -453,14 +453,14 @@ // DEPENDS: local var 'status' of type TPM_RESULT // DEPENDS: label 'abort_egress' which cleans up and returns the status #define ERRORDIE(s) do { status = s; \ - fprintf (stderr, "*** ERRORDIE in %s, line %i\n", __func__, __LINE__); \ + fprintf (stderr, "*** ERRORDIE in %s at %s: %i\n", __func__, __FILE__, __LINE__); \ goto abort_egress; } \ while (0) // ASSUME: the return value used after the abort_egress label has been set // already (eg. the 'status' local var) #define STATUSCHECK(s) if (s != TPM_SUCCESS) { \ - fprintf (stderr, "*** ERR in %s, line %i\n", __func__, __LINE__); \ + fprintf (stderr, "*** ERR in %s at %s:%i\n", __func__, __FILE__, __LINE__); \ goto abort_egress; \ } @@ -475,7 +475,7 @@ // Try command c. If it fails, print error message, set status to actual return code. 
Goto shame #define TPMTRYRETURN(c) do { status = c; \ if (status != TPM_SUCCESS) { \ - printf("ERROR in %s:%i code: %s.\n", __func__, __LINE__, tpm_get_error_name(status)); \ + printf("ERROR in %s at %s:%i code: %s.\n", __func__, __FILE__, __LINE__, tpm_get_error_name(status)); \ goto abort_egress; \ } \ } while(0) diff -r a6b72464a042 -r f8e7af29daa1 xen/arch/x86/vmx_platform.c --- a/xen/arch/x86/vmx_platform.c Tue Sep 20 09:43:29 2005 +++ b/xen/arch/x86/vmx_platform.c Tue Sep 20 09:43:46 2005 @@ -671,13 +671,13 @@ if (inst->operand[0] & REGISTER) { /* dest is memory */ index = operand_index(inst->operand[0]); value = get_reg_value(size, index, 0, regs); - send_mmio_req(type, gpa, 1, size, value, IOREQ_WRITE, 0); + send_mmio_req(type, gpa, 1, inst->op_size, value, IOREQ_WRITE, 0); } else if (inst->operand[0] & IMMEDIATE) { /* dest is memory */ value = inst->immediate; - send_mmio_req(type, gpa, 1, size, value, IOREQ_WRITE, 0); + send_mmio_req(type, gpa, 1, inst->op_size, value, IOREQ_WRITE, 0); } else if (inst->operand[0] & MEMORY) { /* dest is register */ /* send the request and wait for the value */ - send_mmio_req(type, gpa, 1, size, 0, IOREQ_READ, 0); + send_mmio_req(type, gpa, 1, inst->op_size, 0, IOREQ_READ, 0); } else { printf("mmio_operands: invalid operand\n"); domain_crash_synchronous(); diff -r a6b72464a042 -r f8e7af29daa1 docs/src/interface/architecture.tex --- /dev/null Tue Sep 20 09:43:29 2005 +++ b/docs/src/interface/architecture.tex Tue Sep 20 09:43:46 2005 @@ -0,0 +1,140 @@ +\chapter{Virtual Architecture} + +On a Xen-based system, the hypervisor itself runs in {\it ring 0}. It +has full access to the physical memory available in the system and is +responsible for allocating portions of it to the domains. Guest +operating systems run in and use {\it rings 1}, {\it 2} and {\it 3} as +they see fit. Segmentation is used to prevent the guest OS from +accessing the portion of the address space that is reserved for Xen. 
+We expect most guest operating systems will use ring 1 for their own +operation and place applications in ring 3. + +In this chapter we consider the basic virtual architecture provided by +Xen: the basic CPU state, exception and interrupt handling, and time. +Other aspects such as memory and device access are discussed in later +chapters. + + +\section{CPU state} + +All privileged state must be handled by Xen. The guest OS has no +direct access to CR3 and is not permitted to update privileged bits in +EFLAGS. Guest OSes use \emph{hypercalls} to invoke operations in Xen; +these are analogous to system calls but occur from ring 1 to ring 0. + +A list of all hypercalls is given in Appendix~\ref{a:hypercalls}. + + +\section{Exceptions} + +A virtual IDT is provided --- a domain can submit a table of trap +handlers to Xen via the {\tt set\_trap\_table()} hypercall. Most trap +handlers are identical to native x86 handlers, although the page-fault +handler is somewhat different. + + +\section{Interrupts and events} + +Interrupts are virtualized by mapping them to \emph{events}, which are +delivered asynchronously to the target domain using a callback +supplied via the {\tt set\_callbacks()} hypercall. A guest OS can map +these events onto its standard interrupt dispatch mechanisms. Xen is +responsible for determining the target domain that will handle each +physical interrupt source. For more details on the binding of event +sources to events, see Chapter~\ref{c:devices}. + + +\section{Time} + +Guest operating systems need to be aware of the passage of both real +(or wallclock) time and their own `virtual time' (the time for which +they have been executing). Furthermore, Xen has a notion of time which +is used for scheduling. The following notions of time are provided: + +\begin{description} +\item[Cycle counter time.] + + This provides a fine-grained time reference. The cycle counter time + is used to accurately extrapolate the other time references. 
On SMP + machines it is currently assumed that the cycle counter time is + synchronized between CPUs. The current x86-based implementation + achieves this within inter-CPU communication latencies. + +\item[System time.] + + This is a 64-bit counter which holds the number of nanoseconds that + have elapsed since system boot. + +\item[Wall clock time.] + + This is the time of day in a Unix-style {\tt struct timeval} + (seconds and microseconds since 1 January 1970, adjusted by leap + seconds). An NTP client hosted by {\it domain 0} can keep this + value accurate. + +\item[Domain virtual time.] + + This progresses at the same pace as system time, but only while a + domain is executing --- it stops while a domain is de-scheduled. + Therefore the share of the CPU that a domain receives is indicated + by the rate at which its virtual time increases. + +\end{description} + + +Xen exports timestamps for system time and wall-clock time to guest +operating systems through a shared page of memory. Xen also provides +the cycle counter time at the instant the timestamps were calculated, +and the CPU frequency in Hertz. This allows the guest to extrapolate +system and wall-clock times accurately based on the current cycle +counter time. + +Since all time stamps need to be updated and read \emph{atomically} +two version numbers are also stored in the shared info page. The first +is incremented prior to an update, while the second is only +incremented afterwards. Thus a guest can be sure that it read a +consistent state by checking the two version numbers are equal. + +Xen includes a periodic ticker which sends a timer event to the +currently executing domain every 10ms. The Xen scheduler also sends a +timer event whenever a domain is scheduled; this allows the guest OS +to adjust for the time that has passed while it has been inactive. 
In +addition, Xen allows each domain to request that they receive a timer +event sent at a specified system time by using the {\tt + set\_timer\_op()} hypercall. Guest OSes may use this timer to +implement timeout values when they block. + + + +%% % akw: demoting this to a section -- not sure if there is any point +%% % though, maybe just remove it. + +\section{Xen CPU Scheduling} + +Xen offers a uniform API for CPU schedulers. It is possible to choose +from a number of schedulers at boot and it should be easy to add more. +The BVT, Atropos and Round Robin schedulers are part of the normal Xen +distribution. BVT provides proportional fair shares of the CPU to the +running domains. Atropos can be used to reserve absolute shares of +the CPU for each domain. Round-robin is provided as an example of +Xen's internal scheduler API. + +\paragraph*{Note: SMP host support} +Xen has always supported SMP host systems. Domains are statically +assigned to CPUs, either at creation time or when manually pinning to +a particular CPU. The current schedulers then run locally on each CPU +to decide which of the assigned domains should be run there. The +user-level control software can be used to perform coarse-grain +load-balancing between CPUs. + + +%% More information on the characteristics and use of these schedulers +%% is available in {\tt Sched-HOWTO.txt}. + + +\section{Privileged operations} + +Xen exports an extended interface to privileged domains (viz.\ {\it + Domain 0}). This allows such domains to build and boot other domains +on the server, and provides control interfaces for managing +scheduling, memory, networking, and block devices. diff -r a6b72464a042 -r f8e7af29daa1 docs/src/interface/debugging.tex --- /dev/null Tue Sep 20 09:43:29 2005 +++ b/docs/src/interface/debugging.tex Tue Sep 20 09:43:46 2005 @@ -0,0 +1,62 @@ +\chapter{Debugging} + +Xen provides tools for debugging both Xen and guest OSes. 
Currently, the +Pervasive Debugger provides a GDB stub, which provides facilities for symbolic +debugging of Xen itself and of OS kernels running on top of Xen. The Trace +Buffer provides a lightweight means to log data about Xen's internal state and +behaviour at runtime, for later analysis. + +\section{Pervasive Debugger} + +Information on using the pervasive debugger is available in pdb.txt. + + +\section{Trace Buffer} + +The trace buffer provides a means to observe Xen's operation from domain 0. +Trace events, inserted at key points in Xen's code, record data that can be +read by the {\tt xentrace} tool. Recording these events has a low overhead +and hence the trace buffer may be useful for debugging timing-sensitive +behaviours. + +\subsection{Internal API} + +To use the trace buffer functionality from within Xen, you must {\tt \#include +<xen/trace.h>}, which contains definitions related to the trace buffer. Trace +events are inserted into the buffer using the {\tt TRACE\_xD} ({\tt x} = 0, 1, +2, 3, 4 or 5) macros. These all take an event number, plus {\tt x} additional +(32-bit) data as their arguments. For trace buffer-enabled builds of Xen these +will insert the event ID and data into the trace buffer, along with the current +value of the CPU cycle-counter. For builds without the trace buffer enabled, +the macros expand to no-ops and thus can be left in place without incurring +overheads. + +\subsection{Trace-enabled builds} + +By default, the trace buffer is enabled only in debug builds (i.e. {\tt NDEBUG} +is not defined). It can be enabled separately by defining {\tt TRACE\_BUFFER}, +either in {\tt <xen/config.h>} or on the gcc command line. + +The size (in pages) of the per-CPU trace buffers can be specified using the +{\tt tbuf\_size=n } boot parameter to Xen. If the size is set to 0, the trace +buffers will be disabled. 
+ +\subsection{Dumping trace data} + +When running a trace buffer build of Xen, trace data are written continuously +into the buffer data areas, with newer data overwriting older data. This data +can be captured using the {\tt xentrace} program in domain 0. + +The {\tt xentrace} tool uses {\tt /dev/mem} in domain 0 to map the trace +buffers into its address space. It then periodically polls all the buffers for +new data, dumping out any new records from each buffer in turn. As a result, +for machines with multiple (logical) CPUs, the trace buffer output will not be +in overall chronological order. + +The output from {\tt xentrace} can be post-processed using {\tt +xentrace\_cpusplit} (used to split trace data out into per-cpu log files) and +{\tt xentrace\_format} (used to pretty-print trace data). For the predefined +trace points, there is an example format file in {\tt tools/xentrace/formats }. + +For more information, see the manual pages for {\tt xentrace}, {\tt +xentrace\_format} and {\tt xentrace\_cpusplit}. diff -r a6b72464a042 -r f8e7af29daa1 docs/src/interface/devices.tex --- /dev/null Tue Sep 20 09:43:29 2005 +++ b/docs/src/interface/devices.tex Tue Sep 20 09:43:46 2005 @@ -0,0 +1,178 @@ +\chapter{Devices} +\label{c:devices} + +Devices such as network and disk are exported to guests using a split +device driver. The device driver domain, which accesses the physical +device directly also runs a \emph{backend} driver, serving requests to +that device from guests. Each guest will use a simple \emph{frontend} +driver, to access the backend. Communication between these domains is +composed of two parts: First, data is placed onto a shared memory page +between the domains. Second, an event channel between the two domains +is used to pass notification that data is outstanding. This +separation of notification from data transfer allows message batching, +and results in very efficient device access. 
+ +Event channels are used extensively in device virtualization; each +domain has a number of end-points or \emph{ports} each of which may be +bound to one of the following \emph{event sources}: +\begin{itemize} + \item a physical interrupt from a real device, + \item a virtual interrupt (callback) from Xen, or + \item a signal from another domain +\end{itemize} + +Events are lightweight and do not carry much information beyond the +source of the notification. Hence when performing bulk data transfer, +events are typically used as synchronization primitives over a shared +memory transport. Event channels are managed via the {\tt + event\_channel\_op()} hypercall; for more details see +Section~\ref{s:idc}. + +This chapter focuses on some individual device interfaces available to +Xen guests. + + +\section{Network I/O} + +Virtual network device services are provided by shared memory +communication with a backend domain. From the point of view of other +domains, the backend may be viewed as a virtual ethernet switch +element with each domain having one or more virtual network interfaces +connected to it. + +\subsection{Backend Packet Handling} + +The backend driver is responsible for a variety of actions relating to +the transmission and reception of packets from the physical device. +With regard to transmission, the backend performs these key actions: + +\begin{itemize} +\item {\bf Validation:} To ensure that domains do not attempt to + generate invalid (e.g. spoofed) traffic, the backend driver may + validate headers ensuring that source MAC and IP addresses match the + interface that they have been sent from. + + Validation functions can be configured using standard firewall rules + ({\small{\tt iptables}} in the case of Linux). + +\item {\bf Scheduling:} Since a number of domains can share a single + physical network interface, the backend must mediate access when + several domains each have packets queued for transmission. 
This + general scheduling function subsumes basic shaping or rate-limiting + schemes. + +\item {\bf Logging and Accounting:} The backend domain can be + configured with classifier rules that control how packets are + accounted or logged. For example, log messages might be generated + whenever a domain attempts to send a TCP packet containing a SYN. +\end{itemize} + +On receipt of incoming packets, the backend acts as a simple +demultiplexer: Packets are passed to the appropriate virtual interface +after any necessary logging and accounting have been carried out. + +\subsection{Data Transfer} + +Each virtual interface uses two ``descriptor rings'', one for +transmit, the other for receive. Each descriptor identifies a block +of contiguous physical memory allocated to the domain. + +The transmit ring carries packets to transmit from the guest to the +backend domain. The return path of the transmit ring carries messages +indicating that the contents have been physically transmitted and the +backend no longer requires the associated pages of memory. + +To receive packets, the guest places descriptors of unused pages on +the receive ring. The backend will return received packets by +exchanging these pages in the domain's memory with new pages +containing the received data, and passing back descriptors regarding +the new packets on the ring. This zero-copy approach allows the +backend to maintain a pool of free pages to receive packets into, and +then deliver them to appropriate domains after examining their +headers. + +% Real physical addresses are used throughout, with the domain +% performing translation from pseudo-physical addresses if that is +% necessary. + +If a domain does not keep its receive ring stocked with empty buffers +then packets destined to it may be dropped. This provides some +defence against receive livelock problems because an overload domain +will cease to receive further data. 
Similarly, on the transmit path, +it provides the application with feedback on the rate at which packets +are able to leave the system. + +Flow control on rings is achieved by including a pair of producer +indexes on the shared ring page. Each side will maintain a private +consumer index indicating the next outstanding message. In this +manner, the domains cooperate to divide the ring into two message +lists, one in each direction. Notification is decoupled from the +immediate placement of new messages on the ring; the event channel +will be used to generate notification when {\em either} a certain +number of outstanding messages are queued, {\em or} a specified number +of nanoseconds have elapsed since the oldest message was placed on the +ring. + +%% Not sure if my version is any better -- here is what was here +%% before: Synchronization between the backend domain and the guest is +%% achieved using counters held in shared memory that is accessible to +%% both. Each ring has associated producer and consumer indices +%% indicating the area in the ring that holds descriptors that contain +%% data. After receiving {\it n} packets or {\t nanoseconds} after +%% receiving the first packet, the hypervisor sends an event to the +%% domain. + + +\section{Block I/O} + +All guest OS disk access goes through the virtual block device (VBD) +interface. This interface allows domains access to portions of block +storage devices visible to the block backend device. The VBD +interface is a split driver, similar to the network interface +described above. A single shared memory ring is used between the +frontend and backend drivers, across which read and write messages are +sent. + +Any block device accessible to the backend domain, including +network-based block (iSCSI, *NBD, etc.), loopback and LVM/MD devices, +can be exported as a VBD. Each VBD is mapped to a device node in the +guest, specified in the guest's startup configuration.
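The ring flow control described above, with a pair of producer indexes on the shared page and a private consumer index on each side, can be sketched as follows. This is an illustration under stated assumptions rather than the actual Xen ring layout: the structure and function names are hypothetical, messages are reduced to a single word, and the memory barriers and event-channel notification that a real shared-memory ring needs are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 8u                    /* must be a power of two */
#define SKETCH_RING_MASK (SKETCH_RING_SIZE - 1u)

/* Shared page: both producer indexes live here. Indexes run freely and
 * are masked only when used to address a slot. */
struct sketch_ring {
    uint32_t req_prod;                 /* advanced by the frontend */
    uint32_t resp_prod;                /* advanced by the backend */
    uint32_t slot[SKETCH_RING_SIZE];   /* request, later reused for its response */
};

/* Each side keeps its consumer index private. */
struct sketch_frontend { struct sketch_ring *ring; uint32_t resp_cons; };
struct sketch_backend  { struct sketch_ring *ring; uint32_t req_cons; };

static bool sketch_put_request(struct sketch_frontend *fe, uint32_t req)
{
    struct sketch_ring *r = fe->ring;

    if (r->req_prod - fe->resp_cons >= SKETCH_RING_SIZE)
        return false;                  /* every slot is still in flight */
    r->slot[r->req_prod & SKETCH_RING_MASK] = req;
    r->req_prod++;
    return true;
}

static bool sketch_get_request(struct sketch_backend *be, uint32_t *req)
{
    struct sketch_ring *r = be->ring;

    if (be->req_cons == r->req_prod)
        return false;                  /* nothing outstanding */
    *req = r->slot[be->req_cons & SKETCH_RING_MASK];
    be->req_cons++;
    return true;
}

static bool sketch_put_response(struct sketch_backend *be, uint32_t resp)
{
    struct sketch_ring *r = be->ring;

    if (r->resp_prod == be->req_cons)
        return false;                  /* no consumed request left to answer */
    r->slot[r->resp_prod & SKETCH_RING_MASK] = resp;
    r->resp_prod++;
    return true;
}

static bool sketch_get_response(struct sketch_frontend *fe, uint32_t *resp)
{
    struct sketch_ring *r = fe->ring;

    if (fe->resp_cons == r->resp_prod)
        return false;                  /* no response available yet */
    *resp = r->slot[fe->resp_cons & SKETCH_RING_MASK];
    fe->resp_cons++;
    return true;
}
```

The four indexes always satisfy resp\_cons <= resp\_prod <= req\_cons <= req\_prod, which is how one array is divided into two message lists, one in each direction.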
+ +Old (Xen 1.2) virtual disks are not supported under Xen 2.0, since +similar functionality can be achieved using the more complete LVM +system, which is already in widespread use. + +\subsection{Data Transfer} + +The single ring between the guest and the block backend supports three +messages: + +\begin{description} +\item [{\small {\tt PROBE}}:] Return a list of the VBDs available to + this guest from the backend. The request includes a descriptor of a + free page into which the reply will be written by the backend. + +\item [{\small {\tt READ}}:] Read data from the specified block + device. The front end identifies the device and location to read + from and attaches pages for the data to be copied to (typically via + DMA from the device). The backend acknowledges completed read + requests as they finish. + +\item [{\small {\tt WRITE}}:] Write data to the specified block + device. This functions essentially as {\small {\tt READ}}, except + that the data moves to the device instead of from it. +\end{description} + +%% um... some old text: In overview, the same style of descriptor-ring +%% that is used for network packets is used here. Each domain has one +%% ring that carries operation requests to the hypervisor and carries +%% the results back again. + +%% Rather than copying data, the backend simply maps the domain's +%% buffers in order to enable direct DMA to them. The act of mapping +%% the buffers also increases the reference counts of the underlying +%% pages, so that the unprivileged domain cannot try to return them to +%% the hypervisor, install them as page tables, or any other unsafe +%% behaviour. 
+%% +%% % block API here diff -r a6b72464a042 -r f8e7af29daa1 docs/src/interface/further_info.tex --- /dev/null Tue Sep 20 09:43:29 2005 +++ b/docs/src/interface/further_info.tex Tue Sep 20 09:43:46 2005 @@ -0,0 +1,49 @@ +\chapter{Further Information} + +If you have questions that are not answered by this manual, the +sources of information listed below may be of interest to you. Note +that bug reports, suggestions and contributions related to the +software (or the documentation) should be sent to the Xen developers' +mailing list (address below). + + +\section{Other documentation} + +If you are mainly interested in using (rather than developing for) +Xen, the \emph{Xen Users' Manual} is distributed in the {\tt docs/} +directory of the Xen source distribution. + +% Various HOWTOs are also available in {\tt docs/HOWTOS}. + + +\section{Online references} + +The official Xen web site is found at: +\begin{quote} +{\tt http://www.cl.cam.ac.uk/Research/SRG/netos/xen/} +\end{quote} + +This contains links to the latest versions of all on-line +documentation. + + +\section{Mailing lists} + +There are currently four official Xen mailing lists: + +\begin{description} +\item[xen-devel@xxxxxxxxxxxxxxxxxxx] Used for development + discussions and bug reports. Subscribe at: \\ + {\small {\tt http://lists.xensource.com/xen-devel}} +\item[xen-users@xxxxxxxxxxxxxxxxxxx] Used for installation and usage + discussions and requests for help. Subscribe at: \\ + {\small {\tt http://lists.xensource.com/xen-users}} +\item[xen-announce@xxxxxxxxxxxxxxxxxxx] Used for announcements only. + Subscribe at: \\ + {\small {\tt http://lists.xensource.com/xen-announce}} +\item[xen-changelog@xxxxxxxxxxxxxxxxxxx] Changelog feed + from the unstable and 2.0 trees - developer oriented. Subscribe at: \\ + {\small {\tt http://lists.xensource.com/xen-changelog}} +\end{description} + +Of these, xen-devel is the most active. 
diff -r a6b72464a042 -r f8e7af29daa1 docs/src/interface/hypercalls.tex --- /dev/null Tue Sep 20 09:43:29 2005 +++ b/docs/src/interface/hypercalls.tex Tue Sep 20 09:43:46 2005 @@ -0,0 +1,524 @@ + +\newcommand{\hypercall}[1]{\vspace{2mm}{\sf #1}} + +\chapter{Xen Hypercalls} +\label{a:hypercalls} + +Hypercalls represent the procedural interface to Xen; this appendix +categorizes and describes the current set of hypercalls. + +\section{Invoking Hypercalls} + +Hypercalls are invoked in a manner analogous to system calls in a +conventional operating system; a software interrupt is issued which +vectors to an entry point within Xen. On x86\_32 machines the +instruction required is {\tt int \$0x82}; the (real) IDT is set up so +that this may only be issued from within ring 1. The particular +hypercall to be invoked is contained in {\tt EAX} --- a list +mapping these values to symbolic hypercall names can be found +in {\tt xen/include/public/xen.h}. + +On some occasions a set of hypercalls will be required to carry +out a higher-level function; a good example is when a guest +operating system wishes to context switch to a new process, which +requires updating various privileged CPU state. As an optimization +for these cases, there is a generic mechanism to issue a set of +hypercalls as a batch: + +\begin{quote} +\hypercall{multicall(void *call\_list, int nr\_calls)} + +Execute a series of hypervisor calls; {\tt nr\_calls} is the length of +the array of {\tt multicall\_entry\_t} structures pointed to by {\tt +call\_list}. Each entry contains the hypercall operation code followed +by up to 7 word-sized arguments. +\end{quote} + +Note that multicalls are provided purely as an optimization; there is +no requirement to use them when first porting a guest operating +system. + + +\section{Virtual CPU Setup} + +At start of day, a guest operating system needs to set up the virtual +CPU it is executing on.
This includes installing vectors for the +virtual IDT so that the guest OS can handle interrupts, page faults, +etc. However, the very first thing a guest OS must set up is a pair +of hypervisor callbacks: these are the entry points which Xen will +use when it wishes to notify the guest OS of an occurrence. + +\begin{quote} +\hypercall{set\_callbacks(unsigned long event\_selector, unsigned long + event\_address, unsigned long failsafe\_selector, unsigned long + failsafe\_address) } + +Register the normal (``event'') and failsafe callbacks for +event processing. In each case the code segment selector and +address within that segment are provided. The selectors must +have RPL 1; in XenLinux we simply use the kernel's CS for both +{\tt event\_selector} and {\tt failsafe\_selector}. + +The value {\tt event\_address} specifies the address of the guest OS's +event handling and dispatch routine; the {\tt failsafe\_address} +specifies a separate entry point which is used only if a fault occurs +when Xen attempts to use the normal callback. +\end{quote} + + +After installing the hypervisor callbacks, the guest OS can +install a `virtual IDT' by using the following hypercall: + +\begin{quote} +\hypercall{set\_trap\_table(trap\_info\_t *table)} + +Install one or more entries into the per-domain +trap handler table (essentially a software version of the IDT). +Each entry in the array pointed to by {\tt table} includes the +exception vector number with the corresponding segment selector +and entry point. Most guest OSes can use the same handlers on +Xen as when running on the real hardware; an exception is the +page fault handler (exception vector 14) where a modified +stack-frame layout is used. + + +\end{quote} + + + +\section{Scheduling and Timer} + +Domains are preemptively scheduled by Xen according to the +parameters installed by domain 0 (see Section~\ref{s:dom0ops}).
+In addition, however, a domain may choose to explicitly +control certain behavior with the following hypercall: + +\begin{quote} +\hypercall{sched\_op(unsigned long op)} + +Request a scheduling operation from the hypervisor. The options are: {\it +yield}, {\it block}, and {\it shutdown}. {\it yield} keeps the +calling domain runnable but may cause a reschedule if other domains +are runnable. {\it block} removes the calling domain from the run +queue and causes it to sleep until an event is delivered to it. {\it +shutdown} is used to end the domain's execution; the caller can +additionally specify whether the domain should reboot, halt or +suspend. +\end{quote} + +To aid the implementation of a process scheduler within a guest OS, +Xen provides a virtual programmable timer: + +\begin{quote} +\hypercall{set\_timer\_op(uint64\_t timeout)} + +Request a timer event to be sent at the specified system time (time +in nanoseconds since system boot). The hypercall actually passes the +64-bit timeout value as a pair of 32-bit values. + +\end{quote} + +Note that calling {\tt set\_timer\_op()} prior to {\tt sched\_op()} +allows block-with-timeout semantics. + + +\section{Page Table Management} + +Since guest operating systems have read-only access to their page +tables, Xen must be involved when making any changes. The following +multi-purpose hypercall can be used to modify page-table entries, +update the machine-to-physical mapping table, flush the TLB, install +a new page-table base pointer, and more. + +\begin{quote} +\hypercall{mmu\_update(mmu\_update\_t *req, int count, int *success\_count)} + +Update the page table for the domain; a set of {\tt count} updates are +submitted for processing in a batch, with {\tt success\_count} being +updated to report the number of successful updates.
+ +Each element of {\tt req[]} contains a pointer (address) and value; +the least significant 2-bits of the pointer are used to distinguish +the type of update requested as follows: +\begin{description} + +\item[\it MMU\_NORMAL\_PT\_UPDATE:] update a page directory entry or +page table entry to the associated value; Xen will check that the +update is safe, as described in Chapter~\ref{c:memory}. + +\item[\it MMU\_MACHPHYS\_UPDATE:] update an entry in the + machine-to-physical table. The calling domain must own the machine + page in question (or be privileged). + +\item[\it MMU\_EXTENDED\_COMMAND:] perform additional MMU operations. +The set of additional MMU operations is considerable, and includes +updating {\tt cr3} (or just re-installing it for a TLB flush), +flushing the cache, installing a new LDT, or pinning \& unpinning +page-table pages (to ensure their reference count doesn't drop to zero +which would require a revalidation of all entries). + +Further extended commands are used to deal with granting and +acquiring page ownership; see Section~\ref{s:idc}. + + +\end{description} + +More details on the precise format of all commands can be +found in {\tt xen/include/public/xen.h}. + + +\end{quote} + +Explicitly updating batches of page table entries is extremely +efficient, but can require a number of alterations to the guest +OS. Using the writable page table mode (Chapter~\ref{c:memory}) is +recommended for new OS ports. + +Regardless of which page table update mode is being used, however, +there are some occasions (notably handling a demand page fault) where +a guest OS will wish to modify exactly one PTE rather than a +batch. This is catered for by the following: + +\begin{quote} +\hypercall{update\_va\_mapping(unsigned long page\_nr, unsigned long +val, \\ unsigned long flags)} + +Update the currently installed PTE for the page {\tt page\_nr} to +{\tt val}. As with {\tt mmu\_update()}, Xen checks the modification +is safe before applying it. 
The {\tt flags} determine which kind +of TLB flush, if any, should follow the update. + +\end{quote} + +Finally, sufficiently privileged domains may occasionally wish to manipulate +the pages of others: +\begin{quote} + +\hypercall{update\_va\_mapping\_otherdomain(unsigned long page\_nr, +unsigned long val, unsigned long flags, uint16\_t domid)} + +Identical to {\tt update\_va\_mapping()} save that the pages being +mapped must belong to the domain {\tt domid}. + +\end{quote} + +This privileged operation is currently used by backend virtual device +drivers to safely map pages containing I/O data. + + + +\section{Segmentation Support} + +Xen allows guest OSes to install a custom GDT if they require it; +this is context switched transparently whenever a domain is +[de]scheduled. The following hypercall is effectively a +`safe' version of {\tt lgdt}: + +\begin{quote} +\hypercall{set\_gdt(unsigned long *frame\_list, int entries)} + +Install a global descriptor table for a domain; {\tt frame\_list} is +an array of up to 16 machine page frames within which the GDT resides, +with {\tt entries} being the actual number of descriptor-entry +slots. All page frames must be mapped read-only within the guest's +address space, and the table must be large enough to contain Xen's +reserved entries (see {\tt xen/include/public/arch-x86\_32.h}). + +\end{quote} + +Many guest OSes will also wish to install LDTs; this is achieved by +using {\tt mmu\_update()} with an extended command, passing the +linear address of the LDT base along with the number of entries. No +special safety checks are required; Xen needs to perform this task +simply since {\tt lldt} requires CPL 0. 
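The relationship between the {\tt entries} argument of {\tt set\_gdt()} and the length of {\tt frame\_list} is simple arithmetic, sketched below. It assumes 8-byte descriptors and 4K page frames, as stated elsewhere in this document; the helper name and constants are hypothetical, and the sketch does not account for Xen's reserved entries.

```c
#define SKETCH_PAGE_BYTES      4096u   /* 4K machine page frame */
#define SKETCH_DESC_BYTES      8u      /* one x86 segment descriptor */
#define SKETCH_DESCS_PER_FRAME (SKETCH_PAGE_BYTES / SKETCH_DESC_BYTES)
#define SKETCH_MAX_GDT_FRAMES  16u     /* set_gdt() accepts up to 16 frames */

/* Return how many page frames a GDT with `entries` descriptor slots
 * occupies, or -1 if the count is zero or would need more than 16 frames. */
static int sketch_gdt_frames_needed(unsigned int entries)
{
    unsigned int frames;

    if (entries == 0)
        return -1;
    /* Round up to whole frames: 512 descriptors fit in each 4K frame. */
    frames = (entries + SKETCH_DESCS_PER_FRAME - 1u) / SKETCH_DESCS_PER_FRAME;
    return frames <= SKETCH_MAX_GDT_FRAMES ? (int)frames : -1;
}
```

Under these assumptions a single frame holds 512 descriptors, so the 16-frame limit corresponds to at most 8192 slots.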
+ + +Xen also allows guest operating systems to update just an +individual segment descriptor in the GDT or LDT: + +\begin{quote} +\hypercall{update\_descriptor(unsigned long ma, unsigned long word1, +unsigned long word2)} + +Update the GDT/LDT entry at machine address {\tt ma}; the new +8-byte descriptor is stored in {\tt word1} and {\tt word2}. +Xen performs a number of checks to ensure the descriptor is +valid. + +\end{quote} + +Guest OSes can use the above in place of context switching entire +LDTs (or the GDT) when the number of changing descriptors is small. + +\section{Context Switching} + +When a guest OS wishes to context switch between two processes, +it can use the page table and segmentation hypercalls described +above to perform the bulk of the privileged work. In addition, +however, it will need to invoke Xen to switch the kernel (ring 1) +stack pointer: + +\begin{quote} +\hypercall{stack\_switch(unsigned long ss, unsigned long esp)} + +Request a kernel stack switch from the hypervisor; {\tt ss} is the new +stack segment, while {\tt esp} is the new stack pointer. + +\end{quote} + +A final useful hypercall for context switching allows ``lazy'' +save and restore of floating point state: + +\begin{quote} +\hypercall{fpu\_taskswitch(void)} + +This call instructs Xen to set the {\tt TS} bit in the {\tt cr0} +control register; this means that the next attempt to use floating +point will cause a trap which the guest OS can handle. Typically it will +then save/restore the FP state, and clear the {\tt TS} bit. +\end{quote} + +This is provided as an optimization only; guest OSes can also choose +to save and restore FP state on all context switches for simplicity. + + +\section{Physical Memory Management} + +As mentioned previously, each domain has a maximum and current +memory allocation. The maximum allocation, set at domain creation +time, cannot be modified.
However, a domain can choose to reduce +and subsequently grow its current allocation by using the +following call: + +\begin{quote} +\hypercall{dom\_mem\_op(unsigned int op, unsigned long *extent\_list, + unsigned long nr\_extents, unsigned int extent\_order)} + +Increase or decrease current memory allocation (as determined by +the value of {\tt op}). Each invocation provides a list of +extents each of which is $2^s$ pages in size, +where $s$ is the value of {\tt extent\_order}. + +\end{quote} + +In addition to simply reducing or increasing the current memory +allocation via a `balloon driver', this call is also useful for +obtaining contiguous regions of machine memory when required (e.g. +for certain PCI devices, or if using superpages). + + +\section{Inter-Domain Communication} +\label{s:idc} + +Xen provides a simple asynchronous notification mechanism via +\emph{event channels}. Each domain has a set of end-points (or +\emph{ports}) which may be bound to an event source (e.g. a physical +IRQ, a virtual IRQ, or a port in another domain). When a pair of +end-points in two different domains are bound together, then a `send' +operation on one will cause an event to be received by the destination +domain. + +The control and use of event channels involves the following hypercall: + +\begin{quote} +\hypercall{event\_channel\_op(evtchn\_op\_t *op)} + +Inter-domain event-channel management; {\tt op} is a discriminated +union which allows the following 7 operations: + +\begin{description} + +\item[\it alloc\_unbound:] allocate a free (unbound) local + port and prepare for connection from a specified domain. +\item[\it bind\_virq:] bind a local port to a virtual +IRQ; any particular VIRQ can be bound to at most one port per domain. +\item[\it bind\_pirq:] bind a local port to a physical IRQ; +once more, a given pIRQ can be bound to at most one port per +domain. Furthermore the calling domain must be sufficiently +privileged.
+\item[\it bind\_interdomain:] construct an interdomain event +channel; in general, the target domain must have previously allocated +an unbound port for this channel, although this can be bypassed by +privileged domains during domain setup. +\item[\it close:] close an interdomain event channel. +\item[\it send:] send an event to the remote end of an +interdomain event channel. +\item[\it status:] determine the current status of a local port. +\end{description} + +For more details see +{\tt xen/include/public/event\_channel.h}. + +\end{quote} + +Event channels are the fundamental communication primitive between +Xen domains and seamlessly support SMP. However they provide little +bandwidth for communication {\sl per se}, and hence are typically +married with a piece of shared memory to produce effective and +high-performance inter-domain communication. + +Safe sharing of memory pages between guest OSes is carried out by +granting access on a per-page basis to individual domains. This is +achieved by using the {\tt grant\_table\_op()} hypercall. + +\begin{quote} +\hypercall{grant\_table\_op(unsigned int cmd, void *uop, unsigned int count)} + +Grant or remove access to a particular page to a particular domain. + +\end{quote} + +This is not currently widely in use by guest operating systems, but +we intend to integrate support more fully in the near future. + +\section{PCI Configuration} + +Domains with physical device access (i.e.\ driver domains) receive +limited access to certain PCI devices (bus address space and +interrupts). However many guest operating systems attempt to +determine the PCI configuration by directly accessing the PCI BIOS, +which cannot be allowed for safety reasons. + +Instead, Xen provides the following hypercall: + +\begin{quote} +\hypercall{physdev\_op(void *physdev\_op)} + +Perform a PCI configuration operation; depending on the value +of {\tt physdev\_op} this can be a PCI config read, a PCI config +write, or a small number of other queries.
+ +\end{quote} + + +For examples of using {\tt physdev\_op()}, see the +Xen-specific PCI code in the linux sparse tree. + +\section{Administrative Operations} +\label{s:dom0ops} + +A large number of control operations are available to a sufficiently +privileged domain (typically domain 0). These allow the creation and +management of new domains, for example. A complete list is given +below: for more details on any or all of these, please see +{\tt xen/include/public/dom0\_ops.h} + + +\begin{quote} +\hypercall{dom0\_op(dom0\_op\_t *op)} + +Administrative domain operations for domain management. The options are: + +\begin{description} +\item [\it DOM0\_CREATEDOMAIN:] create a new domain + +\item [\it DOM0\_PAUSEDOMAIN:] remove a domain from the scheduler run +queue. + +\item [\it DOM0\_UNPAUSEDOMAIN:] mark a paused domain as schedulable + once again. + +\item [\it DOM0\_DESTROYDOMAIN:] deallocate all resources associated +with a domain + +\item [\it DOM0\_GETMEMLIST:] get list of pages used by the domain + +\item [\it DOM0\_SCHEDCTL:] + +\item [\it DOM0\_ADJUSTDOM:] adjust scheduling priorities for domain + +\item [\it DOM0\_BUILDDOMAIN:] do final guest OS setup for domain + +\item [\it DOM0\_GETDOMAINFO:] get statistics about the domain + +\item [\it DOM0\_GETPAGEFRAMEINFO:] + +\item [\it DOM0\_GETPAGEFRAMEINFO2:] + +\item [\it DOM0\_IOPL:] set I/O privilege level + +\item [\it DOM0\_MSR:] read or write model specific registers + +\item [\it DOM0\_DEBUG:] interactively invoke the debugger + +\item [\it DOM0\_SETTIME:] set system time + +\item [\it DOM0\_READCONSOLE:] read console content from hypervisor buffer ring + +\item [\it DOM0\_PINCPUDOMAIN:] pin domain to a particular CPU + +\item [\it DOM0\_GETTBUFS:] get information about the size and location of + the trace buffers (only on trace-buffer enabled builds) + +\item [\it DOM0\_PHYSINFO:] get information about the host machine + +\item [\it DOM0\_PCIDEV\_ACCESS:] modify PCI device access permissions + +\item 
[\it DOM0\_SCHED\_ID:] get the ID of the current Xen scheduler + +\item [\it DOM0\_SHADOW\_CONTROL:] switch between shadow page-table modes + +\item [\it DOM0\_SETDOMAININITIALMEM:] set initial memory allocation of a domain + +\item [\it DOM0\_SETDOMAINMAXMEM:] set maximum memory allocation of a domain + +\item [\it DOM0\_SETDOMAINVMASSIST:] set domain VM assist options +\end{description} +\end{quote} + +Most of the above are best understood by looking at the code +implementing them (in {\tt xen/common/dom0\_ops.c}) and at +the user-space tools that use them (mostly in {\tt tools/libxc}). + +\section{Debugging Hypercalls} + +A few additional hypercalls are mainly useful for debugging: + +\begin{quote} +\hypercall{console\_io(int cmd, int count, char *str)} + +Use Xen to interact with the console; operations are: + +{\it CONSOLEIO\_write}: Output count characters from buffer str. + +{\it CONSOLEIO\_read}: Input at most count characters into buffer str. +\end{quote} + +A pair of hypercalls allows access to the underlying debug registers: +\begin{quote} +\hypercall{set\_debugreg(int reg, unsigned long value)} + +Set debug register {\tt reg} to {\tt value}. + +\hypercall{get\_debugreg(int reg)} + +Return the contents of the debug register {\tt reg}. +\end{quote} + +And finally: +\begin{quote} +\hypercall{xen\_version(int cmd)} + +Request Xen version number. +\end{quote} + +This is useful to ensure that user-space tools are in sync +with the underlying hypervisor. + +\section{Deprecated Hypercalls} + +Xen is under constant development and refinement; as such there +are plans to improve the way in which various pieces of functionality +are exposed to guest OSes. + +\begin{quote} +\hypercall{vm\_assist(unsigned int cmd, unsigned int type)} + +Toggle various memory management modes (in particular writable page +tables and superpage support).
+ +\end{quote} + +This is likely to be replaced with mode values in the shared +information page since this is more resilient for resumption +after migration or checkpoint. diff -r a6b72464a042 -r f8e7af29daa1 docs/src/interface/memory.tex --- /dev/null Tue Sep 20 09:43:29 2005 +++ b/docs/src/interface/memory.tex Tue Sep 20 09:43:46 2005 @@ -0,0 +1,162 @@ +\chapter{Memory} +\label{c:memory} + +Xen is responsible for managing the allocation of physical memory to +domains, and for ensuring safe use of the paging and segmentation +hardware. + + +\section{Memory Allocation} + +Xen resides within a small fixed portion of physical memory; it also +reserves the top 64MB of every virtual address space. The remaining +physical memory is available for allocation to domains at a page +granularity. Xen tracks the ownership and use of each page, which +allows it to enforce secure partitioning between domains. + +Each domain has a maximum and current physical memory allocation. A +guest OS may run a `balloon driver' to dynamically adjust its current +memory allocation up to its limit. + + +%% XXX SMH: I use machine and physical in the next section (which is +%% kinda required for consistency with code); wonder if this section +%% should use same terms? +%% +%% Probably. +%% +%% Merging this and below section at some point prob makes sense. + +\section{Pseudo-Physical Memory} + +Since physical memory is allocated and freed on a page granularity, +there is no guarantee that a domain will receive a contiguous stretch +of physical memory. However most operating systems do not have good +support for operating in a fragmented physical address space. To aid +porting such operating systems to run on top of Xen, we make a +distinction between \emph{machine memory} and \emph{pseudo-physical + memory}. + +Put simply, machine memory refers to the entire amount of memory +installed in the machine, including that reserved by Xen, in use by +various domains, or currently unallocated. 
We consider machine memory +to comprise a set of 4K \emph{machine page frames} numbered +consecutively starting from 0. Machine frame numbers mean the same +within Xen or any domain. + +Pseudo-physical memory, on the other hand, is a per-domain +abstraction. It allows a guest operating system to consider its memory +allocation to consist of a contiguous range of physical page frames +starting at physical frame 0, despite the fact that the underlying +machine page frames may be sparsely allocated and in any order. + +To achieve this, Xen maintains a globally readable {\it + machine-to-physical} table which records the mapping from machine +page frames to pseudo-physical ones. In addition, each domain is +supplied with a {\it physical-to-machine} table which performs the +inverse mapping. Clearly the machine-to-physical table has size +proportional to the amount of RAM installed in the machine, while each +physical-to-machine table has size proportional to the memory +allocation of the given domain. + +Architecture dependent code in guest operating systems can then use +the two tables to provide the abstraction of pseudo-physical memory. +In general, only certain specialized parts of the operating system +(such as page table management) need to understand the difference +between machine and pseudo-physical addresses. + + +\section{Page Table Updates} + +In the default mode of operation, Xen enforces read-only access to +page tables and requires guest operating systems to explicitly request +any modifications. Xen validates all such requests and only applies +updates that it deems safe. This is necessary to prevent domains from +adding arbitrary mappings to their page tables. + +To aid validation, Xen associates a type and reference count with each +memory page. 
A page has one of the following mutually-exclusive types +at any point in time: page directory ({\sf PD}), page table ({\sf + PT}), local descriptor table ({\sf LDT}), global descriptor table +({\sf GDT}), or writable ({\sf RW}). Note that a guest OS may always +create readable mappings of its own memory regardless of its current +type. + +%%% XXX: possibly explain more about ref count 'lifecycle' here? +This mechanism is used to maintain the invariants required for safety; +for example, a domain cannot have a writable mapping to any part of a +page table as this would require the page concerned to simultaneously +be of types {\sf PT} and {\sf RW}. + + +% \section{Writable Page Tables} + +Xen also provides an alternative mode of operation in which guests +have the illusion that their page tables are directly writable. Of +course this is not really the case, since Xen must still validate +modifications to ensure secure partitioning. To this end, Xen traps +any write attempt to a memory page of type {\sf PT} (i.e., that is +currently part of a page table). If such an access occurs, Xen +temporarily allows write access to that page while at the same time +\emph{disconnecting} it from the page table that is currently in use. +This allows the guest to safely make updates to the page because the +newly-updated entries cannot be used by the MMU until Xen revalidates +and reconnects the page. Reconnection occurs automatically in a +number of situations: for example, when the guest modifies a different +page-table page, when the domain is preempted, or whenever the guest +uses Xen's explicit page-table update interfaces. + +Finally, Xen also supports a form of \emph{shadow page tables} in +which the guest OS uses an independent copy of page tables which are +unknown to the hardware (i.e.\ which are never pointed to by {\tt + cr3}). Instead Xen propagates changes made to the guest's tables to +the real ones, and vice versa. 
This is useful for logging page writes +(e.g.\ for live migration or checkpoint). A full version of the shadow +page tables also allows guest OS porting with less effort. + + +\section{Segment Descriptor Tables} + +On boot a guest is supplied with a default GDT, which does not reside +within its own memory allocation. If the guest wishes to use other +than the default `flat' ring-1 and ring-3 segments that this GDT +provides, it must register a custom GDT and/or LDT with Xen, allocated +from its own memory. Note that a number of GDT entries are reserved by +Xen -- any custom GDT must also include sufficient space for these +entries. + +For example, the following hypercall is used to specify a new GDT: + +\begin{quote} + int {\bf set\_gdt}(unsigned long *{\em frame\_list}, int {\em + entries}) + + \emph{frame\_list}: An array of up to 16 machine page frames within + which the GDT resides. Any frame registered as a GDT frame may only + be mapped read-only within the guest's address space (e.g., no + writable mappings, no use as a page-table page, and so on). + + \emph{entries}: The number of descriptor-entry slots in the GDT. + Note that the table must be large enough to contain Xen's reserved + entries; thus we must have `{\em entries $>$ + LAST\_RESERVED\_GDT\_ENTRY}\ '. Note also that, after registering + the GDT, slots \emph{FIRST\_} through + \emph{LAST\_RESERVED\_GDT\_ENTRY} are no longer usable by the guest + and may be overwritten by Xen. +\end{quote} + +The LDT is updated via the generic MMU update mechanism (i.e., via the +{\tt mmu\_update()} hypercall). + +\section{Start of Day} + +The start-of-day environment for guest operating systems is rather +different to that provided by the underlying hardware. In particular, +the processor is already executing in protected mode with paging +enabled. + +{\it Domain 0} is created and booted by Xen itself. 
For all subsequent +domains, the analogue of the boot-loader is the {\it domain builder}, +user-space software running in {\it domain 0}. The domain builder is +responsible for building the initial page tables for a domain and +loading its kernel image at the appropriate virtual address. diff -r a6b72464a042 -r f8e7af29daa1 docs/src/interface/scheduling.tex --- /dev/null Tue Sep 20 09:43:29 2005 +++ b/docs/src/interface/scheduling.tex Tue Sep 20 09:43:46 2005 @@ -0,0 +1,268 @@ +\chapter{Scheduling API} + +The scheduling API is used by both the schedulers described above and should +also be used by any new schedulers. It provides a generic interface and also +implements much of the ``boilerplate'' code. + +Schedulers conforming to this API are described by the following +structure: + +\begin{verbatim} +struct scheduler +{ + char *name; /* full name for this scheduler */ + char *opt_name; /* option name for this scheduler */ + unsigned int sched_id; /* ID for this scheduler */ + + int (*init_scheduler) (); + int (*alloc_task) (struct task_struct *); + void (*add_task) (struct task_struct *); + void (*free_task) (struct task_struct *); + void (*rem_task) (struct task_struct *); + void (*wake_up) (struct task_struct *); + void (*do_block) (struct task_struct *); + task_slice_t (*do_schedule) (s_time_t); + int (*control) (struct sched_ctl_cmd *); + int (*adjdom) (struct task_struct *, + struct sched_adjdom_cmd *); + s32 (*reschedule) (struct task_struct *); + void (*dump_settings) (void); + void (*dump_cpu_state) (int); + void (*dump_runq_el) (struct task_struct *); +}; +\end{verbatim} + +The only method that {\em must} be implemented is +{\tt do\_schedule()}. However, if there is not some implementation for the +{\tt wake\_up()} method then waking tasks will not get put on the runqueue! + +The fields of the above structure are described in more detail below. + +\subsubsection{name} + +The name field should point to a descriptive ASCII string. 
+ +\subsubsection{opt\_name} + +This field is the value of the {\tt sched=} boot-time option that will select +this scheduler. + +\subsubsection{sched\_id} + +This is an integer that uniquely identifies this scheduler. There should be a +macro corresponding to this scheduler ID in {\tt <xen/sched-if.h>}. + +\subsubsection{init\_scheduler} + +\paragraph*{Purpose} + +This is a function for performing any scheduler-specific initialisation. For +instance, it might allocate memory for per-CPU scheduler data and initialise it +appropriately. + +\paragraph*{Call environment} + +This function is called after the initialisation performed by the generic +layer. The function is called exactly once, for the scheduler that has been +selected. + +\paragraph*{Return values} + +This should return negative on failure --- this will cause an +immediate panic and the system will fail to boot. + +\subsubsection{alloc\_task} + +\paragraph*{Purpose} +Called when a {\tt task\_struct} is allocated by the generic scheduler +layer. A particular scheduler implementation may use this method to +allocate per-task data for this task. It may use the {\tt +sched\_priv} pointer in the {\tt task\_struct} to point to this data. + +\paragraph*{Call environment} +The generic layer guarantees that the {\tt sched\_priv} field will +remain intact from the time this method is called until the task is +deallocated (so long as the scheduler implementation does not change +it explicitly!). + +\paragraph*{Return values} +Negative on failure. + +\subsubsection{add\_task} + +\paragraph*{Purpose} + +Called when a task is initially added by the generic layer. + +\paragraph*{Call environment} + +The fields in the {\tt task\_struct} are now filled out and available for use. +Schedulers should implement appropriate initialisation of any per-task private +information in this method. + +\subsubsection{free\_task} + +\paragraph*{Purpose} + +Schedulers should free the space used by any associated private data +structures. 
+ +\paragraph*{Call environment} + +This is called when a {\tt task\_struct} is about to be deallocated. +The generic layer will have done generic task removal operations and +(if implemented) called the scheduler's {\tt rem\_task} method before +this method is called. + +\subsubsection{rem\_task} + +\paragraph*{Purpose} + +This is called when a task is being removed from scheduling (but is +not yet being freed). + +\subsubsection{wake\_up} + +\paragraph*{Purpose} + +Called when a task is woken up, this method should put the task on the runqueue +(or do the scheduler-specific equivalent action). + +\paragraph*{Call environment} + +The task is already set to state RUNNING. + +\subsubsection{do\_block} + +\paragraph*{Purpose} + +This function is called when a task is blocked. This function should +not remove the task from the runqueue. + +\paragraph*{Call environment} + +The EVENTS\_MASTER\_ENABLE\_BIT is already set and the task state changed to +TASK\_INTERRUPTIBLE on entry to this method. A call to the {\tt + do\_schedule} method will be made after this method returns, in +order to select the next task to run. + +\subsubsection{do\_schedule} + +This method must be implemented. + +\paragraph*{Purpose} + +The method is called each time a new task must be chosen for scheduling on the +current CPU. The current time is passed as the single argument (the current +task can be found using the {\tt current} macro). + +This method should select the next task to run on this CPU and set its minimum +time to run as well as returning the data described below. + +This method should also take the appropriate action if the previous +task has blocked, e.g. removing it from the runqueue. + +\paragraph*{Call environment} + +The other fields in the {\tt task\_struct} are updated by the generic layer, +which also performs all Xen-specific tasks and performs the actual task switch +(unless the previous task has been chosen again). 
+ +This method is called with the {\tt schedule\_lock} held for the current CPU +and local interrupts disabled. + +\paragraph*{Return values} + +Must return a {\tt struct task\_slice} describing what task to run and how long +for (at maximum). + +\subsubsection{control} + +\paragraph*{Purpose} + +This method is called for global scheduler control operations. It takes a +pointer to a {\tt struct sched\_ctl\_cmd}, which it should either +source data from or populate with data, depending on the value of the +{\tt direction} field. + +\paragraph*{Call environment} + +The generic layer guarantees that when this method is called, the +caller selected the correct scheduler ID, hence the scheduler's +implementation does not need to sanity-check these parts of the call. + +\paragraph*{Return values} + +This function should return the value to be passed back to user space, hence it +should either be 0 or an appropriate errno value. + +\subsubsection{sched\_adjdom} + +\paragraph*{Purpose} + +This method is called to adjust the scheduling parameters of a particular +domain, or to query their current values. The function should check +the {\tt direction} field of the {\tt sched\_adjdom\_cmd} it receives in +order to determine which of these operations is being performed. + +\paragraph*{Call environment} + +The generic layer guarantees that the caller has specified the correct +control interface version and scheduler ID and that the supplied {\tt +task\_struct} will not be deallocated during the call (hence it is not +necessary to {\tt get\_task\_struct}). + +\paragraph*{Return values} + +This function should return the value to be passed back to user space, hence it +should either be 0 or an appropriate errno value. + +\subsubsection{reschedule} + +\paragraph*{Purpose} + +This method is called to determine if a reschedule is required as a result of a +particular task. 
+ +\paragraph*{Call environment} +The generic layer will cause a reschedule if the current domain is the idle +task or it has exceeded its minimum time slice before a reschedule. The +generic layer guarantees that the task passed is not currently running but is +on the runqueue. + +\paragraph*{Return values} + +Should return a mask of CPUs to cause a reschedule on. + +\subsubsection{dump\_settings} + +\paragraph*{Purpose} + +If implemented, this should dump any private global settings for this +scheduler to the console. + +\paragraph*{Call environment} + +This function is called with interrupts enabled. + +\subsubsection{dump\_cpu\_state} + +\paragraph*{Purpose} + +This method should dump any private settings for the specified CPU. + +\paragraph*{Call environment} + +This function is called with interrupts disabled and the {\tt schedule\_lock} +for the specified CPU held. + +\subsubsection{dump\_runq\_el} + +\paragraph*{Purpose} + +This method should dump any private settings for the specified task. + +\paragraph*{Call environment} + +This function is called with interrupts disabled and the {\tt schedule\_lock} +for the task's CPU held. diff -r a6b72464a042 -r f8e7af29daa1 docs/src/user/build.tex --- /dev/null Tue Sep 20 09:43:29 2005 +++ b/docs/src/user/build.tex Tue Sep 20 09:43:46 2005 @@ -0,0 +1,170 @@ +\chapter{Build, Boot and Debug Options} + +This chapter describes the build- and boot-time options which may be +used to tailor your Xen system. + + +\section{Xen Build Options} + +Xen provides a number of build-time options which should be set as +environment variables or passed on make's command-line. + +\begin{description} +\item[verbose=y] Enable debugging messages when Xen detects an + unexpected condition. Also enables console output from all domains. +\item[debug=y] Enable debug assertions. Implies {\bf verbose=y}. + (Primarily useful for tracing bugs in Xen). +\item[debugger=y] Enable the in-Xen debugger. 
This can be used to + debug Xen, guest OSes, and applications. +\item[perfc=y] Enable performance counters for significant events + within Xen. The counts can be reset or displayed on Xen's console + via console control keys. +\item[trace=y] Enable per-cpu trace buffers which log a range of + events within Xen for collection by control software. +\end{description} + + +\section{Xen Boot Options} +\label{s:xboot} + +These options are used to configure Xen's behaviour at runtime. They +should be appended to Xen's command line, either manually or by +editing \path{grub.conf}. + +\begin{description} +\item [ noreboot ] Don't reboot the machine automatically on errors. + This is useful to catch debug output if you aren't catching console + messages via the serial line. +\item [ nosmp ] Disable SMP support. This option is implied by + `ignorebiostables'. +\item [ watchdog ] Enable NMI watchdog which can report certain + failures. +\item [ noirqbalance ] Disable software IRQ balancing and affinity. + This can be used on systems such as Dell 1850/2850 that have + workarounds in hardware for IRQ-routing issues. +\item [ badpage=$<$page number$>$,$<$page number$>$, \ldots ] Specify + a list of pages not to be allocated for use because they contain bad + bytes. For example, if your memory tester says that byte 0x12345678 + is bad, you would place `badpage=0x12345' on Xen's command line. +\item [ com1=$<$baud$>$,DPS,$<$io\_base$>$,$<$irq$>$ + com2=$<$baud$>$,DPS,$<$io\_base$>$,$<$irq$>$ ] \mbox{}\\ + Xen supports up to two 16550-compatible serial ports. For example: + `com1=9600, 8n1, 0x408, 5' maps COM1 to a 9600-baud port, 8 data + bits, no parity, 1 stop bit, I/O port base 0x408, IRQ 5. If some + configuration options are standard (e.g., I/O base and IRQ), then + only a prefix of the full configuration string need be specified. If + the baud rate is pre-configured (e.g., by the bootloader) then you + can specify `auto' in place of a numeric baud rate. 
+\item [ console=$<$specifier list$>$ ] Specify the destination for Xen + console I/O. This is a comma-separated list of, for example: + \begin{description} + \item[ vga ] Use VGA console and allow keyboard input. + \item[ com1 ] Use serial port com1. + \item[ com2H ] Use serial port com2. Transmitted chars will have the + MSB set. Received chars must have MSB set. + \item[ com2L] Use serial port com2. Transmitted chars will have the + MSB cleared. Received chars must have MSB cleared. + \end{description} + The latter two examples allow a single port to be shared by two + subsystems (e.g.\ console and debugger). Sharing is controlled by + MSB of each transmitted/received character. [NB. Default for this + option is `com1,vga'] +\item [ sync\_console ] Force synchronous console output. This is + useful if your system fails unexpectedly before it has sent all + available output to the console. In most cases Xen will + automatically enter synchronous mode when an exceptional event + occurs, but this option provides a manual fallback. +\item [ conswitch=$<$switch-char$><$auto-switch-char$>$ ] Specify how + to switch serial-console input between Xen and DOM0. The required + sequence is CTRL-$<$switch-char$>$ pressed three times. Specifying + the backtick character disables switching. The + $<$auto-switch-char$>$ specifies whether Xen should auto-switch + input to DOM0 when it boots --- if it is `x' then auto-switching is + disabled. Any other value, or omitting the character, enables + auto-switching. [NB. Default switch-char is `a'.] +\item [ nmi=xxx ] + Specify what to do with an NMI parity or I/O error. \\ + `nmi=fatal': Xen prints a diagnostic and then hangs. \\ + `nmi=dom0': Inform DOM0 of the NMI. \\ + `nmi=ignore': Ignore the NMI. +\item [ mem=xxx ] Set the physical RAM address limit. Any RAM + appearing beyond this physical address in the memory map will be + ignored. 
This parameter may be specified with a B, K, M or G suffix, + representing bytes, kilobytes, megabytes and gigabytes respectively. + The default unit, if no suffix is specified, is kilobytes. +\item [ dom0\_mem=xxx ] Set the amount of memory to be allocated to + domain0. In Xen 3.x the parameter may be specified with a B, K, M or + G suffix, representing bytes, kilobytes, megabytes and gigabytes + respectively; if no suffix is specified, the parameter defaults to + kilobytes. In previous versions of Xen, suffixes were not supported + and the value is always interpreted as kilobytes. +\item [ tbuf\_size=xxx ] Set the size of the per-cpu trace buffers, in + pages (default 1). Note that the trace buffers are only enabled in + debug builds. Most users can ignore this feature completely. +\item [ sched=xxx ] Select the CPU scheduler Xen should use. The + current possibilities are `bvt' (default), `atropos' and `rrobin'. + For more information see Section~\ref{s:sched}. +\item [ apic\_verbosity=debug,verbose ] Print more detailed + information about local APIC and IOAPIC configuration. +\item [ lapic ] Force use of local APIC even when left disabled by + uniprocessor BIOS. +\item [ nolapic ] Ignore local APIC in a uniprocessor system, even if + enabled by the BIOS. +\item [ apic=bigsmp,default,es7000,summit ] Specify NUMA platform. + This can usually be probed automatically. +\end{description} + +In addition, the following options may be specified on the Xen command +line. Since domain 0 shares responsibility for booting the platform, +Xen will automatically propagate these options to its command line. +These options are taken from Linux's command-line syntax with +unchanged semantics. + +\begin{description} +\item [ acpi=off,force,strict,ht,noirq,\ldots ] Modify how Xen (and + domain 0) parses the BIOS ACPI tables. +\item [ acpi\_skip\_timer\_override ] Instruct Xen (and domain~0) to + ignore timer-interrupt override instructions specified by the BIOS + ACPI tables. 
+\item [ noapic ] Instruct Xen (and domain~0) to ignore any IOAPICs + that are present in the system, and instead continue to use the + legacy PIC. +\end{description} + + +\section{XenLinux Boot Options} + +In addition to the standard Linux kernel boot options, we support: +\begin{description} +\item[ xencons=xxx ] Specify the device node to which the Xen virtual + console driver is attached. The following options are supported: + \begin{center} + \begin{tabular}{l} + `xencons=off': disable virtual console \\ + `xencons=tty': attach console to /dev/tty1 (tty0 at boot-time) \\ + `xencons=ttyS': attach console to /dev/ttyS0 + \end{tabular} +\end{center} +The default is ttyS for dom0 and tty for all other domains. +\end{description} + + +\section{Debugging} +\label{s:keys} + +Xen has a set of debugging features that can be useful to try and +figure out what's going on. Hit `h' on the serial line (if you +specified a baud rate on the Xen command line) or ScrollLock-h on the +keyboard to get a list of supported commands. + +If you have a crash you'll likely get a crash dump containing an EIP +(PC) which, along with an \path{objdump -d image}, can be useful in +figuring out what's happened. Debug a Xenlinux image just as you +would any other Linux kernel. + +%% We supply a handy debug terminal program which you can find in +%% \path{/usr/local/src/xen-2.0.bk/tools/misc/miniterm/} This should +%% be built and executed on another machine that is connected via a +%% null modem cable. Documentation is included. Alternatively, if the +%% Xen machine is connected to a serial-port server then we supply a +%% dumb TCP terminal client, {\tt xencons}. 
diff -r a6b72464a042 -r f8e7af29daa1 docs/src/user/control_software.tex --- /dev/null Tue Sep 20 09:43:29 2005 +++ b/docs/src/user/control_software.tex Tue Sep 20 09:43:46 2005 @@ -0,0 +1,115 @@ +\chapter{Control Software} + +The Xen control software includes the \xend\ node control daemon +(which must be running), the xm command line tools, and the prototype +xensv web interface. + +\section{\Xend\ (node control daemon)} +\label{s:xend} + +The Xen Daemon (\Xend) performs system management functions related to +virtual machines. It forms a central point of control for a machine +and can be controlled using an HTTP-based protocol. \Xend\ must be +running in order to start and manage virtual machines. + +\Xend\ must be run as root because it needs access to privileged +system management functions. A small set of commands may be issued on +the \xend\ command line: + +\begin{tabular}{ll} + \verb!# xend start! & start \xend, if not already running \\ + \verb!# xend stop! & stop \xend\ if already running \\ + \verb!# xend restart! & restart \xend\ if running, otherwise start it \\ + % \verb!# xend trace_start! & start \xend, with very detailed debug logging \\ + \verb!# xend status! & indicates \xend\ status by its return code +\end{tabular} + +A SysV init script called {\tt xend} is provided to start \xend\ at +boot time. {\tt make install} installs this script in +\path{/etc/init.d}. To enable it, you have to make symbolic links in +the appropriate runlevel directories or use the {\tt chkconfig} tool, +where available. + +Once \xend\ is running, more sophisticated administration can be done +using the xm tool (see Section~\ref{s:xm}) and the experimental Xensv +web interface (see Section~\ref{s:xensv}). + +As \xend\ runs, events will be logged to \path{/var/log/xend.log} and, +if the migration assistant daemon (\path{xfrd}) has been started, +\path{/var/log/xfrd.log}. These may be of use for troubleshooting +problems. 
+ +\section{Xm (command line interface)} +\label{s:xm} + +The xm tool is the primary tool for managing Xen from the console. +The general format of an xm command line is: + +\begin{verbatim} +# xm command [switches] [arguments] [variables] +\end{verbatim} + +The available \emph{switches} and \emph{arguments} are dependent on +the \emph{command} chosen. The \emph{variables} may be set using +declarations of the form {\tt variable=value} and command line +declarations override any of the values in the configuration file +being used, including the standard variables described above and any +custom variables (for instance, the \path{xmdefconfig} file uses a +{\tt vmid} variable). + +The available commands are as follows: + +\begin{description} +\item[set-mem] Request a domain to adjust its memory footprint. +\item[create] Create a new domain. +\item[destroy] Kill a domain immediately. +\item[list] List running domains. +\item[shutdown] Ask a domain to shutdown. +\item[dmesg] Fetch the Xen (not Linux!) boot output. +\item[consoles] Lists the available consoles. +\item[console] Connect to the console for a domain. +\item[help] Get help on xm commands. +\item[save] Suspend a domain to disk. +\item[restore] Restore a domain from disk. +\item[pause] Pause a domain's execution. +\item[unpause] Un-pause a domain. +\item[pincpu] Pin a domain to a CPU. +\item[bvt] Set BVT scheduler parameters for a domain. +\item[bvt\_ctxallow] Set the BVT context switching allowance for the + system. +\item[atropos] Set the atropos parameters for a domain. +\item[rrobin] Set the round robin time slice for the system. +\item[info] Get information about the Xen host. +\item[call] Call a \xend\ HTTP API function directly. 
+\end{description} + +For a detailed overview of switches, arguments and variables to each +command try +\begin{quote} +\begin{verbatim} +# xm help command +\end{verbatim} +\end{quote} + +\section{Xensv (web control interface)} +\label{s:xensv} + +Xensv is the experimental web control interface for managing a Xen +machine. It can be used to perform some (but not yet all) of the +management tasks that can be done using the xm tool. + +It can be started using: +\begin{quote} + \verb_# xensv start_ +\end{quote} +and stopped using: +\begin{quote} + \verb_# xensv stop_ +\end{quote} + +By default, Xensv will serve out the web interface on port 8080. This +can be changed by editing +\path{/usr/lib/python2.3/site-packages/xen/sv/params.py}. + +Once Xensv is running, the web interface can be used to create and +manage running domains. diff -r a6b72464a042 -r f8e7af29daa1 docs/src/user/debian.tex --- /dev/null Tue Sep 20 09:43:29 2005 +++ b/docs/src/user/debian.tex Tue Sep 20 09:43:46 2005 @@ -0,0 +1,154 @@ +\chapter{Installing Xen / XenLinux on Debian} + +The Debian project provides a tool called \path{debootstrap} which +allows a base Debian system to be installed into a filesystem without +requiring the host system to have any Debian-specific software (such +as \path{apt}). + +Here's some info how to install Debian 3.1 (Sarge) for an unprivileged +Xen domain: + +\begin{enumerate} + +\item Set up Xen and test that it's working, as described earlier in + this manual. + +\item Create disk images for rootfs and swap. Alternatively, you might + create dedicated partitions, LVM logical volumes, etc.\ if that + suits your setup. +\begin{verbatim} +dd if=/dev/zero of=/path/diskimage bs=1024k count=size_in_mbytes +dd if=/dev/zero of=/path/swapimage bs=1024k count=size_in_mbytes +\end{verbatim} + + If you're going to use this filesystem / disk image only as a + `template' for other vm disk images, something like 300 MB should be + enough. 
(of course it depends what kind of packages you are planning + to install to the template) + +\item Create the filesystem and initialise the swap image +\begin{verbatim} +mkfs.ext3 /path/diskimage +mkswap /path/swapimage +\end{verbatim} + +\item Mount the disk image for installation +\begin{verbatim} +mount -o loop /path/diskimage /mnt/disk +\end{verbatim} + +\item Install \path{debootstrap}. Make sure you have debootstrap + installed on the host. If you are running Debian Sarge (3.1 / + testing) or unstable you can install it by running \path{apt-get + install debootstrap}. Otherwise, it can be downloaded from the + Debian project website. + +\item Install Debian base to the disk image: +\begin{verbatim} +debootstrap --arch i386 sarge /mnt/disk \ + http://ftp.<countrycode>.debian.org/debian +\end{verbatim} + + You can use any other Debian http/ftp mirror you want. + +\item When debootstrap completes successfully, modify settings: +\begin{verbatim} +chroot /mnt/disk /bin/bash +\end{verbatim} + +Edit the following files using vi or nano and make needed changes: +\begin{verbatim} +/etc/hostname +/etc/hosts +/etc/resolv.conf +/etc/network/interfaces +/etc/networks +\end{verbatim} + +Set up access to the services, edit: +\begin{verbatim} +/etc/hosts.deny +/etc/hosts.allow +/etc/inetd.conf +\end{verbatim} + +Add Debian mirror to: +\begin{verbatim} +/etc/apt/sources.list +\end{verbatim} + +Create fstab like this: +\begin{verbatim} +/dev/sda1 / ext3 errors=remount-ro 0 1 +/dev/sda2 none swap sw 0 0 +proc /proc proc defaults 0 0 +\end{verbatim} + +Logout + +\item Unmount the disk image +\begin{verbatim} +umount /mnt/disk +\end{verbatim} + +\item Create Xen 2.0 configuration file for the new domain. You can + use the example-configurations coming with Xen as a template. 
+ + Make sure you have the following set up: +\begin{verbatim} +disk = [ 'file:/path/diskimage,sda1,w', 'file:/path/swapimage,sda2,w' ] +root = "/dev/sda1 ro" +\end{verbatim} + +\item Start the new domain +\begin{verbatim} +xm create -f domain_config_file +\end{verbatim} + +Check that the new domain is running: +\begin{verbatim} +xm list +\end{verbatim} + +\item Attach to the console of the new domain. You should see + something like this when starting the new domain: + +\begin{verbatim} +Started domain testdomain2, console on port 9626 +\end{verbatim} + + There you can see the ID of the console: 26. You can also list the + consoles with \path{xm consoles} (ID is the last two digits of the + port number.) + + Attach to the console: + +\begin{verbatim} +xm console 26 +\end{verbatim} + + or by telnetting to the port 9626 of localhost (the xm console + program works better). + +\item Log in and run base-config + + As a default there's no password for the root. + + Check that everything looks OK, and the system started without + errors. Check that the swap is active, and the network settings are + correct. + + Run \path{/usr/sbin/base-config} to set up the Debian settings. + + Set up the password for root using passwd. + +\item Done. You can exit the console by pressing {\path{Ctrl + ]}} + +\end{enumerate} + + +If you need to create new domains, you can just copy the contents of +the `template'-image to the new disk images, either by mounting the +template and the new image, and using \path{cp -a} or \path{tar} or by +simply copying the image file. Once this is done, modify the +image-specific settings (hostname, network settings, etc). 
diff -r a6b72464a042 -r f8e7af29daa1 docs/src/user/domain_configuration.tex --- /dev/null Tue Sep 20 09:43:29 2005 +++ b/docs/src/user/domain_configuration.tex Tue Sep 20 09:43:46 2005 @@ -0,0 +1,281 @@ +\chapter{Domain Configuration} +\label{cha:config} + +The following contains the syntax of the domain configuration files +and description of how to further specify networking, driver domain +and general scheduling behavior. + + +\section{Configuration Files} +\label{s:cfiles} + +Xen configuration files contain the following standard variables. +Unless otherwise stated, configuration items should be enclosed in +quotes: see \path{/etc/xen/xmexample1} and \path{/etc/xen/xmexample2} +for concrete examples of the syntax. + +\begin{description} +\item[kernel] Path to the kernel image. +\item[ramdisk] Path to a ramdisk image (optional). + % \item[builder] The name of the domain build function (e.g. + % {\tt'linux'} or {\tt'netbsd'}. +\item[memory] Memory size in megabytes. +\item[cpu] CPU to run this domain on, or {\tt -1} for auto-allocation. +\item[console] Port to export the domain console on (default 9600 + + domain ID). +\item[nics] Number of virtual network interfaces. +\item[vif] List of MAC addresses (random addresses are assigned if not + given) and bridges to use for the domain's network interfaces, e.g.\ +\begin{verbatim} +vif = [ 'mac=aa:00:00:00:00:11, bridge=xen-br0', + 'bridge=xen-br1' ] +\end{verbatim} + to assign a MAC address and bridge to the first interface and assign + a different bridge to the second interface, leaving \xend\ to choose + the MAC address. +\item[disk] List of block devices to export to the domain, e.g.\ \\ + \verb_disk = [ 'phy:hda1,sda1,r' ]_ \\ + exports physical device \path{/dev/hda1} to the domain as + \path{/dev/sda1} with read-only access. Exporting a disk read-write + which is currently mounted is dangerous -- if you are \emph{certain} + you wish to do this, you can specify \path{w!} as the mode. 
+\item[dhcp] Set to {\tt `dhcp'} if you want to use DHCP to configure + networking. +\item[netmask] Manually configured IP netmask. +\item[gateway] Manually configured IP gateway. +\item[hostname] Set the hostname for the virtual machine. +\item[root] Specify the root device parameter on the kernel command + line. +\item[nfs\_server] IP address for the NFS server (if any). +\item[nfs\_root] Path of the root filesystem on the NFS server (if + any). +\item[extra] Extra string to append to the kernel command line (if + any) +\item[restart] Three possible options: + \begin{description} + \item[always] Always restart the domain, no matter what its exit + code is. + \item[never] Never restart the domain. + \item[onreboot] Restart the domain iff it requests reboot. + \end{description} +\end{description} + +For additional flexibility, it is also possible to include Python +scripting commands in configuration files. An example of this is the +\path{xmexample2} file, which uses Python code to handle the +\path{vmid} variable. + + +%\part{Advanced Topics} + + +\section{Network Configuration} + +For many users, the default installation should work ``out of the +box''. More complicated network setups, for instance with multiple +Ethernet interfaces and/or existing bridging setups will require some +special configuration. + +The purpose of this section is to describe the mechanisms provided by +\xend\ to allow a flexible configuration for Xen's virtual networking. + +\subsection{Xen virtual network topology} + +Each domain network interface is connected to a virtual network +interface in dom0 by a point to point link (effectively a ``virtual +crossover cable''). These devices are named {\tt + vif$<$domid$>$.$<$vifid$>$} (e.g.\ {\tt vif1.0} for the first +interface in domain~1, {\tt vif3.1} for the second interface in +domain~3). + +Traffic on these virtual interfaces is handled in domain~0 using +standard Linux mechanisms for bridging, routing, rate limiting, etc. 
+Xend calls on two shell scripts to perform initial configuration of +the network and configuration of new virtual interfaces. By default, +these scripts configure a single bridge for all the virtual +interfaces. Arbitrary routing / bridging configurations can be +set up by customizing the scripts, as described in the following +section. + +\subsection{Xen networking scripts} + +Xen's virtual networking is configured by two shell scripts (by +default \path{network} and \path{vif-bridge}). These are called +automatically by \xend\ when certain events occur, with arguments to +the scripts providing further contextual information. These scripts +are found by default in \path{/etc/xen/scripts}. The names and +locations of the scripts can be configured in +\path{/etc/xen/xend-config.sxp}. + +\begin{description} +\item[network:] This script is called whenever \xend\ is started or + stopped to respectively initialize or tear down the Xen virtual + network. In the default configuration initialization creates the + bridge `xen-br0' and moves eth0 onto that bridge, modifying the + routing accordingly. When \xend\ exits, it deletes the Xen bridge + and removes eth0, restoring the normal IP and routing configuration. + + %% In configurations where the bridge already exists, this script + %% could be replaced with a link to \path{/bin/true} (for instance). + +\item[vif-bridge:] This script is called for every domain virtual + interface and can configure firewalling rules and add the vif to the + appropriate bridge. By default, this adds and removes VIFs on the + default Xen bridge. +\end{description} + +For more complex network setups (e.g.\ where routing is required or +integration with existing bridges is needed) these scripts may be replaced with +customized variants for your site's preferred configuration. + +%% There are two possible types of privileges: IO privileges and +%% administration privileges.
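The interface naming scheme and the `vif` configuration syntax described in this section can be illustrated with a short Python sketch. The two helper functions are hypothetical and are not part of the Xen tools; they only reproduce the `vif<domid>.<vifid>` naming rule and the `vif` list format shown in the configuration file examples.

```python
def vif_name(domid, vifid):
    """Name of the dom0 end of a domain's virtual interface (hypothetical helper)."""
    return "vif%d.%d" % (domid, vifid)


def vif_entry(bridge, mac=None):
    """Build one entry for the 'vif' list of a domain configuration file.

    If no MAC address is given, the entry names only the bridge, leaving
    xend to choose the MAC address.
    """
    parts = []
    if mac is not None:
        parts.append("mac=" + mac)
    parts.append("bridge=" + bridge)
    return ", ".join(parts)


# Two interfaces: the first with a fixed MAC on xen-br0, the second on xen-br1.
vif = [vif_entry("xen-br0", mac="aa:00:00:00:00:11"), vif_entry("xen-br1")]
```

The resulting list has the same form as the `vif` example in the configuration file description, so it could be assigned directly in a configuration file that uses Python scripting.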
+ + +\section{Driver Domain Configuration} + +I/O privileges can be assigned to allow a domain to directly access +PCI devices itself. This is used to support driver domains. + +Setting back-end privileges is currently only supported in SXP format +config files. To allow a domain to function as a back-end for others, +somewhere within the {\tt vm} element of its configuration file must +be a {\tt back-end} element of the form {\tt (back-end ({\em type}))} +where {\tt \em type} may be either {\tt netif} or {\tt blkif}, +according to the type of virtual device this domain will service. +%% After this domain has been built, \xend will connect all new and +%% existing {\em virtual} devices (of the appropriate type) to that +%% back-end. + +Note that a block back-end cannot currently import virtual block +devices from other domains, and a network back-end cannot import +virtual network devices from other domains. Thus (particularly in the +case of block back-ends, which cannot import a virtual block device as +their root filesystem), you may need to boot a back-end domain from a +ramdisk or a network device. + +Access to PCI devices may be configured on a per-device basis. Xen +will assign the minimal set of hardware privileges to a domain that +are required to control its devices. This can be configured in either +format of configuration file: + +\begin{itemize} +\item SXP Format: Include device elements of the form: \\ + \centerline{ {\tt (device (pci (bus {\em x}) (dev {\em y}) (func {\em z})))}} \\ + inside the top-level {\tt vm} element. Each one specifies the + address of a device this domain is allowed to access --- the numbers + \emph{x},\emph{y} and \emph{z} may be in either decimal or + hexadecimal format. +\item Flat Format: Include a list of PCI device addresses of the + format: \\ + \centerline{{\tt pci = ['x,y,z', \ldots]}} \\ + where each element in the list is a string specifying the components + of the PCI device address, separated by commas. 
The components + ({\tt \em x}, {\tt \em y} and {\tt \em z}) of the list may be + formatted as either decimal or hexadecimal. +\end{itemize} + +%% \section{Administration Domains} + +%% Administration privileges allow a domain to use the `dom0 +%% operations' (so called because they are usually available only to +%% domain 0). A privileged domain can build other domains, set +%% scheduling parameters, etc. + +% Support for other administrative domains is not yet available... +% perhaps we should plumb it in some time + + +\section{Scheduler Configuration} +\label{s:sched} + +Xen offers a boot time choice between multiple schedulers. To select +a scheduler, pass the boot parameter \emph{sched=sched\_name} to Xen, +substituting the appropriate scheduler name. Details of the +schedulers and their parameters are included below; future versions of +the tools will provide a higher-level interface to these tools. + +It is expected that system administrators configure their system to +use the scheduler most appropriate to their needs. Currently, the BVT +scheduler is the recommended choice. + +\subsection{Borrowed Virtual Time} + +{\tt sched=bvt} (the default) \\ + +BVT provides proportional fair shares of the CPU time. It has been +observed to penalize domains that block frequently (e.g.\ I/O +intensive domains), but this can be compensated for by using warping. + +\subsubsection{Global Parameters} + +\begin{description} +\item[ctx\_allow] The context switch allowance is similar to the + ``quantum'' in traditional schedulers. It is the minimum time that + a scheduled domain will be allowed to run before being preempted. +\end{description} + +\subsubsection{Per-domain parameters} + +\begin{description} +\item[mcuadv] The MCU (Minimum Charging Unit) advance determines the + proportional share of the CPU that a domain receives. It is set + inversely proportionally to a domain's sharing weight. 
+\item[warp] The amount of ``virtual time'' the domain is allowed to + warp backwards. +\item[warpl] The warp limit is the maximum time a domain can run + warped for. +\item[warpu] The unwarp requirement is the minimum time a domain must + run unwarped for before it can warp again. +\end{description} + +\subsection{Atropos} + +{\tt sched=atropos} \\ + +Atropos is a soft real time scheduler. It provides guarantees about +absolute shares of the CPU, with a facility for sharing slack CPU time +on a best-effort basis. It can provide timeliness guarantees for +latency-sensitive domains. + +Every domain has an associated period and slice. The domain should +receive `slice' nanoseconds every `period' nanoseconds. This allows +the administrator to configure both the absolute share of the CPU a +domain receives and the frequency with which it is scheduled. + +%% When domains unblock, their period is reduced to the value of the +%% latency hint (the slice is scaled accordingly so that they still +%% get the same proportion of the CPU). For each subsequent period, +%% the slice and period times are doubled until they reach their +%% original values. + +Note: don't over-commit the CPU when using Atropos (i.e.\ don't reserve +more CPU than is available --- the utilization should be kept to +slightly less than 100\% in order to ensure predictable behavior). + +\subsubsection{Per-domain parameters} + +\begin{description} +\item[period] The regular time interval during which a domain is + guaranteed to receive its allocation of CPU time. +\item[slice] The length of time per period that a domain is guaranteed + to run for (in the absence of voluntary yielding of the CPU). +\item[latency] The latency hint is used to control how soon after + waking up a domain it should be scheduled. +\item[xtratime] This is a boolean flag that specifies whether a domain + should be allowed a share of the system slack time. 
+\end{description} + +\subsection{Round Robin} + +{\tt sched=rrobin} \\ + +The round robin scheduler is included as a simple demonstration of +Xen's internal scheduler API. It is not intended for production use. + +\subsubsection{Global Parameters} + +\begin{description} +\item[rr\_slice] The maximum time each domain runs before the next + scheduling decision is made. +\end{description} diff -r a6b72464a042 -r f8e7af29daa1 docs/src/user/domain_filesystem.tex --- /dev/null Tue Sep 20 09:43:29 2005 +++ b/docs/src/user/domain_filesystem.tex Tue Sep 20 09:43:46 2005 @@ -0,0 +1,243 @@ +\chapter{Domain Filesystem Storage} + +It is possible to directly export any Linux block device in dom0 to +another domain, or to export filesystems / devices to virtual machines +using standard network protocols (e.g.\ NBD, iSCSI, NFS, etc.). This +chapter covers some of the possibilities. + + +\section{Exporting Physical Devices as VBDs} +\label{s:exporting-physical-devices-as-vbds} + +One of the simplest configurations is to directly export individual +partitions from domain~0 to other domains. To achieve this use the +\path{phy:} specifier in your domain configuration file. For example a +line like +\begin{quote} + \verb_disk = ['phy:hda3,sda1,w']_ +\end{quote} +specifies that the partition \path{/dev/hda3} in domain~0 should be +exported read-write to the new domain as \path{/dev/sda1}; one could +equally well export it as \path{/dev/hda} or \path{/dev/sdb5} should +one wish. + +In addition to local disks and partitions, it is possible to export +any device that Linux considers to be ``a disk'' in the same manner. +For example, if you have iSCSI disks or GNBD volumes imported into +domain~0 you can export these to other domains using the \path{phy:} +disk syntax. 
E.g.: +\begin{quote} + \verb_disk = ['phy:vg/lvm1,sda2,w']_ +\end{quote} + +\begin{center} + \framebox{\bf Warning: Block device sharing} +\end{center} +\begin{quote} + Block devices should typically only be shared between domains in a + read-only fashion otherwise the Linux kernel's file systems will get + very confused as the file system structure may change underneath + them (having the same ext3 partition mounted \path{rw} twice is a + sure fire way to cause irreparable damage)! \Xend\ will attempt to + prevent you from doing this by checking that the device is not + mounted read-write in domain~0, and hasn't already been exported + read-write to another domain. If you want read-write sharing, + export the directory to other domains via NFS from domain~0 (or use + a cluster file system such as GFS or ocfs2). +\end{quote} + + +\section{Using File-backed VBDs} + +It is also possible to use a file in Domain~0 as the primary storage +for a virtual machine. As well as being convenient, this also has the +advantage that the virtual block device will be \emph{sparse} --- +space will only really be allocated as parts of the file are used. So +if a virtual machine uses only half of its disk space then the file +really takes up half of the size allocated. 
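The difference between the apparent size of a sparse file and the space actually allocated for it can be observed with a small Python sketch. The function names are hypothetical, the code assumes a Unix-like dom0 where `st_blocks` is reported in 512-byte units, and whether the file is really stored sparsely depends on the underlying file system.

```python
import os


def create_sparse_file(path, size_bytes):
    """Create a file of apparent size size_bytes by writing only its last byte."""
    with open(path, "wb") as f:
        f.seek(size_bytes - 1)   # skip over the unwritten region
        f.write(b"\0")           # one real byte fixes the file length


def apparent_and_allocated(path):
    """Return (apparent size, allocated bytes) for the file at path."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```

On a file system that supports sparse files, the allocated figure stays far below the apparent size until the virtual machine starts writing to its disk.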
+ +For example, to create a 2GB sparse file-backed virtual block device +(actually only consumes 1KB of disk): +\begin{quote} + \verb_# dd if=/dev/zero of=vm1disk bs=1k seek=2048k count=1_ +\end{quote} + +Make a file system in the disk file: +\begin{quote} + \verb_# mkfs -t ext3 vm1disk_ +\end{quote} + +(when the tool asks for confirmation, answer `y') + +Populate the file system e.g.\ by copying from the current root: +\begin{quote} +\begin{verbatim} +# mount -o loop vm1disk /mnt +# cp -ax /{root,dev,var,etc,usr,bin,sbin,lib} /mnt +# mkdir /mnt/{proc,sys,home,tmp} +\end{verbatim} +\end{quote} + +Tailor the file system by editing \path{/etc/fstab}, +\path{/etc/hostname}, etc.\ Don't forget to edit the files in the +mounted file system, instead of your domain~0 filesystem, e.g.\ you +would edit \path{/mnt/etc/fstab} instead of \path{/etc/fstab}. For +this example put \path{/dev/sda1} to root in fstab. + +Now unmount (this is important!): +\begin{quote} + \verb_# umount /mnt_ +\end{quote} + +In the configuration file set: +\begin{quote} + \verb_disk = ['file:/full/path/to/vm1disk,sda1,w']_ +\end{quote} + +As the virtual machine writes to its `disk', the sparse file will be +filled in and consume more space up to the original 2GB. + +{\bf Note that file-backed VBDs may not be appropriate for backing + I/O-intensive domains.} File-backed VBDs are known to experience +substantial slowdowns under heavy I/O workloads, due to the I/O +handling by the loopback block device used to support file-backed VBDs +in dom0. Better I/O performance can be achieved by using either +LVM-backed VBDs (Section~\ref{s:using-lvm-backed-vbds}) or physical +devices as VBDs (Section~\ref{s:exporting-physical-devices-as-vbds}). + +Linux supports a maximum of eight file-backed VBDs across all domains +by default. 
This limit can be statically increased by using the +\emph{max\_loop} module parameter if CONFIG\_BLK\_DEV\_LOOP is +compiled as a module in the dom0 kernel, or by using the +\emph{max\_loop=n} boot option if CONFIG\_BLK\_DEV\_LOOP is compiled +directly into the dom0 kernel. + + +\section{Using LVM-backed VBDs} +\label{s:using-lvm-backed-vbds} + +A particularly appealing solution is to use LVM volumes as backing for +domain file-systems since this allows dynamic growing/shrinking of +volumes as well as snapshot and other features. + +To initialize a partition to support LVM volumes: +\begin{quote} +\begin{verbatim} +# pvcreate /dev/sda10 +\end{verbatim} +\end{quote} + +Create a volume group named `vg' on the physical partition: +\begin{quote} +\begin{verbatim} +# vgcreate vg /dev/sda10 +\end{verbatim} +\end{quote} + +Create a logical volume of size 4GB named `myvmdisk1': +\begin{quote} +\begin{verbatim} +# lvcreate -L4096M -n myvmdisk1 vg +\end{verbatim} +\end{quote} + +You should now see that you have a \path{/dev/vg/myvmdisk1} device. Make a +filesystem, mount it and populate it, e.g.: +\begin{quote} +\begin{verbatim} +# mkfs -t ext3 /dev/vg/myvmdisk1 +# mount /dev/vg/myvmdisk1 /mnt +# cp -ax / /mnt +# umount /mnt +\end{verbatim} +\end{quote} + +Now configure your VM with the following disk configuration: +\begin{quote} +\begin{verbatim} + disk = [ 'phy:vg/myvmdisk1,sda1,w' ] +\end{verbatim} +\end{quote} + +LVM enables you to grow the size of logical volumes, but you'll need +to resize the corresponding file system to make use of the new space. +Some file systems (e.g.\ ext3) now support online resize. See the LVM +manuals for more details. + +You can also use LVM for creating copy-on-write (CoW) clones of LVM +volumes (known as writable persistent snapshots in LVM terminology). +This facility is new in Linux 2.6.8, so isn't as stable as one might +hope.
In particular, using lots of CoW LVM disks consumes a lot of +dom0 memory, and error conditions such as running out of disk space +are not handled well. Hopefully this will improve in future. + +To create two copy-on-write clones of the above file system you would +use the following commands: + +\begin{quote} +\begin{verbatim} +# lvcreate -s -L1024M -n myclonedisk1 /dev/vg/myvmdisk1 +# lvcreate -s -L1024M -n myclonedisk2 /dev/vg/myvmdisk1 +\end{verbatim} +\end{quote} + +Each of these can grow to have 1GB of differences from the master +volume. You can grow the amount of space for storing the differences +using the lvextend command, e.g.: +\begin{quote} +\begin{verbatim} +# lvextend -L+100M /dev/vg/myclonedisk1 +\end{verbatim} +\end{quote} + +Don't let the `differences volume' ever fill up, otherwise LVM gets +rather confused. It may be possible to automate the growing process by +using \path{dmsetup wait} to spot the volume getting full and then +issuing an \path{lvextend}. + +In principle, it is possible to continue writing to the volume that +has been cloned (the changes will not be visible to the clones), but +we wouldn't recommend this: have the cloned volume as a `pristine' +file system install that isn't mounted directly by any of the virtual +machines. + + +\section{Using NFS Root} + +First, populate a root filesystem in a directory on the server +machine. This can be on a distinct physical machine, or simply run +within a virtual machine on the same node. + +Now configure the NFS server to export this filesystem over the +network by adding a line to \path{/etc/exports}, for instance: + +\begin{quote} + \begin{small} +\begin{verbatim} +/export/vm1root 1.2.3.4/24(rw,sync,no_root_squash) +\end{verbatim} + \end{small} +\end{quote} + +Finally, configure the domain to use NFS root.
In addition to the +normal variables, you should make sure to set the following values in +the domain's configuration file: + +\begin{quote} + \begin{small} +\begin{verbatim} +root = '/dev/nfs' +nfs_server = '2.3.4.5' # substitute IP address of server +nfs_root = '/path/to/root' # path to root FS on the server +\end{verbatim} + \end{small} +\end{quote} + +The domain will need network access at boot time, so either statically +configure an IP address using the config variables \path{ip}, +\path{netmask}, \path{gateway}, \path{hostname}; or enable DHCP +(\path{dhcp='dhcp'}). + +Note that the Linux NFS root implementation is known to have stability +problems under high load (this is not a Xen-specific problem), so this +configuration may not be appropriate for critical servers. diff -r a6b72464a042 -r f8e7af29daa1 docs/src/user/domain_mgmt.tex --- /dev/null Tue Sep 20 09:43:29 2005 +++ b/docs/src/user/domain_mgmt.tex Tue Sep 20 09:43:46 2005 @@ -0,0 +1,203 @@ +\chapter{Domain Management Tools} + +The previous chapter described a simple example of how to configure +and start a domain. This chapter summarises the tools available to +manage running domains. + + +\section{Command-line Management} + +Command line management tasks are also performed using the \path{xm} +tool. For online help for the commands available, type: +\begin{quote} + \verb_# xm help_ +\end{quote} + +You can also type \path{xm help $<$command$>$} for more information on +a given command. 
+ +\subsection{Basic Management Commands} + +The most important \path{xm} commands are: +\begin{quote} + \verb_# xm list_: Lists all domains running.\\ + \verb_# xm consoles_: Gives information about the domain consoles.\\ + \verb_# xm console_: Opens a console to a domain (e.g.\ + \verb_# xm console myVM_) +\end{quote} + +\subsection{\tt xm list} + +The output of \path{xm list} is in rows of the following format: +\begin{center} {\tt name domid memory cpu state cputime console} +\end{center} + +\begin{quote} + \begin{description} + \item[name] The descriptive name of the virtual machine. + \item[domid] The number of the domain ID this virtual machine is + running in. + \item[memory] Memory size in megabytes. + \item[cpu] The CPU this domain is running on. + \item[state] Domain state consists of 5 fields: + \begin{description} + \item[r] running + \item[b] blocked + \item[p] paused + \item[s] shutdown + \item[c] crashed + \end{description} + \item[cputime] How much CPU time (in seconds) the domain has used so + far. + \item[console] TCP port accepting connections to the domain's + console. + \end{description} +\end{quote} + +The \path{xm list} command also supports a long output format when the +\path{-l} switch is used. This outputs the full details of the +running domains in \xend's SXP configuration format. + +For example, suppose the system is running the ttylinux domain as +described earlier. The list command should produce output somewhat +like the following: +\begin{verbatim} +# xm list +Name Id Mem(MB) CPU State Time(s) Console +Domain-0 0 251 0 r---- 172.2 +ttylinux 5 63 0 -b--- 3.0 9605 +\end{verbatim} + +Here we can see the details for the ttylinux domain, as well as for +domain~0 (which, of course, is always running). Note that the console +port for the ttylinux domain is 9605. This can be connected to by TCP +using a terminal program (e.g.\ \path{telnet} or, better, +\path{xencons}).
The simplest way to connect is to use the +\path{xm~console} command, specifying the domain name or ID. To +connect to the console of the ttylinux domain, we could use any of the +following: +\begin{verbatim} +# xm console ttylinux +# xm console 5 +# xencons localhost 9605 +\end{verbatim} + +\section{Domain Save and Restore} + +The administrator of a Xen system may suspend a virtual machine's +current state into a disk file in domain~0, allowing it to be resumed +at a later time. + +The ttylinux domain described earlier can be suspended to disk using +the command: +\begin{verbatim} +# xm save ttylinux ttylinux.xen +\end{verbatim} + +This will stop the domain named `ttylinux' and save its current state +into a file called \path{ttylinux.xen}. + +To resume execution of this domain, use the \path{xm restore} command: +\begin{verbatim} +# xm restore ttylinux.xen +\end{verbatim} + +This will restore the state of the domain and restart it. The domain +will carry on as before and the console may be reconnected using the +\path{xm console} command, as above. + +\section{Live Migration} + +Live migration is used to transfer a domain between physical hosts +whilst that domain continues to perform its usual activities --- from +the user's perspective, the migration should be imperceptible. + +To perform a live migration, both hosts must be running Xen / \xend\ +and the destination host must have sufficient resources (e.g.\ memory +capacity) to accommodate the domain after the move. Furthermore we +currently require both source and destination machines to be on the +same L2 subnet. + +Currently, there is no support for providing automatic remote access +to filesystems stored on local disk when a domain is migrated. +Administrators should choose an appropriate storage solution (i.e.\ +SAN, NAS, etc.) to ensure that domain filesystems are also available +on their destination node. GNBD is a good method for exporting a +volume from one machine to another. 
iSCSI can do a similar job, but is +more complex to set up. + +When a domain migrates, its MAC and IP addresses move with it, thus it +is only possible to migrate VMs within the same layer-2 network and IP +subnet. If the destination node is on a different subnet, the +administrator would need to manually configure a suitable etherip or +IP tunnel in the domain~0 of the remote node. + +A domain may be migrated using the \path{xm migrate} command. To live +migrate a domain to another machine, we would use the command: + +\begin{verbatim} +# xm migrate --live mydomain destination.ournetwork.com +\end{verbatim} + +Without the \path{--live} flag, \xend\ simply stops the domain and +copies the memory image over to the new node and restarts it. Since +domains can have large allocations this can be quite time consuming, +even on a Gigabit network. With the \path{--live} flag \xend\ attempts +to keep the domain running while the migration is in progress, +resulting in typical `downtimes' of just 60--300ms. + +For now it will be necessary to reconnect to the domain's console on +the new machine using the \path{xm console} command. If a migrated +domain has any open network connections then they will be preserved, +so SSH connections do not have this limitation. + + +\section{Managing Domain Memory} + +XenLinux domains have the ability to relinquish / reclaim machine +memory at the request of the administrator or the user of the domain. + +\subsection{Setting memory footprints from dom0} + +The machine administrator can request that a domain alter its memory +footprint using the \path{xm set-mem} command. For instance, we can +request that our example ttylinux domain reduce its memory footprint +to 32 megabytes.
+ +\begin{verbatim} +# xm set-mem ttylinux 32 +\end{verbatim} + +We can now see the result of this in the output of \path{xm list}: + +\begin{verbatim} +# xm list +Name Id Mem(MB) CPU State Time(s) Console +Domain-0 0 251 0 r---- 172.2 +ttylinux 5 31 0 -b--- 4.3 9605 +\end{verbatim} + +The domain has responded to the request by returning memory to Xen. We +can restore the domain to its original size using the command line: + +\begin{verbatim} +# xm set-mem ttylinux 64 +\end{verbatim} + +\subsection{Setting memory footprints from within a domain} + +The virtual file \path{/proc/xen/balloon} allows the owner of a domain +to adjust their own memory footprint. Reading the file (e.g.\ +\path{cat /proc/xen/balloon}) prints out the current memory footprint +of the domain. Writing the file (e.g.\ \path{echo new\_target > + /proc/xen/balloon}) requests that the kernel adjust the domain's +memory footprint to a new value. + +\subsection{Setting memory limits} + +Xen associates a memory size limit with each domain. By default, this +is the amount of memory the domain is originally started with, +preventing the domain from ever growing beyond this size. To permit a +domain to grow beyond its original allocation or to prevent a domain +you've shrunk from reclaiming the memory it relinquished, use the +\path{xm maxmem} command. diff -r a6b72464a042 -r f8e7af29daa1 docs/src/user/glossary.tex --- /dev/null Tue Sep 20 09:43:29 2005 +++ b/docs/src/user/glossary.tex Tue Sep 20 09:43:46 2005 @@ -0,0 +1,79 @@ +\chapter{Glossary of Terms} + +\begin{description} + +\item[Atropos] One of the CPU schedulers provided by Xen. Atropos + provides domains with absolute shares of the CPU, with timeliness + guarantees and a mechanism for sharing out `slack time'. + +\item[BVT] The BVT scheduler is used to give proportional fair shares + of the CPU to domains. 
+ +\item[Exokernel] A minimal piece of privileged code, similar to a {\bf + microkernel} but providing a more `hardware-like' interface to the + tasks it manages. This is similar to a paravirtualising VMM like + {\bf Xen} but was designed as a new operating system structure, + rather than specifically to run multiple conventional OSs. + +\item[Domain] A domain is the execution context that contains a + running {\bf virtual machine}. The relationship between virtual + machines and domains on Xen is similar to that between programs and + processes in an operating system: a virtual machine is a persistent + entity that resides on disk (somewhat like a program). When it is + loaded for execution, it runs in a domain. Each domain has a {\bf + domain ID}. + +\item[Domain 0] The first domain to be started on a Xen machine. + Domain 0 is responsible for managing the system. + +\item[Domain ID] A unique identifier for a {\bf domain}, analogous to + a process ID in an operating system. + +\item[Full virtualisation] An approach to virtualisation which + requires no modifications to the hosted operating system, providing + the illusion of a complete system of real hardware devices. + +\item[Hypervisor] An alternative term for {\bf VMM}, used because it + means `beyond supervisor', since it is responsible for managing + multiple `supervisor' kernels. + +\item[Live migration] A technique for moving a running virtual machine + to another physical host, without stopping it or the services + running on it. + +\item[Microkernel] A small base of code running at the highest + hardware privilege level. A microkernel is responsible for sharing + CPU and memory (and sometimes other devices) between less privileged + tasks running on the system. This is similar to a VMM, particularly + a {\bf paravirtualising} VMM, but typically addressing a different + problem space and providing a different kind of interface. + +\item[NetBSD/Xen] A port of NetBSD to the Xen architecture.
+ +\item[Paravirtualisation] An approach to virtualisation which requires + modifications to the operating system in order to run in a virtual + machine. Xen uses paravirtualisation but preserves binary + compatibility for user space applications. + +\item[Shadow pagetables] A technique for hiding the layout of machine + memory from a virtual machine's operating system. Used in some {\bf + VMMs} to provide the illusion of contiguous physical memory, in + Xen this is used during {\bf live migration}. + +\item[Virtual Machine] The environment in which a hosted operating + system runs, providing the abstraction of a dedicated machine. A + virtual machine may be identical to the underlying hardware (as in + {\bf full virtualisation}, or it may differ, as in {\bf + paravirtualisation}). + +\item[VMM] Virtual Machine Monitor - the software that allows multiple + virtual machines to be multiplexed on a single physical machine. + +\item[Xen] Xen is a paravirtualising virtual machine monitor, + developed primarily by the Systems Research Group at the University + of Cambridge Computer Laboratory. + +\item[XenLinux] Official name for the port of the Linux kernel that + runs on Xen. + +\end{description} diff -r a6b72464a042 -r f8e7af29daa1 docs/src/user/installation.tex --- /dev/null Tue Sep 20 09:43:29 2005 +++ b/docs/src/user/installation.tex Tue Sep 20 09:43:46 2005 @@ -0,0 +1,394 @@ +\chapter{Installation} + +The Xen distribution includes three main components: Xen itself, ports +of Linux 2.4 and 2.6 and NetBSD to run on Xen, and the userspace +tools required to manage a Xen-based system. This chapter describes +how to install the Xen~2.0 distribution from source. Alternatively, +there may be pre-built packages available as part of your operating +system distribution. + + +\section{Prerequisites} +\label{sec:prerequisites} + +The following is a full list of prerequisites. 
Items marked `$\dag$' +are required by the \xend\ control tools, and hence required if you +want to run more than one virtual machine; items marked `$*$' are only +required if you wish to build from source. +\begin{itemize} +\item A working Linux distribution using the GRUB bootloader and + running on a P6-class (or newer) CPU. +\item [$\dag$] The \path{iproute2} package. +\item [$\dag$] The Linux bridge-utils\footnote{Available from {\tt + http://bridge.sourceforge.net}} (e.g., \path{/sbin/brctl}) +\item [$\dag$] An installation of Twisted~v1.3 or + above\footnote{Available from {\tt http://www.twistedmatrix.com}}. + There may be a binary package available for your distribution; + alternatively it can be installed by running `{\sl make + install-twisted}' in the root of the Xen source tree. +\item [$*$] Build tools (gcc v3.2.x or v3.3.x, binutils, GNU make). +\item [$*$] Development installation of libcurl (e.g., libcurl-devel) +\item [$*$] Development installation of zlib (e.g., zlib-dev). +\item [$*$] Development installation of Python v2.2 or later (e.g., + python-dev). +\item [$*$] \LaTeX\ and transfig are required to build the + documentation. +\end{itemize} + +Once you have satisfied the relevant prerequisites, you can now +install either a binary or source distribution of Xen. + + +\section{Installing from Binary Tarball} + +Pre-built tarballs are available for download from the Xen download +page +\begin{quote} {\tt http://xen.sf.net} +\end{quote} + +Once you've downloaded the tarball, simply unpack and install: +\begin{verbatim} +# tar zxvf xen-2.0-install.tgz +# cd xen-2.0-install +# sh ./install.sh +\end{verbatim} + +Once you've installed the binaries you need to configure your system +as described in Section~\ref{s:configure}. + + +\section{Installing from Source} + +This section describes how to obtain, build, and install Xen from +source. 
+ +\subsection{Obtaining the Source} + +The Xen source tree is available as either a compressed source tar +ball or as a clone of our master BitKeeper repository. + +\begin{description} +\item[Obtaining the Source Tarball]\mbox{} \\ + Stable versions (and daily snapshots) of the Xen source tree are + available as compressed tarballs from the Xen download page + \begin{quote} {\tt http://xen.sf.net} + \end{quote} + +\item[Using BitKeeper]\mbox{} \\ + If you wish to install Xen from a clone of our latest BitKeeper + repository then you will need to install the BitKeeper tools. + Download instructions for BitKeeper can be obtained by filling out + the form at: + \begin{quote} {\tt http://www.bitmover.com/cgi-bin/download.cgi} +\end{quote} +The public master BK repository for the 2.0 release lives at: +\begin{quote} {\tt bk://xen.bkbits.net/xen-2.0.bk} +\end{quote} +You can use BitKeeper to download it and keep it updated with the +latest features and fixes. + +Change to the directory in which you want to put the source code, then +run: +\begin{verbatim} +# bk clone bk://xen.bkbits.net/xen-2.0.bk +\end{verbatim} + +Under your current directory, a new directory named \path{xen-2.0.bk} +has been created, which contains all the source code for Xen, the OS +ports, and the control tools. You can update your repository with the +latest changes at any time by running: +\begin{verbatim} +# cd xen-2.0.bk # to change into the local repository +# bk pull # to update the repository +\end{verbatim} +\end{description} + +% \section{The distribution} +% +% The Xen source code repository is structured as follows: +% +% \begin{description} +% \item[\path{tools/}] Xen node controller daemon (Xend), command line +% tools, control libraries +% \item[\path{xen/}] The Xen VMM. +% \item[\path{linux-*-xen-sparse/}] Xen support for Linux. +% \item[\path{linux-*-patches/}] Experimental patches for Linux. +% \item[\path{netbsd-*-xen-sparse/}] Xen support for NetBSD. 
+% \item[\path{docs/}] Various documentation files for users and +% developers. +% \item[\path{extras/}] Bonus extras. +% \end{description} + +\subsection{Building from Source} + +The top-level Xen Makefile includes a target `world' that will do the +following: + +\begin{itemize} +\item Build Xen. +\item Build the control tools, including \xend. +\item Download (if necessary) and unpack the Linux 2.6 source code, + and patch it for use with Xen. +\item Build a Linux kernel to use in domain 0 and a smaller + unprivileged kernel, which can optionally be used for unprivileged + virtual machines. +\end{itemize} + +After the build has completed you should have a top-level directory +called \path{dist/} in which all resulting targets will be placed; of +particular interest are the two XenLinux kernel images, one +with a `-xen0' extension which contains hardware device drivers and +drivers for Xen's virtual devices, and one with a `-xenU' extension +that just contains the virtual ones. These are found in +\path{dist/install/boot/} along with the image for Xen itself and the +configuration files used during the build. + +The NetBSD port can be built using: +\begin{quote} +\begin{verbatim} +# make netbsd20 +\end{verbatim} +\end{quote} +The NetBSD port is built using a snapshot of the netbsd-2-0 cvs branch. +The snapshot is downloaded as part of the build process, if it is not +yet present in the \path{NETBSD\_SRC\_PATH} search path. The build +process also downloads a toolchain which includes all the tools +necessary to build the NetBSD kernel under Linux. + +To customize further the set of kernels built you need to edit the +top-level Makefile.
Look for the line: + +\begin{quote} +\begin{verbatim} +KERNELS ?= mk.linux-2.6-xen0 mk.linux-2.6-xenU +\end{verbatim} +\end{quote} + +You can edit this line to include any set of operating system kernels +which have configurations in the top-level \path{buildconfigs/} +directory, for example \path{mk.linux-2.4-xenU} to build a Linux 2.4 +kernel containing only virtual device drivers. + +%% Inspect the Makefile if you want to see what goes on during a +%% build. Building Xen and the tools is straightforward, but XenLinux +%% is more complicated. The makefile needs a `pristine' Linux kernel +%% tree to which it will then add the Xen architecture files. You can +%% tell the makefile the location of the appropriate Linux compressed +%% tar file by +%% setting the LINUX\_SRC environment variable, e.g. \\ +%% \verb!# LINUX_SRC=/tmp/linux-2.6.11.tar.bz2 make world! \\ or by +%% placing the tar file somewhere in the search path of {\tt +%% LINUX\_SRC\_PATH} which defaults to `{\tt .:..}'. If the +%% makefile can't find a suitable kernel tar file it attempts to +%% download it from kernel.org (this won't work if you're behind a +%% firewall). + +%% After untaring the pristine kernel tree, the makefile uses the {\tt +%% mkbuildtree} script to add the Xen patches to the kernel. + + +%% The procedure is similar to build the Linux 2.4 port: \\ +%% \verb!# LINUX_SRC=/path/to/linux2.4/source make linux24! + + +%% \framebox{\parbox{5in}{ +%% {\bf Distro specific:} \\ +%% {\it Gentoo} --- if not using udev (most installations, +%% currently), you'll need to enable devfs and devfs mount at boot +%% time in the xen0 config. }} + +\subsection{Custom XenLinux Builds} + +% If you have an SMP machine you may wish to give the {\tt '-j4'} +% argument to make to get a parallel build. + +If you wish to build a customized XenLinux kernel (e.g. 
to support +additional devices or enable distribution-required features), you can +use the standard Linux configuration mechanisms, specifying that the +architecture being built for is \path{xen}, e.g: +\begin{quote} +\begin{verbatim} +# cd linux-2.6.11-xen0 +# make ARCH=xen xconfig +# cd .. +# make +\end{verbatim} +\end{quote} + +You can also copy an existing Linux configuration (\path{.config}) +into \path{linux-2.6.11-xen0} and execute: +\begin{quote} +\begin{verbatim} +# make ARCH=xen oldconfig +\end{verbatim} +\end{quote} + +You may be prompted with some Xen-specific options; we advise +accepting the defaults for these options. + +Note that the only difference between the two types of Linux kernel +that are built is the configuration file used for each. The `U' +suffixed (unprivileged) versions don't contain any of the physical +hardware device drivers, leading to a 30\% reduction in size; hence +you may prefer these for your non-privileged domains. The `0' +suffixed privileged versions can be used to boot the system, as well +as in driver domains and unprivileged domains. + +\subsection{Installing the Binaries} + +The files produced by the build process are stored under the +\path{dist/install/} directory. To install them in their default +locations, do: +\begin{quote} +\begin{verbatim} +# make install +\end{verbatim} +\end{quote} + +Alternatively, users with special installation requirements may wish +to install them manually by copying the files to their appropriate +destinations. 
+ +%% Files in \path{install/boot/} include: +%% \begin{itemize} +%% \item \path{install/boot/xen-2.0.gz} Link to the Xen 'kernel' +%% \item \path{install/boot/vmlinuz-2.6-xen0} Link to domain 0 +%% XenLinux kernel +%% \item \path{install/boot/vmlinuz-2.6-xenU} Link to unprivileged +%% XenLinux kernel +%% \end{itemize} + +The \path{dist/install/boot} directory will also contain the config +files used for building the XenLinux kernels, and also versions of Xen +and XenLinux kernels that contain debug symbols (\path{xen-syms-2.0.6} +and \path{vmlinux-syms-2.6.11.11-xen0}) which are essential for +interpreting crash dumps. Retain these files as the developers may +wish to see them if you post on the mailing list. + + +\section{Configuration} +\label{s:configure} + +Once you have built and installed the Xen distribution, it is simple +to prepare the machine for booting and running Xen. + +\subsection{GRUB Configuration} + +An entry should be added to \path{grub.conf} (often found under +\path{/boot/} or \path{/boot/grub/}) to allow Xen / XenLinux to boot. +This file is sometimes called \path{menu.lst}, depending on your +distribution. The entry should look something like the following: + +{\small +\begin{verbatim} +title Xen 2.0 / XenLinux 2.6 + kernel /boot/xen-2.0.gz dom0_mem=131072 + module /boot/vmlinuz-2.6-xen0 root=/dev/sda4 ro console=tty0 +\end{verbatim} +} + +The kernel line tells GRUB where to find Xen itself and what boot +parameters should be passed to it (in this case, setting domain 0's +memory allocation in kilobytes). +For more details on the various Xen boot parameters see +Section~\ref{s:xboot}.
+ +The module line of the configuration describes the location of the +XenLinux kernel that Xen should start and the parameters that should +be passed to it (these are standard Linux parameters, identifying the +root device and specifying it be initially mounted read only and +instructing that console output be sent to the screen). Some +distributions such as SuSE do not require the \path{ro} parameter. + +%% \framebox{\parbox{5in}{ +%% {\bf Distro specific:} \\ +%% {\it SuSE} --- Omit the {\tt ro} option from the XenLinux +%% kernel command line, since the partition won't be remounted rw +%% during boot. }} + + +If you want to use an initrd, just add another \path{module} line to +the configuration, as usual: + +{\small +\begin{verbatim} + module /boot/my_initrd.gz +\end{verbatim} +} + +As always when installing a new kernel, it is recommended that you do +not delete existing menu options from \path{menu.lst} --- you may want +to boot your old Linux kernel in future, particularly if you have +problems. + +\subsection{Serial Console (optional)} + +%% kernel /boot/xen-2.0.gz dom0_mem=131072 com1=115200,8n1 +%% module /boot/vmlinuz-2.6-xen0 root=/dev/sda4 ro + + +In order to configure Xen serial console output, it is necessary to +add a boot option to your GRUB config; e.g.\ replace the above kernel +line with: +\begin{quote} +{\small +\begin{verbatim} + kernel /boot/xen.gz dom0_mem=131072 com1=115200,8n1 +\end{verbatim}} +\end{quote} + +This configures Xen to output on COM1 at 115,200 baud, 8 data bits, 1 +stop bit and no parity. Modify these parameters for your setup. + +One can also configure XenLinux to share the serial console; to +achieve this append ``\path{console=ttyS0}'' to your module line. + +If you wish to be able to log in over the XenLinux serial console it +is necessary to add a line into \path{/etc/inittab}, just as per +regular Linux.
Simply add the line: +\begin{quote} {\small {\tt c:2345:respawn:/sbin/mingetty ttyS0}} +\end{quote} + +and you should be able to log in. Note that logging in as +root over the serial line requires adding \path{ttyS0} to +\path{/etc/securetty} in most modern distributions. + +\subsection{TLS Libraries} + +Users of the XenLinux 2.6 kernel should disable Thread Local Storage +(e.g.\ by doing a \path{mv /lib/tls /lib/tls.disabled}) before +attempting to run with a XenLinux kernel\footnote{If you boot without + first disabling TLS, you will get a warning message during the boot + process. In this case, simply perform the rename after the machine + is up and then run \texttt{/sbin/ldconfig} to make it take effect.}. +You can always reenable it by restoring the directory to its original +location (i.e.\ \path{mv /lib/tls.disabled /lib/tls}). + +The reason for this is that the current TLS implementation uses +segmentation in a way that is not permissible under Xen. If TLS is +not disabled, an emulation mode is used within Xen which reduces +performance substantially. + +We hope that this issue can be resolved by working with Linux +distribution vendors to implement a minor backward-compatible change +to the TLS library. + + +\section{Booting Xen} + +It should now be possible to restart the system and use Xen. Reboot +as usual but choose the new Xen option when the GRUB screen appears. + +What follows should look much like a conventional Linux boot. The +first portion of the output comes from Xen itself, supplying low level +information about itself and the machine it is running on. The +following portion of the output comes from XenLinux. + +You may see some errors during the XenLinux boot. These are not +necessarily anything to worry about --- they may result from kernel +configuration differences between your XenLinux kernel and the one you +usually use. + +When the boot completes, you should be able to log into your system as +usual.
If you are unable to log in to your system running Xen, you +should still be able to reboot with your normal Linux kernel. diff -r a6b72464a042 -r f8e7af29daa1 docs/src/user/introduction.tex --- /dev/null Tue Sep 20 09:43:29 2005 +++ b/docs/src/user/introduction.tex Tue Sep 20 09:43:46 2005 @@ -0,0 +1,143 @@ +\chapter{Introduction} + + +Xen is a \emph{paravirtualising} virtual machine monitor (VMM), or +`hypervisor', for the x86 processor architecture. Xen can securely +execute multiple virtual machines on a single physical system with +close-to-native performance. The virtual machine technology +facilitates enterprise-grade functionality, including: + +\begin{itemize} +\item Virtual machines with performance close to native hardware. +\item Live migration of running virtual machines between physical + hosts. +\item Excellent hardware support (supports most Linux device drivers). +\item Sandboxed, re-startable device drivers. +\end{itemize} + +Paravirtualisation permits very high performance virtualisation, even +on architectures like x86 that are traditionally very hard to +virtualise. + +The drawback of this approach is that it requires operating systems to +be \emph{ported} to run on Xen. Porting an OS to run on Xen is +similar to supporting a new hardware platform, however the process is +simplified because the paravirtual machine architecture is very +similar to the underlying native hardware. Even though operating +system kernels must explicitly support Xen, a key feature is that user +space applications and libraries \emph{do not} require modification. + +Xen support is available for increasingly many operating systems: +right now, Linux 2.4, Linux 2.6 and NetBSD are available for Xen 2.0. +A FreeBSD port is undergoing testing and will be incorporated into the +release soon. Other OS ports, including Plan 9, are in progress. 
We +hope that the arch-xen patches will be incorporated into the +mainstream releases of these operating systems in due course (as has +already happened for NetBSD). + +Possible usage scenarios for Xen include: + +\begin{description} +\item [Kernel development.] Test and debug kernel modifications in a + sandboxed virtual machine --- no need for a separate test machine. +\item [Multiple OS configurations.] Run multiple operating systems + simultaneously, for instance for compatibility or QA purposes. +\item [Server consolidation.] Move multiple servers onto a single + physical host with performance and fault isolation provided at + virtual machine boundaries. +\item [Cluster computing.] Management at VM granularity provides more + flexibility than separately managing each physical host, but better + control and isolation than single-system image solutions, + particularly by using live migration for load balancing. +\item [Hardware support for custom OSes.] Allow development of new + OSes while benefiting from the wide-ranging hardware support of + existing OSes such as Linux. +\end{description} + + +\section{Structure of a Xen-Based System} + +A Xen system has multiple layers, the lowest and most privileged of +which is Xen itself. + +Xen in turn may host multiple \emph{guest} operating systems, each of +which is executed within a secure virtual machine (in Xen terminology, +a \emph{domain}). Domains are scheduled by Xen to make effective use +of the available physical CPUs. Each guest OS manages its own +applications, which includes responsibility for scheduling each +application within the time allotted to the VM by Xen. + +The first domain, \emph{domain 0}, is created automatically when the +system boots and has special management privileges. Domain 0 builds +other domains and manages their virtual devices. It also performs +administrative tasks such as suspending, resuming and migrating other +virtual machines.
+ +Within domain 0, a process called \emph{xend} runs to manage the +system. \Xend is responsible for managing virtual machines and +providing access to their consoles. Commands are issued to \xend over +an HTTP interface, either from a command-line tool or from a web +browser. + + +\section{Hardware Support} + +Xen currently runs only on the x86 architecture, requiring a `P6' or +newer processor (e.g. Pentium Pro, Celeron, Pentium II, Pentium III, +Pentium IV, Xeon, AMD Athlon, AMD Duron). Multiprocessor machines are +supported, and we also have basic support for HyperThreading (SMT), +although this remains a topic for ongoing research. A port +specifically for x86/64 is in progress, although Xen already runs on +such systems in 32-bit legacy mode. In addition a port to the IA64 +architecture is approaching completion. We hope to add other +architectures such as PPC and ARM in due course. + +Xen can currently use up to 4GB of memory. It is possible for x86 +machines to address up to 64GB of physical memory but there are no +current plans to support these systems: The x86/64 port is the planned +route to supporting larger memory sizes. + +Xen offloads most of the hardware support issues to the guest OS +running in Domain~0. Xen itself contains only the code required to +detect and start secondary processors, set up interrupt routing, and +perform PCI bus enumeration. Device drivers run within a privileged +guest OS rather than within Xen itself. This approach provides +compatibility with the majority of device hardware supported by Linux. +The default XenLinux build contains support for relatively modern +server-class network and disk hardware, but you can add support for +other hardware by configuring your XenLinux kernel in the normal way. + + +\section{History} + +Xen was originally developed by the Systems Research Group at the +University of Cambridge Computer Laboratory as part of the XenoServers +project, funded by the UK-EPSRC. 
+ +XenoServers aim to provide a `public infrastructure for global +distributed computing', and Xen plays a key part in that, allowing us +to efficiently partition a single machine to enable multiple +independent clients to run their operating systems and applications in +an environment providing protection, resource isolation and +accounting. The project web page contains further information along +with pointers to papers and technical reports: +\path{http://www.cl.cam.ac.uk/xeno} + +Xen has since grown into a fully-fledged project in its own right, +enabling us to investigate interesting research issues regarding the +best techniques for virtualising resources such as the CPU, memory, +disk and network. The project has been bolstered by support from +Intel Research Cambridge, and HP Labs, who are now working closely +with us. + +Xen was first described in a paper presented at SOSP in +2003\footnote{\tt + http://www.cl.cam.ac.uk/netos/papers/2003-xensosp.pdf}, and the +first public release (1.0) was made that October. Since then, Xen has +significantly matured and is now used in production scenarios on many +sites. + +Xen 2.0 features greatly enhanced hardware support, configuration +flexibility, usability and a larger complement of supported operating +systems. This latest release takes Xen a step closer to becoming the +definitive open source solution for virtualisation. 
diff -r a6b72464a042 -r f8e7af29daa1 docs/src/user/redhat.tex --- /dev/null Tue Sep 20 09:43:29 2005 +++ b/docs/src/user/redhat.tex Tue Sep 20 09:43:46 2005 @@ -0,0 +1,61 @@ +\chapter{Installing Xen / XenLinux on Red~Hat or Fedora Core} + +When using Xen / XenLinux on a standard Linux distribution there are a +couple of things to watch out for: + +Note that, because domains greater than 0 don't have any privileged +access at all, certain commands in the default boot sequence will fail +e.g.\ attempts to update the hwclock, change the console font, update +the keytable map, start apmd (power management), or gpm (mouse +cursor). Either ignore the errors (they should be harmless), or +remove them from the startup scripts. Deleting the following links +is a good start: {\path{S24pcmcia}}, {\path{S09isdn}}, +{\path{S17keytable}}, {\path{S26apmd}}, {\path{S85gpm}}. + +If you want to use a single root file system that works cleanly for +both domain~0 and unprivileged domains, a useful trick is to use +different `init' run levels. For example, use run level 3 for +domain~0, and run level 4 for other domains. This enables different +startup scripts to be run depending on the run level number passed +on the kernel command line. + +If using NFS root file systems mounted either from an external server +or from domain~0 there are a couple of other gotchas. The default +{\path{/etc/sysconfig/iptables}} rules block NFS, so part way through +the boot sequence things will suddenly go dead. + +If you're planning on having a separate NFS {\path{/usr}} partition, +the RH9 boot scripts don't make life easy - they attempt to mount NFS +file systems way too late in the boot process.
The easiest way I found +to do this was to have a {\path{/linuxrc}} script run ahead of +{\path{/sbin/init}} that mounts {\path{/usr}}: + +\begin{quote} + \begin{small}\begin{verbatim} + #!/bin/bash + /sbin/ifconfig lo 127.0.0.1 + /sbin/portmap + /bin/mount /usr + exec /sbin/init "$@" <>/dev/console 2>&1 +\end{verbatim}\end{small} +\end{quote} + +%% $ XXX SMH: font lock fix :-) + +The one slight complication with the above is that +{\path{/sbin/portmap}} is dynamically linked against +{\path{/usr/lib/libwrap.so.0}}. Since this is in {\path{/usr}}, it +won't work. This can be solved by copying the file (and link) below +the {\path{/usr}} mount point, and just let the file be `covered' when +the mount happens. + +In some installations, where a shared read-only {\path{/usr}} is being +used, it may be desirable to move other large directories over into +the read-only {\path{/usr}}. For example, you might replace +{\path{/bin}}, {\path{/lib}} and {\path{/sbin}} with links into +{\path{/usr/root/bin}}, {\path{/usr/root/lib}} and +{\path{/usr/root/sbin}} respectively. This creates other problems for +running the {\path{/linuxrc}} script, requiring bash, portmap, mount, +ifconfig, and a handful of other shared libraries to be copied below +the mount point --- a simple statically-linked C program would solve +this problem. diff -r a6b72464a042 -r f8e7af29daa1 docs/src/user/start_addl_dom.tex --- /dev/null Tue Sep 20 09:43:29 2005 +++ b/docs/src/user/start_addl_dom.tex Tue Sep 20 09:43:46 2005 @@ -0,0 +1,172 @@ +\chapter{Starting Additional Domains} + +The first step in creating a new domain is to prepare a root +filesystem for it to boot from. Typically, this might be stored in a +normal partition, an LVM or other volume manager partition, a disk +file or on an NFS server. A simple way to do this is simply to boot +from your standard OS install CD and install the distribution into +another partition on your hard drive.
+ +To start the \xend\ control daemon, type +\begin{quote} + \verb!# xend start! +\end{quote} + +If you wish the daemon to start automatically, see the instructions in +Section~\ref{s:xend}. Once the daemon is running, you can use the +\path{xm} tool to monitor and maintain the domains running on your +system. This chapter provides only a brief tutorial. We provide full +details of the \path{xm} tool in the next chapter. + +% \section{From the web interface} +% +% Boot the Xen machine and start Xensv (see Chapter~\ref{cha:xensv} +% for more details) using the command: \\ +% \verb_# xensv start_ \\ +% This will also start Xend (see Chapter~\ref{cha:xend} for more +% information). +% +% The domain management interface will then be available at {\tt +% http://your\_machine:8080/}. This provides a user friendly wizard +% for starting domains and functions for managing running domains. +% +% \section{From the command line} + + +\section{Creating a Domain Configuration File} + +Before you can start an additional domain, you must create a +configuration file. We provide two example files which you can use as +a starting point: +\begin{itemize} +\item \path{/etc/xen/xmexample1} is a simple template configuration + file for describing a single VM. + +\item \path{/etc/xen/xmexample2} file is a template description that + is intended to be reused for multiple virtual machines. Setting the + value of the \path{vmid} variable on the \path{xm} command line + fills in parts of this template. +\end{itemize} + +Copy one of these files and edit it as appropriate. 
Typical values +you may wish to edit include: + +\begin{quote} +\begin{description} +\item[kernel] Set this to the path of the kernel you compiled for use + with Xen (e.g.\ \path{kernel = `/boot/vmlinuz-2.6-xenU'}) +\item[memory] Set this to the size of the domain's memory in megabytes + (e.g.\ \path{memory = 64}) +\item[disk] Set the first entry in this list to calculate the offset + of the domain's root partition, based on the domain ID. Set the + second to the location of \path{/usr} if you are sharing it between + domains (e.g.\ \path{disk = [`phy:your\_hard\_drive\%d,sda1,w' \% + (base\_partition\_number + vmid), + `phy:your\_usr\_partition,sda6,r' ]}) +\item[dhcp] Uncomment the dhcp variable, so that the domain will + receive its IP address from a DHCP server (e.g.\ \path{dhcp=`dhcp'}) +\end{description} +\end{quote} + +You may also want to edit the {\bf vif} variable in order to choose +the MAC address of the virtual ethernet interface yourself. For +example: +\begin{quote} +\verb_vif = [`mac=00:06:AA:F6:BB:B3']_ +\end{quote} +If you do not set this variable, \xend\ will automatically generate a +random MAC address from an unused range. + + +\section{Booting the Domain} + +The \path{xm} tool provides a variety of commands for managing +domains. Use the \path{create} command to start new domains. Assuming +you've created a configuration file \path{myvmconf} based around +\path{/etc/xen/xmexample2}, to start a domain with virtual machine +ID~1 you should type: + +\begin{quote} +\begin{verbatim} +# xm create -c myvmconf vmid=1 +\end{verbatim} +\end{quote} + +The \path{-c} switch causes \path{xm} to turn into the domain's +console after creation. The \path{vmid=1} sets the \path{vmid} +variable used in the \path{myvmconf} file. + +You should see the console boot messages from the new domain appearing +in the terminal in which you typed the command, culminating in a login +prompt.
+ + +\section{Example: ttylinux} + +Ttylinux is a very small Linux distribution, designed to require very +few resources. We will use it as a concrete example of how to start a +Xen domain. Most users will probably want to install a full-featured +distribution once they have mastered the basics\footnote{ttylinux is + maintained by Pascal Schmidt. You can download source packages from + the distribution's home page: {\tt + http://www.minimalinux.org/ttylinux/}}. + +\begin{enumerate} +\item Download and extract the ttylinux disk image from the Files + section of the project's SourceForge site (see + \path{http://sf.net/projects/xen/}). +\item Create a configuration file like the following: +\begin{verbatim} +kernel = "/boot/vmlinuz-2.6-xenU" +memory = 64 +name = "ttylinux" +nics = 1 +ip = "1.2.3.4" +disk = ['file:/path/to/ttylinux/rootfs,sda1,w'] +root = "/dev/sda1 ro" +\end{verbatim} +\item Now start the domain and connect to its console: +\begin{verbatim} +xm create configfile -c +\end{verbatim} +\item Login as root, password root. +\end{enumerate} + + +\section{Starting / Stopping Domains Automatically} + +It is possible to have certain domains start automatically at boot +time and to have dom0 wait for all running domains to shut down before +it shuts down the system. + +To specify that a domain is to start at boot-time, place its configuration +file (or a link to it) under \path{/etc/xen/auto/}. + +A Sys-V style init script for Red Hat and LSB-compliant systems is +provided and will be automatically copied to \path{/etc/init.d/} +during install. You can then enable it in the appropriate way for +your distribution. + +For instance, on Red Hat: + +\begin{quote} + \verb_# chkconfig --add xendomains_ +\end{quote} + +By default, this will start the boot-time domains in runlevels 3, 4 +and 5.
+ +You can also use the \path{service} command to run this script +manually, e.g: + +\begin{quote} + \verb_# service xendomains start_ + + Starts all the domains with config files under /etc/xen/auto/. +\end{quote} + +\begin{quote} + \verb_# service xendomains stop_ + + Shuts down ALL running Xen domains. +\end{quote} diff -r a6b72464a042 -r f8e7af29daa1 tools/firmware/vmxassist/acpi_madt.c --- /dev/null Tue Sep 20 09:43:29 2005 +++ b/tools/firmware/vmxassist/acpi_madt.c Tue Sep 20 09:43:46 2005 @@ -0,0 +1,145 @@ +/* + * acpi_madt.c: Update ACPI MADT table for multiple processor guest. + * + * Yu Ke, ke.yu@xxxxxxxxx + * Copyright (c) 2005, Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. + */ +#include "../acpi/acpi2_0.h" +#include "../acpi/acpi_madt.h" + +#define NULL ((void*)0) + +extern int puts(const char *s); + +#define VCPU_MAGIC 0x76637075 /* "vcpu" */ + +/* xc_vmx_builder wrote vcpu block at 0x9F800. Return it. 
*/ +static int +get_vcpus(void) +{ + unsigned long *vcpus; + + vcpus = (unsigned long *)0x9F800; + if (vcpus[0] != VCPU_MAGIC) { + puts("Bad vcpus magic, set vcpu number=1\n"); + return 1; + } + + return vcpus[1]; +} + +static void * +acpi_madt_get_madt(unsigned char *acpi_start) +{ + ACPI_2_0_RSDP *rsdp=NULL; + ACPI_2_0_RSDT *rsdt=NULL; + ACPI_MULTIPLE_APIC_DESCRIPTION_TABLE *madt; + + rsdp = (ACPI_2_0_RSDP *)(acpi_start + sizeof(ACPI_2_0_FACS)); + if (rsdp->Signature != ACPI_2_0_RSDP_SIGNATURE) { + puts("Bad RSDP signature\n"); + return NULL; + } + + rsdt= (ACPI_2_0_RSDT *) + (acpi_start + rsdp->RsdtAddress - ACPI_PHYSICAL_ADDRESS); + if (rsdt->Header.Signature != ACPI_2_0_RSDT_SIGNATURE) { + puts("Bad RSDT signature\n"); + return NULL; + } + + madt = (ACPI_MULTIPLE_APIC_DESCRIPTION_TABLE *) + ( acpi_start+ rsdt->Entry[1] - ACPI_PHYSICAL_ADDRESS); + if (madt->Header.Header.Signature != + ACPI_MULTIPLE_APIC_DESCRIPTION_TABLE_SIGNATURE) { + puts("Bad MADT signature \n"); + return NULL; + } + + return madt; +} + +static void +set_checksum(void *start, int checksum_offset, int len) +{ + unsigned char sum = 0; + unsigned char *ptr; + + ptr = start; + ptr[checksum_offset] = 0; + while (len--) + sum += *ptr++; + + ptr = start; + ptr[checksum_offset] = -sum; +} + +static int +acpi_madt_set_local_apics( + int nr_vcpu, + ACPI_MULTIPLE_APIC_DESCRIPTION_TABLE *madt) +{ + int i; + + if ((nr_vcpu > MAX_VIRT_CPUS) || (nr_vcpu < 0) || !madt) + return -1; + + for (i = 0; i < nr_vcpu; i++) { + madt->LocalApic[i].Type = ACPI_PROCESSOR_LOCAL_APIC; + madt->LocalApic[i].Length = sizeof (ACPI_LOCAL_APIC_STRUCTURE); + madt->LocalApic[i].AcpiProcessorId = i; + madt->LocalApic[i].ApicId = i; + madt->LocalApic[i].Flags = 1; + } + + madt->Header.Header.Length = + sizeof(ACPI_MULTIPLE_APIC_DESCRIPTION_TABLE) - + (MAX_VIRT_CPUS - nr_vcpu)* sizeof(ACPI_LOCAL_APIC_STRUCTURE); + + return 0; +} + +#define FIELD_OFFSET(TYPE,Field) ((unsigned int)(&(((TYPE *) 0)->Field))) + +int 
acpi_madt_update(unsigned char *acpi_start) +{ + int rc; + ACPI_MULTIPLE_APIC_DESCRIPTION_TABLE *madt; + + madt = acpi_madt_get_madt(acpi_start); + if (!madt) + return -1; + + rc = acpi_madt_set_local_apics(get_vcpus(), madt); + if (rc != 0) + return rc; + + set_checksum( + madt, FIELD_OFFSET(ACPI_TABLE_HEADER, Checksum), + madt->Header.Header.Length); + + return 0; +} + +/* + * Local variables: + * c-file-style: "linux" + * indent-tabs-mode: t + * c-indent-level: 8 + * c-basic-offset: 8 + * tab-width: 8 + * End: + */ _______________________________________________ Xen-changelog mailing list Xen-changelog@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-changelog