Re: [Xen-devel] [PATCH v3] docs: add PVH specification
On 20/09/14 at 21:15, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 18, 2014 at 07:19:24PM +0200, Roger Pau Monne wrote:
>> Introduce a document that describes the interfaces used on PVH. This
>> document has been designed from a guest OS point of view (i.e.: what a guest
>> needs to do in order to support PVH).
>>
>> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>> Acked-by: David Vrabel <david.vrabel@xxxxxxxxxx>
>> Cc: Jan Beulich <JBeulich@xxxxxxxx>
>> Cc: Mukesh Rathor <mukesh.rathor@xxxxxxxxxx>
>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
>> Cc: David Vrabel <david.vrabel@xxxxxxxxxx>
>> ---
>> The document is still far from complete IMHO, but it might be best to just
>> commit what we currently have rather than wait for a full document.
>>
>> I will try to fill the gaps as I go implementing new features on FreeBSD.
>>
>> I've retained David's Ack from v2 in this version.
>> ---
>>  docs/misc/pvh.markdown | 367 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 367 insertions(+)
>>  create mode 100644 docs/misc/pvh.markdown
>>
>> diff --git a/docs/misc/pvh.markdown b/docs/misc/pvh.markdown
>> new file mode 100644
>> index 0000000..120ede7
>> --- /dev/null
>> +++ b/docs/misc/pvh.markdown
>> @@ -0,0 +1,367 @@
>> +# PVH Specification #
>> +
>> +## Rationale ##
>> +
>> +PVH is a new kind of guest that has been introduced on Xen 4.4 as a DomU, and
>> +on Xen 4.5 as a Dom0. The aim of PVH is to make use of the hardware
>> +virtualization extensions present in modern x86 CPUs in order to
>> +improve performance.
>> +
>> +PVH is considered a mix between PV and HVM, and can be seen as a PV guest
>> +that runs inside of an HVM container, or as a PVHVM guest without any emulated
>> +devices. The design goal of PVH is to provide the best performance possible and
>> +to reduce the amount of modifications needed for a guest OS to run in this mode
>> +(compared to pure PV).
>> +
>> +This document tries to describe the interfaces used by PVH guests, focusing
>> +on how an OS should make use of them in order to support PVH.
>> +
>> +## Early boot ##
>> +
>> +PVH guests use the PV boot mechanism, that means that the kernel is loaded and
>> +directly launched by Xen (by jumping into the entry point). In order to do this
>> +Xen ELF Notes need to be added to the guest kernel, so that they contain the
>> +information needed by Xen. Here is an example of the ELF Notes added to the
>> +FreeBSD amd64 kernel in order to boot as PVH:
>> +
>> +    ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS, .asciz, "FreeBSD")
>> +    ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION, .asciz, __XSTRING(__FreeBSD_version))
>> +    ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION, .asciz, "xen-3.0")
>> +    ELFNOTE(Xen, XEN_ELFNOTE_VIRT_BASE, .quad, KERNBASE)
>> +    ELFNOTE(Xen, XEN_ELFNOTE_PADDR_OFFSET, .quad, KERNBASE)
>> +    ELFNOTE(Xen, XEN_ELFNOTE_ENTRY, .quad, xen_start)
>> +    ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, .quad, hypercall_page)
>> +    ELFNOTE(Xen, XEN_ELFNOTE_HV_START_LOW, .quad, HYPERVISOR_VIRT_START)
>> +    ELFNOTE(Xen, XEN_ELFNOTE_FEATURES, .asciz, "writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector")
>> +    ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE, .asciz, "yes")
>> +    ELFNOTE(Xen, XEN_ELFNOTE_L1_MFN_VALID, .long, PG_V, PG_V)
>> +    ELFNOTE(Xen, XEN_ELFNOTE_LOADER, .asciz, "generic")
>> +    ELFNOTE(Xen, XEN_ELFNOTE_SUSPEND_CANCEL, .long, 0)
>> +    ELFNOTE(Xen, XEN_ELFNOTE_BSD_SYMTAB, .asciz, "yes")
>> +
>> +On the linux side, the above can be found in `arch/x86/xen/xen-head.S`.
>
> s/linux/Linux/

Done.

>
>> +
>> +It is important to highlight the following notes:
>> +
>> + * `XEN_ELFNOTE_ENTRY`: contains the virtual memory address of the kernel entry
>> +   point.
>> + * `XEN_ELFNOTE_HYPERCALL_PAGE`: contains the virtual memory address of the
>> +   hypercall page inside of the guest kernel (this memory region will be filled
>> +   by Xen prior to booting).
>> + * `XEN_ELFNOTE_FEATURES`: contains the list of features supported by the kernel.
>> +   In the example above the kernel is only able to boot as a PVH guest, but
>> +   those options can be mixed with the ones used by pure PV guests in order to
>> +   have a kernel that supports both PV and PVH (like Linux). The list of
>> +   options available can be found in the `features.h` public header.
>> +
>
>
> Note that 'hvm_callback_vector' is in XEN_ELFNOTE_FEATURES. Older hypervisor will
> balk at this being part of it, so it can also be put in
> XEN_ELFNOTE_SUPPORTED_FEATURES which older hypervisors will ignore.

Added to the XEN_ELFNOTE_FEATURES comment, thanks for the info.

>> +Xen will jump into the kernel entry point defined in `XEN_ELFNOTE_ENTRY` with
>> +paging enabled (either long mode or protected mode with paging turned on
>> +depending on the kernel bitness) and some basic page tables setup. An important
>> +distinction for a 64bit PVH is that it is launched at privilege level 0 as
>> +opposed to a 64bit PV guest which is launched at privilege level 3.
>> +
>> +Also, the `rsi` (`esi` on 32bits) register is going to contain the virtual
>> +memory address where Xen has placed the `start_info` structure. The `rsp` (`esp`
>> +on 32bits) will point to the top of an initial single page stack, that can be
>> +used by the guest kernel. The `start_info` structure contains all the info the
>> +guest needs in order to initialize. More information about the contents can be
>> +found on the `xen.h` public header.
>
> s/on/in/
>> +
>> +### Initial amd64 control registers values ###
>> +
>> +Initial values for the control registers are set up by Xen before booting the
>> +guest kernel. The guest kernel can expect to find the following features
>> +enabled by Xen.
>> +
>> +`CR0` has the following bits set by Xen:
>> +
>> + * PE (bit 0): protected mode enable.
>> + * ET (bit 4): 387 or newer processor.
>> + * PG (bit 31): paging enabled.
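As a quick sanity check on the CR0 list above (a sketch, not part of the patch): the three listed bits combine to 0x80000011, which is the start-of-day value reported later in this thread, and CR0.TS (bit 3) is not among them. The bit positions are architectural; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR0 bit positions (names follow the Intel SDM). */
#define X86_CR0_PE (UINT32_C(1) << 0)  /* protected mode enable */
#define X86_CR0_TS (UINT32_C(1) << 3)  /* task switched: the bit questioned below */
#define X86_CR0_ET (UINT32_C(1) << 4)  /* extension type: 387 or newer */
#define X86_CR0_PG (UINT32_C(1) << 31) /* paging enabled */

/* Hypothetical name: the CR0 value implied by the PE/ET/PG list. */
#define PVH_CR0_ENTRY_BITS (X86_CR0_PE | X86_CR0_ET | X86_CR0_PG)

/* Compile-time check against the start-of-day value reported in this thread. */
static_assert(PVH_CR0_ENTRY_BITS == UINT32_C(0x80000011),
              "PE | ET | PG should equal 0x80000011");

/* True if a CR0 value read at the entry point has every bit the spec lists. */
static bool cr0_has_pvh_entry_bits(uint32_t cr0)
{
    return (cr0 & PVH_CR0_ENTRY_BITS) == PVH_CR0_ENTRY_BITS;
}

/* True if CR0.TS is set; it is not part of the list above. */
static bool cr0_ts_set(uint32_t cr0)
{
    return (cr0 & X86_CR0_TS) != 0;
}
```

Checking for the listed bits with a mask, rather than comparing the whole register, keeps the check valid if Xen ever sets additional CR0 bits that the spec does not promise.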
>
> Also TS (at least that is what the Linux code says:
>
> /* Some of these are setup in 'secondary_startup_64'. The others:
>  * X86_CR0_TS, X86_CR0_PE, X86_CR0_ET are set by Xen for HVM guests
>  * (which PVH shared codepaths), while X86_CR0_PG is for PVH. */
>
> Perhaps it is incorrect?

I think this comment is outdated/incorrect. This is the CR0 value I see on a
FreeBSD PVH start-of-day: 0x80000011 (PE, ET and PG bits set).

>
>> +
>> +`CR4` has the following bits set by Xen:
>> +
>> + * PAE (bit 5): PAE enabled.
>> +
>> +And finally in `EFER` the following features are enabled:
>> +
>> + * LME (bit 8): Long mode enable.
>> + * LMA (bit 10): Long mode active.
>> +
>> +At least the following flags in `EFER` are guaranteed to be disabled:
>> +
>> + * SCE (bit 0): System call extensions disabled.
>> + * NXE (bit 11): No-Execute disabled.
>> +
>> +There's no guarantee about the state of the other bits in the `EFER` register.
>> +
>> +All the segments selectors are set with a flat base at zero.
>> +
>> +The `cs` segment selector attributes are set to 0x0a09b, which describes an
>> +executable and readable code segment only accessible by the most privileged
>> +level. The segment is also set as a 64-bit code segment (`L` flag set, `D` flag
>> +unset).
>> +
>> +The remaining segment selectors (`ds`, `ss`, `es`, `fs` and `gs`) are all set
>> +to the same values. The attributes are set to 0x0c093, which implies a read and
>> +write data segment only accessible by the most privileged level.
>
> I think the SS, ES, FS, GS are set to the null selector in 64-bit mode.
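The two attribute words above can be checked mechanically. The sketch below assumes they use the Intel VMX guest-segment access-rights layout (type in bits 3:0, S in bit 4, DPL in bits 6:5, P in bit 7, L in bit 13, D/B in bit 14, G in bit 15), which is consistent with the `attr=` field in the vcpu dump that follows; that layout is an assumption, and the macro and function names are hypothetical. Under it, 0x0a09b is a present, DPL 0, execute/read code segment with `L` set and `D` clear, and 0x0c093 is a present, DPL 0, read/write data segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Attribute bits, ASSUMING the Intel VMX guest-segment access-rights layout
 * (the spec itself only quotes the raw values). */
#define SEG_ATTR_S         0x00010u /* bit 4: code/data (1) vs system (0) */
#define SEG_ATTR_DPL_SHIFT 5        /* bits 6:5: descriptor privilege level */
#define SEG_ATTR_P         0x00080u /* bit 7: present */
#define SEG_ATTR_L         0x02000u /* bit 13: 64-bit code segment */
#define SEG_ATTR_DB        0x04000u /* bit 14: default operand size */
#define SEG_ATTR_G         0x08000u /* bit 15: 4 KiB granularity */
#define SEG_TYPE_CODE      0x00008u /* type bit 3: code (1) vs data (0) */
#define SEG_TYPE_WRITE     0x00002u /* type bit 1: writable (data segments) */

/* Values quoted in the spec and in the vcpu dump. */
#define PVH_CS_ATTR 0x0a09bu
#define PVH_DS_ATTR 0x0c093u

/* CS must be a 64-bit code segment: "L flag set, D flag unset". */
static_assert((PVH_CS_ATTR & SEG_ATTR_L) && !(PVH_CS_ATTR & SEG_ATTR_DB),
              "cs must be a 64-bit code segment");

static unsigned seg_dpl(uint32_t attr)
{
    return (attr >> SEG_ATTR_DPL_SHIFT) & 0x3u;
}

static bool seg_is_code(uint32_t attr)
{
    return (attr & SEG_ATTR_S) && (attr & SEG_TYPE_CODE);
}

/* Type bit 1 means "writable" only for data segments. */
static bool seg_is_writable_data(uint32_t attr)
{
    return (attr & SEG_ATTR_S) && !(attr & SEG_TYPE_CODE) &&
           (attr & SEG_TYPE_WRITE);
}
```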
This is what I see when I dump the vcpu state of a PVH guest created with the
-p option (so that the guest is never started):

(XEN) CS: sel=0x0000, attr=0x0a09b, limit=0xffffffff, base=0x0000000000000000
(XEN) DS: sel=0x0000, attr=0x0c093, limit=0xffffffff, base=0x0000000000000000
(XEN) SS: sel=0x0000, attr=0x0c093, limit=0xffffffff, base=0x0000000000000000
(XEN) ES: sel=0x0000, attr=0x0c093, limit=0xffffffff, base=0x0000000000000000
(XEN) FS: sel=0x0000, attr=0x0c093, limit=0xffffffff, base=0x0000000000000000
(XEN) GS: sel=0x0000, attr=0x0c093, limit=0xffffffff, base=0x0000000000000000

Am I missing something? I don't see a difference between SS, ES, FS, GS and
DS. In construct_vmcs on Xen we seem to set all the segments to the same
values with the exception of CS attributes.

>> +
>> +The `FS.base` and `GS.base` MSRs are zeroed out.
>
> .. and 'KERNEL_GS.base'

Done.

>> +
>> +The `IDT` and `GDT` are also zeroed, so the guest must be especially careful to
>> +not trigger a fault until after they have been properly set. The way of setting
>> +the IDT and the GDT is using the native instructions as would be done on bare
>> +metal.
>> +
>> +The `RFLAGS` register is guaranteed to be clear when jumping into the kernel
>> +entry point, with the exception of the reserved bit 1 set.

[...]

>> +## Interrupts ##
>> +
>> +All interrupts on PVH guests are routed over event channels, see
>> +[Event Channel Internals][event_channels] for more detailed information about
>> +event channels. In order to inject interrupts into the guest an IDT vector is
>> +used. This is the same mechanism used on PVHVM guests, and allows having
>> +per-cpu interrupts that can be used to deliver timers or IPIs.
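The vector registration that follows passes `(0x2 << 56) | vector_value` as the `HVM_PARAM_CALLBACK_IRQ` value, i.e. a type of 2 in bits 63:56 and the vector in the low byte. A minimal C sketch of building that value is below; the macro and function names are hypothetical, and how the value is then handed to `HVMOP_set_param` depends on the guest's own hypercall wrappers, so that part is not shown. The IDT entry for the vector still has to be programmed natively.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the fields of the HVM_PARAM_CALLBACK_IRQ value:
 * bits 63:56 hold the callback type, 2 meaning "IDT vector". */
#define CALLBACK_VIA_TYPE_SHIFT  56
#define CALLBACK_VIA_TYPE_VECTOR UINT64_C(2)

/* The 8-bit vector in the low byte must not overlap the type field. */
static_assert(CALLBACK_VIA_TYPE_SHIFT >= 8,
              "vector byte must stay below the type field");

/* Build (0x2 << 56) | vector. The shift is done on a 64-bit type:
 * a literal `0x2 << 56` on a plain int would be undefined behaviour in C. */
static uint64_t pvh_callback_vector_param(uint8_t vector)
{
    return (CALLBACK_VIA_TYPE_VECTOR << CALLBACK_VIA_TYPE_SHIFT) | vector;
}
```

For example, a hypothetical vector 0xf3 yields 0x02000000000000f3.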
>> +
>> +In order to register the callback IDT vector the `HVMOP_set_param` hypercall
>> +is used with the following values:
>> +
>> +    domid = DOMID_SELF
>> +    index = HVM_PARAM_CALLBACK_IRQ
>> +    value = (0x2 << 56) | vector_value
>
> And naturally the OS has to program the IDT for the 'vector_value' using
> the baremetal mechanism.

Added.

[...]

>> +## CPUID ##
>> +
>> +*TODO*: describe which cpuid flags a guest should ignore and also which flags
>> +describe features can be used. It would also be good to describe the set of
>> +cpuid flags that will always be present when running as PVH.
>
> Perhaps start with:
> The cpuid instruction that should be used is the normal 'cpuid', not
> the emulated 'cpuid' that PV guests usually require.

Done.

>
>> +
>> +## Final notes ##
>> +
>> +All the other hardware functionality not described in this document should be
>> +assumed to be performed in the same way as native.
>> +
>> +[event_channels]: http://wiki.xen.org/wiki/Event_Channel_Internals
>
> And with those changes:
>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
>
>> --
>> 1.8.5.2 (Apple Git-48)
>>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel