Re: [Xen-devel] FreeBSD PVH guest support
On Mon, Oct 28, 2013 at 02:35:03PM +0100, Roger Pau Monné wrote:
> Hello,
>
> The Xen community is working on a new virtualization mode (or maybe I
> should say an extension of HVM) to be able to run PV guests inside HVM
> containers without requiring a device-model (Qemu). One of the
> advantages of this new virtualization mode is that it is now much
> easier to port guests to run under it (as compared to pure PV guests).
>
> Given that FreeBSD already supports PVHVM, adding PVH support is quite
> easy, we only need some glue for the PV entry point and then support
> for diverging some early init functions (like fetching the e820 map or
> starting the APs).
>
> The attached patch contains all these changes, and allows an SMP FreeBSD
> guest to fully boot (and AFAIK work) under this new PVH mode. The patch
> can also be found on my git repo:
>
> git://xenbits.xen.org/people/royger/freebsd.git pvh_v2
Awesome! That is really fantastic!
>
> The patch touches quite a lot of the early init, so I've Cced the
> people who maintain those areas so they can review it.
>
> In order to test it, and since the PVH changes are not yet merged into
> upstream Xen, the use of a patched Xen is necessary. I've collected the
> patches for PVH guest support from George Dunlap (v13) and fixed some
> bugs on top of them; the tree can be found at:
>
> git://xenbits.xen.org/people/royger/xen.git fix_pvh
>
> For those curious, here is a dmesg of a FreeBSD PVH guest booting:
>
> GDB: no debug ports present
> KDB: debugger backends: ddb
> KDB: current backend: ddb
> SMAP type=01 base=0000000000000000 len=0000000138800000
> ACPI BIOS Error (bug): A valid RSDP was not found (20130823/tbxfroot-223)
> APIC: Using the Xen PV enumerator.
> SMP: Added CPU 0 (BSP)
> SMP: Added CPU 2 (AP)
> SMP: Added CPU 4 (AP)
> SMP: Added CPU 6 (AP)
> SMP: Added CPU 8 (AP)
> SMP: Added CPU 10 (AP)
> SMP: Added CPU 12 (AP)
> Copyright (c) 1992-2013 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 11.0-CURRENT #420: Mon Oct 28 13:07:53 CET 2013
> root@odin:/usr/obj/usr/src/sys/GENERIC amd64
> FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
> WARNING: WITNESS option enabled, expect reduced performance.
> Hypervisor: Origin = "XenVMMXenVMM"
> Calibrating TSC clock ... TSC clock: 3066775691 Hz
> CPU: Intel(R) Xeon(R) CPU W3550 @ 3.07GHz (3066.78-MHz K8-class CPU)
> Origin = "GenuineIntel" Id = 0x106a5 Family = 0x6 Model = 0x1a Stepping = 5
>
> Features=0x1fc98b75<FPU,DE,TSC,MSR,PAE,CX8,APIC,SEP,CMOV,PAT,CLFLUSH,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT>
> Features2=0x80982201<SSE3,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,HV>
> AMD Features=0x20100800<SYSCALL,NX,LM>
> AMD Features2=0x1<LAHF>
> real memory = 5242880000 (5000 MB)
> Physical memory chunk(s):
> 0x0000000000010000 - 0x00000000001fffff, 2031616 bytes (496 pages)
> 0x0000000002708000 - 0x0000000130864fff, 5068148736 bytes (1237341 pages)
> avail memory = 5035581440 (4802 MB)
> INTR: Adding local APIC 2 as a target
> INTR: Adding local APIC 4 as a target
> INTR: Adding local APIC 6 as a target
> INTR: Adding local APIC 8 as a target
> INTR: Adding local APIC 10 as a target
> INTR: Adding local APIC 12 as a target
> FreeBSD/SMP: Multiprocessor System Detected: 7 CPUs
> FreeBSD/SMP: 1 package(s) x 7 core(s)
> cpu0 (BSP): APIC ID: 0
> cpu1 (AP): APIC ID: 2
> cpu2 (AP): APIC ID: 4
> cpu3 (AP): APIC ID: 6
> cpu4 (AP): APIC ID: 8
> cpu5 (AP): APIC ID: 10
> cpu6 (AP): APIC ID: 12
> XEN: CPU 0 has VCPU ID 0
> XEN: CPU 1 has VCPU ID 1
> XEN: CPU 2 has VCPU ID 2
> XEN: CPU 3 has VCPU ID 3
> XEN: CPU 4 has VCPU ID 4
> XEN: CPU 5 has VCPU ID 5
> XEN: CPU 6 has VCPU ID 6
> x86bios: IVT 0x000000-0x0004ff at 0xfffff80000000000
> x86bios: SSEG 0x010000-0x010fff at 0xfffffe012e79d000
> x86bios: ROM 0x0a0000-0x0fefff at 0xfffff800000a0000
> random device not loaded; using insecure entropy
> ULE: setup cpu 0
> ULE: setup cpu 1
> ULE: setup cpu 2
> ULE: setup cpu 3
> ULE: setup cpu 4
> ULE: setup cpu 5
> ULE: setup cpu 6
> Event-channel device installed.
> snd_unit_init() u=0x00ff8000 [512] d=0x00007c00 [32] c=0x000003ff [1024]
> feeder_register: snd_unit=-1 snd_maxautovchans=16 latency=5 feeder_rate_min=1 feeder_rate_max=2016000 feeder_rate_round=25
> wlan: <802.11 Link Layer>
> Hardware, VIA Nehemiah Padlock RNG: VIA Padlock RNG not present
> Hardware, Intel IvyBridge+ RNG: RDRAND is not present
> null: <null device, zero device>
> Falling back to <Software, Yarrow> random adaptor
> random: <Software, Yarrow> initialized
> nfslock: pseudo-device
> kbd0 at kbdmux0
> module_register_init: MOD_LOAD (vesa, 0xffffffff80d21c60, 0) error 19
> io: <I/O>
> VMBUS: load
> mem: <memory>
> hpt27xx: RocketRAID 27xx controller driver v1.1
> hptrr: RocketRAID 17xx/2xxx SATA controller driver v1.2
> hptnr: R750/DC7280 controller driver v1.0
> ACPI BIOS Error (bug): A valid RSDP was not found (20130823/tbxfroot-223)
> ACPI: Table initialisation failed: AE_NOT_FOUND
> ACPI: Try disabling either ACPI or apic support.
> xenstore0: <XenStore> on motherboard
> Grant table initialized
> xc0: <Xen Console> on motherboard
> xen_et0: <Xen PV Clock> on motherboard
> Event timer "XENTIMER" frequency 1000000000 Hz quality 950
> Timecounter "XENTIMER" frequency 1000000000 Hz quality 950
> xen_et0: registered as a time-of-day clock (resolution 10000000us, adjustment 5.000000000s)
> pvcpu0: <Xen PV CPU> on motherboard
> pvcpu1: <Xen PV CPU> on motherboard
> pvcpu2: <Xen PV CPU> on motherboard
> pvcpu3: <Xen PV CPU> on motherboard
> pvcpu4: <Xen PV CPU> on motherboard
> pvcpu5: <Xen PV CPU> on motherboard
> pvcpu6: <Xen PV CPU> on motherboard
> legacy_pcib_identify: no bridge found, adding pcib0 anyway
> pcib0 pcibus 0 on motherboard
> pci0: <PCI bus> on pcib0
> pci0: domain=0, physical bus=0
> cpu0 on motherboard
> cpu1 on motherboard
> cpu2 on motherboard
> cpu3 on motherboard
> cpu4 on motherboard
> cpu5 on motherboard
> cpu6 on motherboard
> isa0: <ISA bus> on motherboard
> qpi0: <QPI system bus> on motherboard
> ex_isa_identify()
> isa_probe_children: disabling PnP devices
> isa_probe_children: probing non-PnP devices
> fb: new array size 4
> sc0: <System console> on isa0
> sc0: MDA <16 virtual consoles, flags=0x100>
> sc0: fb0, kbd0, terminal emulator: scteken (teken terminal)
> vga0: <Generic ISA VGA> at port 0x3b0-0x3bb iomem 0xb0000-0xb7fff on isa0
> isa_probe_children: probing PnP devices
> Device configuration finished.
> procfs registered
> Timecounters tick every 1.000 msec
> vlan: initialized, using hash tables with chaining
> tcp_init: net.inet.tcp.tcbhashsize auto tuned to 65536
> lo0: bpf attached
> hpt27xx: no controller detected.
> hptrr: no controller detected.
> hptnr: no controller detected.
> xenbusb_front0: <Xen Frontend Devices> on xenstore0
> xenbusb_add_device: Device device/suspend/event-channel ignored. State 6
> xn0: <Virtual Network Interface> at device/vif/0 on xenbusb_front0
> xn0: bpf attached
> xn0: Ethernet address: 00:16:3e:0b:a4:b1
> xenbusb_back0: <Xen Backend Devices> on xenstore0
> xctrl0: <Xen Control Device> on xenstore0
> xn0: backend features: feature-sg feature-gso-tcp4
> xbd0: 20480MB <Virtual Block Device> at device/vbd/51712 on xenbusb_front0
> xbd0: features: flush, write_barrier
> xbd0: synchronize cache commands enabled.
> GEOM: new disk xbd0
> random: unblocking device.
> Netvsc initializing... SMP: AP CPU #5 Launched!
> SMP: AP CPU #2 Launched!
> SMP: AP CPU #1 Launched!
> SMP: AP CPU #3 Launched!
> SMP: AP CPU #6 Launched!
> SMP: AP CPU #4 Launched!
> TSC timecounter discards lower 1 bit(s)
> Timecounter "TSC-low" frequency 1533387845 Hz quality -100
> WARNING: WITNESS option enabled, expect reduced performance.
> Trying to mount root from ufs:/dev/xbd0p2 []...
> start_init: trying /sbin/init
> Setting hostuuid: c9230f36-1a54-489e-877c-1d15b8f463e9.
> Setting hostid: 0xd52252c7.
> ZFS filesystem version: 5
> ZFS storage pool version: features support (5000)
> Entropy harvesting: interrupts ethernet point_to_pointsha256: /kernel: No
> such file or directory
> kickstart.
> Starting file system checks:
> /dev/xbd0p2: FILE SYSTEM CLEAN; SKIPPING CHECKS
> /dev/xbd0p2: clean, 2213647 free (17111 frags, 274567 blocks, 0.4% fragmentation)
> Mounting local file systems:.
> Writing entropy file:.
> xn0: link state changed to DOWN
> xn0: link state changed to UP
> Starting Network: lo0 xn0.
> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
> options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
> inet6 ::1 prefixlen 128
> inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
> inet 127.0.0.1 netmask 0xff000000
> nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
> xn0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> options=503<RXCSUM,TXCSUM,TSO4,LRO>
> ether 00:16:3e:0b:a4:b1
> nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
> media: Ethernet manual
> status: active
> Starting devd.
> Starting dhclient.
> DHCPDISCOVER on xn0 to 255.255.255.255 port 67 interval 7
> DHCPOFFER from 172.16.1.1
> DHCPREQUEST on xn0 to 255.255.255.255 port 67
> DHCPACK from 172.16.1.1
> bound to 172.16.1.149 -- renewal in 43200 seconds.
> add net ::ffff:0.0.0.0: gateway ::1
> add net ::0.0.0.0: gateway ::1
> add net fe80::: gateway ::1
> add net ff02::: gateway ::1
> ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib
> 32-bit compatibility ldconfig path: /usr/lib32
> Creating and/or trimming log files.
> Starting syslogd.
> No core dumps found.
> lock order reversal:
> 1st 0xfffffe012e861e28 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:3050
> 2nd 0xfffff80005b87c00 dirhash (dirhash) @ /usr/src/sys/ufs/ufs/ufs_dirhash.c:284
> KDB: stack backtrace:
> X_db_symbol_values() at X_db_symbol_values+0x10b/frame 0xfffffe012fb8c410
> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe012fb8c4c0
> witness_checkorder() at witness_checkorder+0xd23/frame 0xfffffe012fb8c550
> _sx_xlock() at _sx_xlock+0x75/frame 0xfffffe012fb8c590
> ufsdirhash_add() at ufsdirhash_add+0x3b/frame 0xfffffe012fb8c5d0
> ufs_direnter() at ufs_direnter+0x688/frame 0xfffffe012fb8c690
> ufs_vinit() at ufs_vinit+0x33f3/frame 0xfffffe012fb8c890
> VOP_MKDIR_APV() at VOP_MKDIR_APV+0xf0/frame 0xfffffe012fb8c8c0
> kern_mkdirat() at kern_mkdirat+0x1ff/frame 0xfffffe012fb8cae0
> amd64_syscall() at amd64_syscall+0x265/frame 0xfffffe012fb8cbf0
> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe012fb8cbf0
> --- syscall (136, FreeBSD ELF64, sys_mkdir), rip = 0x80092faaa, rsp = 0x7fffffffd788, rbp = 0x7fffffffdc70 ---
> Clearing /tmp (X related).
> Updating motd:.
> Configuring syscons: keymap blanktime.
> Performing sanity check on sshd configuration.
> Starting sshd.
> Starting cron.
> Starting background file system checks in 60 seconds.
>
> Mon Oct 28 13:22:52 CET 2013
>
> FreeBSD/amd64 (Amnesiac) (xc0)
> From 16de1566ada65e5838105870df576ab8258ed8b6 Mon Sep 17 00:00:00 2001
> From: Roger Pau Monne <roger.pau@xxxxxxxxxx>
> Date: Mon, 14 Oct 2013 18:33:17 +0200
> Subject: [PATCH] Xen x86 PVH support
>
> This is still very experimental, and PVH support has not yet been
> merged into upstream Xen.
>
> PVH mode is basically a PV guest inside an HVM container, and shares
> a great amount of code with PVHVM. The main difference is the way the
> guest is started: PVH uses the PV start sequence, jumping directly
> into the kernel entry point in long mode with page tables already set.
> The main work of this patch consists of making the environment as
> similar as possible to what native FreeBSD expects, and then adding
> hooks to the PV ops where necessary.
>
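For readers skimming the description, the "hooks" idea is a table of function pointers that defaults to the bare metal implementations and is overridden before the common init path runs. The sketch below is a userland illustration only: struct early_ops, the enum and every function name in it are hypothetical stand-ins, not the init_ops or cpu_ops definitions from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a hook table such as the patch's init_ops. */
enum map_source { MAP_FROM_LOADER, MAP_FROM_HYPERVISOR };

struct early_ops {
	enum map_source (*fetch_memory_map)(void);
	int (*start_secondary_cpus)(void);	/* may be NULL: nothing to do */
};

/* "Native" implementations, used unless something overrides them. */
static enum map_source native_fetch_memory_map(void) { return MAP_FROM_LOADER; }
static int native_start_secondary_cpus(void) { return 1; }

/* "PV" implementation, installed by the hypervisor-specific entry point. */
static enum map_source pv_fetch_memory_map(void) { return MAP_FROM_HYPERVISOR; }

/* Default table: bare metal. */
struct early_ops early_ops = {
	.fetch_memory_map = native_fetch_memory_map,
	.start_secondary_cpus = native_start_secondary_cpus,
};

/* Called once, early, when booting through the PV entry point. */
void pv_set_early_ops(void)
{
	early_ops.fetch_memory_map = pv_fetch_memory_map;
	early_ops.start_secondary_cpus = NULL;	/* optional hook left unset */
}

/* Common init code never tests "am I on Xen?"; it only calls the hooks. */
enum map_source common_init(int *cpus_started)
{
	assert(early_ops.fetch_memory_map != NULL);
	*cpus_started = 0;
	if (early_ops.start_secondary_cpus != NULL)
		*cpus_started = early_ops.start_secondary_cpus();
	return early_ops.fetch_memory_map();
}
```

The point of the design, as far as I can tell from the description, is that the shared code path stays free of hypervisor-specific conditionals; only the table contents change.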
> sys/amd64/amd64/locore.S:
> * Add PV entry point, hypervisor_page and the necessary elfnotes.
>
> sys/amd64/amd64/machdep.c:
> * Add hooks to replace bare metal operations that should use a PV
> helper. This includes:
> - Preload metadata
> - i8254_init and i8254_delay
> - Fetching the e820 memory map
> - Reserving the MP bootstrap region
>
> * Create a DELAY function that uses the PV hooks.
> * Introduce a new hammer_time_xen that sets the necessary stuff when
> running in PVH mode.
>
> sys/amd64/amd64/mp_machdep.c:
> * Introduce a hook to replace start_all_aps.
> * Introduce a lapic_disabled variable to prevent polluting the code
> with Xen-specific gates.
>
> sys/amd64/include/asmacros.h:
> * Copy the ELFNOTE macro from the i386 Xen PV port.
>
> sys/amd64/include/clock.h:
> sys/i386/include/clock.h:
> * Prototypes for the xen early delay initialization and usage.
>
> sys/amd64/include/cpu.h:
> * Introduce a new cpu hook to init APs.
>
> sys/amd64/include/sysarch.h:
> * Declare the init_ops structure.
>
> sys/amd64/include/xen/hypercall.h:
> sys/i386/include/xen/hypercall.h:
> * Switch to the PV style hypercall mechanism for HVM also.
>
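A note on what the PV style mechanism amounts to: the HYPERCALL_STR macro further down calls into hypercall_page + (number * 32), and locore.S reserves 0x1000 bytes for that page. A small sketch of the arithmetic follows; the two constants are taken from those lines of the patch, while the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define HYPERCALL_PAGE_SIZE 0x1000u	/* matches ".skip 0x1000" in locore.S */
#define HYPERCALL_STUB_SIZE 32u		/* matches "* 32" in HYPERCALL_STR */

/* Number of call stubs that fit in one hypercall page. */
unsigned hypercall_slots(void)
{
	return HYPERCALL_PAGE_SIZE / HYPERCALL_STUB_SIZE;
}

/* Address of the stub for hypercall number nr, given the page base. */
uintptr_t hypercall_stub_addr(uintptr_t page_base, unsigned nr)
{
	assert(nr < hypercall_slots());
	return page_base + (uintptr_t)nr * HYPERCALL_STUB_SIZE;
}
```

So every hypercall becomes a plain call to a fixed offset inside the page, and the page must be page aligned for the offsets to stay within it.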
> sys/conf/files:
> * Make the PV console available on XENHVM also.
>
> sys/conf/files.amd64:
> * Include the new files for the PVH port.
>
> sys/dev/xen/console/console.c:
> sys/dev/xen/console/xencons_ring.c:
> * Gate the PV console attach so it is only used on PV ports.
> * Use HYPERVISOR_start_info instead of xen_start_info.
> * Use HYPERVISOR_event_channel_op to kick the event channel before
> Xen interrupts are set up.
>
> sys/dev/xen/control/control.c:
> * Use the PV shutdown on PVH.
>
> sys/dev/xen/timer/timer.c:
> * Pass a vcpu_info to xen_fetch_vcpu_time, this allows using this
> function at very early init, before per-cpu vcpu_info is set.
> * Remove critical_{enter/exit} from xen_fetch_vcpu_time so it can be
> used at early boot; place them in the callers instead.
> * Introduce two new functions, xen_delay_init and xen_delay that can
> be used at early boot to implement the generic DELAY function.
>
> sys/i386/i386/locore.s:
> * Reserve space for the hypercall page.
>
> sys/i386/i386/machdep.c:
> * Create a generic DELAY function.
>
> sys/i386/xen/xen_machdep.c:
> * Set HYPERVISOR_start_info.
>
> sys/x86/isa/clock.c:
> * Rename the generic DELAY function to i8254_delay.
>
> sys/x86/x86/delay.c:
> * Put generic delay helpers here, get_tsc and delay_tc.
>
> sys/x86/x86/local_apic.c:
> * Prevent the local apic from attaching when running in PVH mode.
>
> sys/x86/xen/hvm.c:
> * Set the start_all_aps hook.
> * Fix the setting of the hypercall page now that we are using the
> same mechanism as the PV port.
> * Initialize Xen CPU hooks for the PVH port.
> * Introduce the xen_early_printf debug function, which prints
> directly to the hypervisor console.
>
> sys/x86/xen/mptable.c:
> * Create a dummy PV CPU enumerator for the PVH port.
>
> sys/x86/xen/pv.c:
> * Implement the PV functions for the early boot hooks,
> parse_preload_data and fetch_e820_map.
> * Implement the PV function for the start_all_aps hook.
>
> sys/x86/xen/pvcpu.c:
> * Dummy Xen PV CPU device, that we use to set the per-cpu pc_device.
>
> sys/xen/gnttab.c:
> * Allocate resume_frames for the PVH port.
>
> sys/xen/interface/arch-x86/xen.h:
> * Interface change for the PVH port (not used on FreeBSD).
>
> sys/xen/pv.h:
> * Header that exports the specific PV functions.
>
> sys/xen/xen-os.h:
> * Declare prototypes for the newly added functions.
>
> sys/xen/xenstore/xenstore.c:
> * Make the xenstore driver hang from both xenpci and the nexus when
> running XENHVM, because we don't have a xenpci device on
> the PVH port.
> * Gate xenstore addition to parent == xenpci on the HVM case.
> ---
> sys/amd64/amd64/locore.S | 53 ++++++++
> sys/amd64/amd64/machdep.c | 179 ++++++++++++++++++++++----
> sys/amd64/amd64/mp_machdep.c | 27 +++--
> sys/amd64/include/asmacros.h | 26 ++++
> sys/amd64/include/clock.h | 6 +
> sys/amd64/include/cpu.h | 1 +
> sys/amd64/include/sysarch.h | 19 +++
> sys/amd64/include/xen/hypercall.h | 7 -
> sys/conf/files | 4 +-
> sys/conf/files.amd64 | 4 +
> sys/conf/files.i386 | 1 +
> sys/dev/xen/console/console.c | 23 +++-
> sys/dev/xen/console/xencons_ring.c | 15 ++-
> sys/dev/xen/control/control.c | 37 +++---
> sys/dev/xen/timer/timer.c | 59 +++++++--
> sys/i386/i386/locore.s | 9 ++
> sys/i386/i386/machdep.c | 9 ++
> sys/i386/include/clock.h | 6 +
> sys/i386/include/xen/hypercall.h | 7 -
> sys/i386/xen/xen_machdep.c | 4 +-
> sys/x86/isa/clock.c | 53 +--------
> sys/x86/x86/delay.c | 95 ++++++++++++++
> sys/x86/x86/local_apic.c | 8 +-
> sys/x86/xen/hvm.c | 93 ++++++++++----
> sys/x86/xen/mptable.c | 136 ++++++++++++++++++++
> sys/x86/xen/pv.c | 247 ++++++++++++++++++++++++++++++++++++
> sys/x86/xen/pvcpu.c | 98 ++++++++++++++
> sys/xen/gnttab.c | 21 +++-
> sys/xen/interface/arch-x86/xen.h | 11 ++-
> sys/xen/pv.h | 29 ++++
> sys/xen/xen-os.h | 8 +
> sys/xen/xenstore/xenstore.c | 32 ++++--
> 32 files changed, 1141 insertions(+), 186 deletions(-)
> create mode 100644 sys/x86/x86/delay.c
> create mode 100644 sys/x86/xen/mptable.c
> create mode 100644 sys/x86/xen/pv.c
> create mode 100644 sys/x86/xen/pvcpu.c
> create mode 100644 sys/xen/pv.h
>
> diff --git a/sys/amd64/amd64/locore.S b/sys/amd64/amd64/locore.S
> index 55cda3a..e04cc48 100644
> --- a/sys/amd64/amd64/locore.S
> +++ b/sys/amd64/amd64/locore.S
> @@ -31,6 +31,12 @@
> #include <machine/pmap.h>
> #include <machine/specialreg.h>
>
> +#ifdef XENHVM
> +#include <xen/xen-os.h>
> +#define __ASSEMBLY__
> +#include <xen/interface/elfnote.h>
> +#endif
> +
> #include "assym.s"
>
> /*
> @@ -86,3 +92,50 @@ NON_GPROF_ENTRY(btext)
> ALIGN_DATA /* just to be sure */
> .space 0x1000 /* space for bootstack - temporary stack */
> bootstack:
> +
> +#ifdef XENHVM
> +/* Xen */
> +.section __xen_guest
> + ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS, .asciz, "FreeBSD")
> + ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION, .asciz, "HEAD")
> + ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION, .asciz, "xen-3.0")
> + ELFNOTE(Xen, XEN_ELFNOTE_VIRT_BASE, .quad, KERNBASE)
> + ELFNOTE(Xen, XEN_ELFNOTE_PADDR_OFFSET, .quad, KERNBASE) /* Xen honours elf->p_paddr; compensate for this */
> + ELFNOTE(Xen, XEN_ELFNOTE_ENTRY, .quad, xen_start)
> + ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, .quad, hypercall_page)
> + ELFNOTE(Xen, XEN_ELFNOTE_HV_START_LOW, .quad, HYPERVISOR_VIRT_START)
> + ELFNOTE(Xen, XEN_ELFNOTE_FEATURES, .asciz, "writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector")
> + ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE, .asciz, "yes")
> + ELFNOTE(Xen, XEN_ELFNOTE_L1_MFN_VALID, .long, PG_V, PG_V)
> + ELFNOTE(Xen, XEN_ELFNOTE_LOADER, .asciz, "generic")
> + ELFNOTE(Xen, XEN_ELFNOTE_SUSPEND_CANCEL, .long, 0)
> + ELFNOTE(Xen, XEN_ELFNOTE_BSD_SYMTAB, .asciz, "yes")
> +
> + .text
> +.p2align PAGE_SHIFT, 0x90 /* Hypercall_page needs to be PAGE aligned */
> +
> +NON_GPROF_ENTRY(hypercall_page)
> + .skip 0x1000, 0x90 /* Fill with "nop"s */
> +
> +NON_GPROF_ENTRY(xen_start)
> + /* Don't trust what the loader gives for rflags. */
> + pushq $PSL_KERNEL
> + popfq
> +
> + /* Parameters for the xen init function */
> + movq %rsi, %rdi /* start_info (arg 1) */
> + movq %rsp, %rsi /* xenstack (arg 2) */
> +
> + /* Use our own stack */
> + movq $bootstack,%rsp
> + xorl %ebp, %ebp
> +
> + /* u_int64_t hammer_time_xen(start_info_t *si, u_int64_t xenstack); */
> + call hammer_time_xen
> + movq %rax, %rsp /* set up kstack for mi_startup() */
> + call mi_startup /* autoconfiguration, mountroot etc */
> +
> + /* NOTREACHED */
> +0: hlt
> + jmp 0b
> +#endif
> diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c
> index 2b2e47f..b649def 100644
> --- a/sys/amd64/amd64/machdep.c
> +++ b/sys/amd64/amd64/machdep.c
> @@ -127,6 +127,7 @@ __FBSDID("$FreeBSD$");
> #include <machine/reg.h>
> #include <machine/sigframe.h>
> #include <machine/specialreg.h>
> +#include <machine/sysarch.h>
> #ifdef PERFMON
> #include <machine/perfmon.h>
> #endif
> @@ -147,10 +148,20 @@ __FBSDID("$FreeBSD$");
> #include <isa/isareg.h>
> #include <isa/rtc.h>
>
> +#ifdef XENHVM
> +/* Xen */
> +#include <xen/xen-os.h>
> +#include <xen/hvm.h>
> +#include <xen/pv.h>
> +#endif
> +
> /* Sanity check for __curthread() */
> CTASSERT(offsetof(struct pcpu, pc_curthread) == 0);
>
> extern u_int64_t hammer_time(u_int64_t, u_int64_t);
> +#ifdef XENHVM
> +extern u_int64_t hammer_time_xen(start_info_t *, u_int64_t);
> +#endif
>
> extern void printcpuinfo(void); /* XXX header file */
> extern void identify_cpu(void);
> @@ -166,6 +177,23 @@ static int set_fpcontext(struct thread *td, const mcontext_t *mcp,
> char *xfpustate, size_t xfpustate_len);
> SYSINIT(cpu, SI_SUB_CPU, SI_ORDER_FIRST, cpu_startup, NULL);
>
> +/* Preload data parse function */
> +static caddr_t native_parse_preload_data(u_int64_t);
> +
> +/* Native function to fetch the e820 map */
> +static void native_fetch_e820_map(caddr_t, struct bios_smap **, u_int32_t *);
> +
> +/* Default init_ops implementation. */
> +struct init_ops init_ops = {
> + .parse_preload_data = native_parse_preload_data,
> + .early_delay_init = i8254_init,
> + .early_delay = i8254_delay,
> + .fetch_e820_map = native_fetch_e820_map,
> +#ifdef SMP
> + .mp_bootaddress = mp_bootaddress,
> +#endif
> +};
> +
> /*
> * The file "conf/ldscript.amd64" defines the symbol "kernphys". Its value is
> * the physical address at which the kernel is loaded.
> @@ -216,6 +244,15 @@ struct mem_range_softc mem_range_softc;
>
> struct mtx dt_lock; /* lock for GDT and LDT */
>
> +void
> +DELAY(int n)
> +{
> + if (delay_tc(n))
> + return;
> +
> + init_ops.early_delay(n);
> +}
> +
> static void
> cpu_startup(dummy)
> void *dummy;
> @@ -1408,6 +1445,24 @@ add_smap_entry(struct bios_smap *smap, vm_paddr_t *physmap, int *physmap_idxp)
> return (1);
> }
>
> +static void
> +native_fetch_e820_map(caddr_t kmdp, struct bios_smap **smap, u_int32_t *size)
> +{
> + /*
> + * get memory map from INT 15:E820, kindly supplied by the
> + * loader.
> + *
> + * subr_module.c says:
> + * "Consumer may safely assume that size value precedes data."
> + * ie: an int32_t immediately precedes smap.
> + */
> + *smap = (struct bios_smap *)preload_search_info(kmdp,
> + MODINFO_METADATA | MODINFOMD_SMAP);
> + if (*smap == NULL)
> + panic("No BIOS smap info from loader!");
> + *size = *((u_int32_t *)*smap - 1);
> +}
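The "size value precedes data" convention quoted in the comment above may be easier to follow with a standalone example. This is a userland sketch under my own assumptions: struct demo_smap and struct demo_blob are illustrative layouts (not the real struct bios_smap or the loader's metadata format), and the sample regions are invented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative record; the real struct bios_smap is defined elsewhere. */
struct demo_smap {
	uint64_t base;
	uint64_t length;
	uint32_t type;
};

/* A blob where a 32-bit byte count sits immediately before the entries. */
struct demo_blob {
	uint32_t pad;		/* keeps entries[] 8-byte aligned */
	uint32_t size;		/* byte length of entries[] */
	struct demo_smap entries[2];
};

/* Read the 32-bit size stored right before the data pointer. */
uint32_t size_before(const void *data)
{
	uint32_t size;

	memcpy(&size, (const unsigned char *)data - sizeof(size), sizeof(size));
	return size;
}

/* Walk the entries the same way getmemsize() does: base pointer plus size. */
uint64_t demo_total_length(void)
{
	struct demo_blob blob = {
		.size = 2 * sizeof(struct demo_smap),
		.entries = {
			{ .base = 0x0, .length = 0x9f000, .type = 1 },
			{ .base = 0x100000, .length = 0x7ff00000, .type = 1 },
		},
	};
	const struct demo_smap *smap = blob.entries;
	const struct demo_smap *end = (const struct demo_smap *)
	    ((uintptr_t)smap + size_before(smap));
	uint64_t total = 0;

	for (; smap < end; smap++)
		total += smap->length;
	return total;
}
```

The hook in the patch returns both the pointer and the size explicitly, which seems cleaner for a PV implementation that has no such size word in front of its map.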
> +
> /*
> * Populate the (physmap) array with base/bound pairs describing the
> * available physical memory in the system, then test this memory and
> @@ -1433,19 +1488,8 @@ getmemsize(caddr_t kmdp, u_int64_t first)
> basemem = 0;
> physmap_idx = 0;
>
> - /*
> - * get memory map from INT 15:E820, kindly supplied by the loader.
> - *
> - * subr_module.c says:
> - * "Consumer may safely assume that size value precedes data."
> - * ie: an int32_t immediately precedes smap.
> - */
> - smapbase = (struct bios_smap *)preload_search_info(kmdp,
> - MODINFO_METADATA | MODINFOMD_SMAP);
> - if (smapbase == NULL)
> - panic("No BIOS smap info from loader!");
> + init_ops.fetch_e820_map(kmdp, &smapbase, &smapsize);
>
> - smapsize = *((u_int32_t *)smapbase - 1);
> smapend = (struct bios_smap *)((uintptr_t)smapbase + smapsize);
>
> for (smap = smapbase; smap < smapend; smap++)
> @@ -1467,7 +1511,8 @@ getmemsize(caddr_t kmdp, u_int64_t first)
>
> #ifdef SMP
> /* make hole for AP bootstrap code */
> - physmap[1] = mp_bootaddress(physmap[1] / 1024);
> + if (init_ops.mp_bootaddress)
> + physmap[1] = init_ops.mp_bootaddress(physmap[1] / 1024);
> #endif
>
> /*
> @@ -1681,6 +1726,98 @@ do_next:
> msgbufp = (struct msgbuf *)PHYS_TO_DMAP(phys_avail[pa_indx]);
> }
>
> +static caddr_t
> +native_parse_preload_data(u_int64_t modulep)
> +{
> + caddr_t kmdp;
> +
> + preload_metadata = (caddr_t)(uintptr_t)(modulep + KERNBASE);
> + preload_bootstrap_relocate(KERNBASE);
> + kmdp = preload_search_by_type("elf kernel");
> + if (kmdp == NULL)
> + kmdp = preload_search_by_type("elf64 kernel");
> + boothowto = MD_FETCH(kmdp, MODINFOMD_HOWTO, int);
> + kern_envp = MD_FETCH(kmdp, MODINFOMD_ENVP, char *) + KERNBASE;
> +#ifdef DDB
> + ksym_start = MD_FETCH(kmdp, MODINFOMD_SSYM, uintptr_t);
> + ksym_end = MD_FETCH(kmdp, MODINFOMD_ESYM, uintptr_t);
> +#endif
> +
> + return (kmdp);
> +}
> +
> +#ifdef XENHVM
> +/*
> + * First function called by the Xen PVH boot sequence.
> + *
> + * Set some Xen global variables and prepare the environment so it is
> + * as similar as possible to what native FreeBSD init function expects.
> + */
> +u_int64_t
> +hammer_time_xen(start_info_t *si, u_int64_t xenstack)
> +{
> + u_int64_t physfree;
> + u_int64_t *PT4 = (u_int64_t *)xenstack;
> + u_int64_t *PT3 = (u_int64_t *)(xenstack + PAGE_SIZE);
> + u_int64_t *PT2 = (u_int64_t *)(xenstack + 2 * PAGE_SIZE);
> + int i;
> +
> + KASSERT((si != NULL && xenstack != 0),
> + ("invalid start_info or xenstack"));
> +
> + xen_early_printf("FreeBSD PVH running on %s\n", si->magic);
> +
> + /* We use 3 pages of xen stack for the boot pagetables */
> + physfree = xenstack + 3 * PAGE_SIZE - KERNBASE;
> +
> + /* Setup Xen global variables */
> + HYPERVISOR_start_info = si;
> + HYPERVISOR_shared_info =
> + (shared_info_t *)(si->shared_info + KERNBASE);
> +
> + /*
> + * Setup some misc global variables for Xen devices
> + *
> + * XXX: devices that need these specific variables should
> + * be rewritten to fetch this info by themselves from the
> + * start_info page.
> + */
> + console_page =
> + (char *)(ptoa(si->console.domU.mfn) + KERNBASE);
> + xen_store = (struct xenstore_domain_interface *)
> + (ptoa(si->store_mfn) + KERNBASE);
> +
> + xen_domain_type = XEN_PV_DOMAIN;
> + vm_guest = VM_GUEST_XEN;
> +
> + /*
> + * Use the stack Xen gives us to build the page tables
> + * as native FreeBSD expects to find them (created
> + * by the boot trampoline).
> + */
> + for (i = 0; i < 512; i++) {
> + /* Each slot of the level 4 pages points to the same level 3 page */
> + PT4[i] = ((u_int64_t)&PT3[0]) - KERNBASE;
> + PT4[i] |= PG_V | PG_RW | PG_U;
> +
> + /* Each slot of the level 3 pages points to the same level 2 page */
> + PT3[i] = ((u_int64_t)&PT2[0]) - KERNBASE;
> + PT3[i] |= PG_V | PG_RW | PG_U;
> +
> + /* The level 2 page slots are mapped with 2MB pages for 1GB. */
> + PT2[i] = i * (2 * 1024 * 1024);
> + PT2[i] |= PG_V | PG_RW | PG_PS | PG_U;
> + }
> + load_cr3(((u_int64_t)&PT4[0]) - KERNBASE);
> +
> + /* Set the hooks for early functions that diverge from bare metal */
> + xen_pv_set_init_ops();
> +
> + /* Now we can jump into the native init function */
> + return (hammer_time(0, physfree));
> +}
> +#endif
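To check my understanding of the page table loop in hammer_time_xen, here is a userland model of it with a software walker. Assumptions on my side: the PG_* values are the usual x86 page table bits, KERNBASE is taken to be 0xffffffff80000000, and the PT*_PA "physical addresses" of the tables are arbitrary placeholders. Because every level 4 slot points at the same level 3 page, and every level 3 slot at the same level 2 page, any virtual address ends up mapped to its low 30 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PG_V	0x001ULL	/* valid */
#define PG_RW	0x002ULL	/* writable */
#define PG_U	0x004ULL	/* user accessible */
#define PG_PS	0x080ULL	/* large (2MB) page at level 2 */
#define NPTE	512
#define SIZE_2M	(2ULL * 1024 * 1024)
#define FRAME	0x000ffffffffff000ULL
#define KERNBASE_ASSUMED 0xffffffff80000000ULL	/* assumption, see above */

/* Placeholder "physical addresses" of the three table pages. */
#define PT3_PA	0x2000ULL
#define PT2_PA	0x3000ULL

static uint64_t pt4[NPTE], pt3[NPTE], pt2[NPTE];

/* Same shape as the loop in the patch, minus the KERNBASE arithmetic. */
void build_boot_tables(void)
{
	for (int i = 0; i < NPTE; i++) {
		pt4[i] = PT3_PA | PG_V | PG_RW | PG_U;
		pt3[i] = PT2_PA | PG_V | PG_RW | PG_U;
		pt2[i] = (uint64_t)i * SIZE_2M | PG_V | PG_RW | PG_PS | PG_U;
	}
}

static const uint64_t *table_at(uint64_t pa)
{
	if (pa == PT3_PA)
		return pt3;
	if (pa == PT2_PA)
		return pt2;
	return NULL;
}

/* Software page walk; returns UINT64_MAX if the address does not map. */
uint64_t translate(uint64_t va)
{
	uint64_t e4 = pt4[(va >> 39) & 0x1ff];
	const uint64_t *l3 = (e4 & PG_V) ? table_at(e4 & FRAME) : NULL;
	if (l3 == NULL)
		return UINT64_MAX;
	uint64_t e3 = l3[(va >> 30) & 0x1ff];
	const uint64_t *l2 = (e3 & PG_V) ? table_at(e3 & FRAME) : NULL;
	if (l2 == NULL)
		return UINT64_MAX;
	uint64_t e2 = l2[(va >> 21) & 0x1ff];
	if (!(e2 & PG_V) || !(e2 & PG_PS))
		return UINT64_MAX;
	return (e2 & FRAME & ~(SIZE_2M - 1)) | (va & (SIZE_2M - 1));
}
```

With this model, KERNBASE_ASSUMED + x and x both translate to x for any x below 1GB, which matches the "2MB pages for 1GB" comment in the patch.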
> +
> u_int64_t
> hammer_time(u_int64_t modulep, u_int64_t physfree)
> {
> @@ -1705,17 +1842,7 @@ hammer_time(u_int64_t modulep, u_int64_t physfree)
> */
> proc_linkup0(&proc0, &thread0);
>
> - preload_metadata = (caddr_t)(uintptr_t)(modulep + KERNBASE);
> - preload_bootstrap_relocate(KERNBASE);
> - kmdp = preload_search_by_type("elf kernel");
> - if (kmdp == NULL)
> - kmdp = preload_search_by_type("elf64 kernel");
> - boothowto = MD_FETCH(kmdp, MODINFOMD_HOWTO, int);
> - kern_envp = MD_FETCH(kmdp, MODINFOMD_ENVP, char *) + KERNBASE;
> -#ifdef DDB
> - ksym_start = MD_FETCH(kmdp, MODINFOMD_SSYM, uintptr_t);
> - ksym_end = MD_FETCH(kmdp, MODINFOMD_ESYM, uintptr_t);
> -#endif
> + kmdp = init_ops.parse_preload_data(modulep);
>
> /* Init basic tunables, hz etc */
> init_param1();
> @@ -1799,10 +1926,10 @@ hammer_time(u_int64_t modulep, u_int64_t physfree)
> lidt(&r_idt);
>
> /*
> - * Initialize the i8254 before the console so that console
> + * Initialize the early delay before the console so that console
> * initialization can use DELAY().
> */
> - i8254_init();
> + init_ops.early_delay_init();
>
> /*
> * Initialize the console before we print anything out.
> diff --git a/sys/amd64/amd64/mp_machdep.c b/sys/amd64/amd64/mp_machdep.c
> index 4ef4b3d..44c2a45 100644
> --- a/sys/amd64/amd64/mp_machdep.c
> +++ b/sys/amd64/amd64/mp_machdep.c
> @@ -90,7 +90,8 @@ extern struct pcpu __pcpu[];
>
> /* AP uses this during bootstrap. Do not staticize. */
> char *bootSTK;
> -static int bootAP;
> +int bootAP;
> +bool lapic_disabled = false;
>
> /* Free these after use */
> void *bootstacks[MAXCPU];
> @@ -122,9 +123,12 @@ u_long *ipi_rendezvous_counts[MAXCPU];
> static u_long *ipi_hardclock_counts[MAXCPU];
> #endif
>
> +int native_start_all_aps(void);
> +
> /* Default cpu_ops implementation. */
> struct cpu_ops cpu_ops = {
> - .ipi_vectored = lapic_ipi_vectored
> + .ipi_vectored = lapic_ipi_vectored,
> + .start_all_aps = native_start_all_aps,
> };
>
> extern inthand_t IDTVEC(fast_syscall), IDTVEC(fast_syscall32);
> @@ -138,7 +142,7 @@ extern int pmap_pcid_enabled;
> static volatile cpuset_t ipi_nmi_pending;
>
> /* used to hold the AP's until we are ready to release them */
> -static struct mtx ap_boot_mtx;
> +struct mtx ap_boot_mtx;
>
> /* Set to 1 once we're ready to let the APs out of the pen. */
> static volatile int aps_ready = 0;
> @@ -165,7 +169,6 @@ static int cpu_cores; /* cores per package */
>
> static void assign_cpu_ids(void);
> static void set_interrupt_apic_ids(void);
> -static int start_all_aps(void);
> static int start_ap(int apic_id);
> static void release_aps(void *dummy);
>
> @@ -569,7 +572,7 @@ cpu_mp_start(void)
> assign_cpu_ids();
>
> /* Start each Application Processor */
> - start_all_aps();
> + cpu_ops.start_all_aps();
>
> set_interrupt_apic_ids();
> }
> @@ -707,7 +710,8 @@ init_secondary(void)
> wrmsr(MSR_SF_MASK, PSL_NT|PSL_T|PSL_I|PSL_C|PSL_D);
>
> /* Disable local APIC just to be sure. */
> - lapic_disable();
> + if (!lapic_disabled)
> + lapic_disable();
>
> /* signal our startup to the BSP. */
> mp_naps++;
> @@ -733,7 +737,7 @@ init_secondary(void)
>
> /* A quick check from sanity claus */
> cpuid = PCPU_GET(cpuid);
> - if (PCPU_GET(apic_id) != lapic_id()) {
> + if (!lapic_disabled && PCPU_GET(apic_id) != lapic_id()) {
> printf("SMP: cpuid = %d\n", cpuid);
> printf("SMP: actual apic_id = %d\n", lapic_id());
> printf("SMP: correct apic_id = %d\n", PCPU_GET(apic_id));
> @@ -749,7 +753,8 @@ init_secondary(void)
> mtx_lock_spin(&ap_boot_mtx);
>
> /* Init local apic for irq's */
> - lapic_setup(1);
> + if (!lapic_disabled)
> + lapic_setup(1);
>
> /* Set memory range attributes for this CPU to match the BSP */
> mem_range_AP_init();
> @@ -764,7 +769,7 @@ init_secondary(void)
> if (cpu_logical > 1 && PCPU_GET(apic_id) % cpu_logical != 0)
> CPU_SET(cpuid, &logical_cpus_mask);
>
> - if (bootverbose)
> + if (!lapic_disabled && bootverbose)
> lapic_dump("AP");
>
> if (smp_cpus == mp_ncpus) {
> @@ -908,8 +913,8 @@ assign_cpu_ids(void)
> /*
> * start each AP in our list
> */
> -static int
> -start_all_aps(void)
> +int
> +native_start_all_aps(void)
> {
> vm_offset_t va = boot_address + KERNBASE;
> u_int64_t *pt4, *pt3, *pt2;
> diff --git a/sys/amd64/include/asmacros.h b/sys/amd64/include/asmacros.h
> index 1fb592a..ce8dce4 100644
> --- a/sys/amd64/include/asmacros.h
> +++ b/sys/amd64/include/asmacros.h
> @@ -201,4 +201,30 @@
>
> #endif /* LOCORE */
>
> +#ifdef __STDC__
> +#define ELFNOTE(name, type, desctype, descdata...) \
> +.pushsection .note.name ; \
> + .align 4 ; \
> + .long 2f - 1f /* namesz */ ; \
> + .long 4f - 3f /* descsz */ ; \
> + .long type ; \
> +1:.asciz #name ; \
> +2:.align 4 ; \
> +3:desctype descdata ; \
> +4:.align 4 ; \
> +.popsection
> +#else /* !__STDC__, i.e. -traditional */
> +#define ELFNOTE(name, type, desctype, descdata) \
> +.pushsection .note.name ; \
> + .align 4 ; \
> + .long 2f - 1f /* namesz */ ; \
> + .long 4f - 3f /* descsz */ ; \
> + .long type ; \
> +1:.asciz "name" ; \
> +2:.align 4 ; \
> +3:desctype descdata ; \
> +4:.align 4 ; \
> +.popsection
> +#endif /* __STDC__ */
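For anyone not familiar with what the ELFNOTE macro emits: three 32-bit words (namesz, descsz, type), then the NUL-terminated name padded to 4 bytes, then the descriptor padded to 4 bytes. A sketch of the same layout built and read back in C follows; the helper names are mine, and the sketch stores the words in host byte order, which is an assumption that only holds when producer and consumer share endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static size_t align4(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/* Write one note into buf; returns bytes used, or 0 if it does not fit. */
size_t elfnote_write(unsigned char *buf, size_t buflen, const char *name,
    uint32_t type, const void *desc, uint32_t descsz)
{
	uint32_t namesz = (uint32_t)strlen(name) + 1;	/* includes the NUL */
	size_t total = 12 + align4(namesz) + align4(descsz);

	if (total > buflen)
		return 0;
	memset(buf, 0, total);				/* zero the padding */
	memcpy(buf, &namesz, 4);			/* .long 2f - 1f */
	memcpy(buf + 4, &descsz, 4);			/* .long 4f - 3f */
	memcpy(buf + 8, &type, 4);			/* .long type */
	memcpy(buf + 12, name, namesz);			/* 1: .asciz name */
	memcpy(buf + 12 + align4(namesz), desc, descsz);	/* 3: desc */
	return total;
}

/* Return a pointer to the descriptor of the note at buf and its size. */
const unsigned char *elfnote_desc(const unsigned char *buf, uint32_t *descsz)
{
	uint32_t namesz;

	memcpy(&namesz, buf, 4);
	memcpy(descsz, buf + 4, 4);
	return buf + 12 + align4(namesz);
}
```

For a note named "Xen" with the descriptor string "FreeBSD", this gives 12 header bytes, 4 name bytes and 8 descriptor bytes, 24 in total.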
> +
> #endif /* !_MACHINE_ASMACROS_H_ */
> diff --git a/sys/amd64/include/clock.h b/sys/amd64/include/clock.h
> index d7f7d82..e7817ab 100644
> --- a/sys/amd64/include/clock.h
> +++ b/sys/amd64/include/clock.h
> @@ -25,6 +25,12 @@ extern int smp_tsc;
> #endif
>
> void i8254_init(void);
> +void i8254_delay(int);
> +#ifdef XENHVM
> +void xen_delay_init(void);
> +void xen_delay(int);
> +#endif
> +int delay_tc(int);
>
> /*
> * Driver to clock driver interface.
> diff --git a/sys/amd64/include/cpu.h b/sys/amd64/include/cpu.h
> index 3d9ff531..ed9f1db 100644
> --- a/sys/amd64/include/cpu.h
> +++ b/sys/amd64/include/cpu.h
> @@ -64,6 +64,7 @@ struct cpu_ops {
> void (*cpu_init)(void);
> void (*cpu_resume)(void);
> void (*ipi_vectored)(u_int, int);
> + int (*start_all_aps)(void);
> };
>
> extern struct cpu_ops cpu_ops;
> diff --git a/sys/amd64/include/sysarch.h b/sys/amd64/include/sysarch.h
> index cd380d4..27fd3ba 100644
> --- a/sys/amd64/include/sysarch.h
> +++ b/sys/amd64/include/sysarch.h
> @@ -4,3 +4,22 @@
> /* $FreeBSD$ */
>
> #include <x86/sysarch.h>
> +
> +#include <machine/pc/bios.h>
> +/*
> + * Struct containing pointers to init functions whose
> + * implementation is run time selectable. Selection can be made,
> + * for example, based on detection of a BIOS variant or
> + * hypervisor environment.
> + */
> +struct init_ops {
> + caddr_t (*parse_preload_data)(u_int64_t);
> + void (*early_delay_init)(void);
> + void (*early_delay)(int);
> + void (*fetch_e820_map)(caddr_t, struct bios_smap **, u_int32_t *);
> +#ifdef SMP
> + u_int (*mp_bootaddress)(u_int);
> +#endif
> +};
> +
> +extern struct init_ops init_ops;
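
The init_ops idea above (a table of function pointers that is swapped when a different platform is detected) can be sketched in plain C as follows. This is only an illustration under assumed names: `demo_init_ops`, `demo_select_pv` and the counters are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for struct init_ops with only the delay hooks. */
struct demo_init_ops {
	void (*early_delay_init)(void);
	void (*early_delay)(int usec);
};

/* Counters that let a caller observe which implementation ran. */
static int native_usecs, pv_usecs;

static void native_delay_init(void) { }
static void native_delay(int usec) { native_usecs += usec; }
static void pv_delay_init(void) { }
static void pv_delay(int usec) { pv_usecs += usec; }

/* The table starts out pointing at the native implementations. */
static struct demo_init_ops demo_init_ops = {
	.early_delay_init = native_delay_init,
	.early_delay = native_delay,
};

/* Called once the alternative environment has been detected. */
static void
demo_select_pv(void)
{
	demo_init_ops.early_delay_init = pv_delay_init;
	demo_init_ops.early_delay = pv_delay;
	demo_init_ops.early_delay_init();
}

/* Early-boot code always goes through the table and never names a backend. */
static void
demo_early_wait(int usec)
{
	assert(demo_init_ops.early_delay != NULL);
	demo_init_ops.early_delay(usec);
}
```

Callers such as `demo_early_wait` stay unchanged whichever backend is installed, which is the property the patch relies on for the early init functions.
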
> diff --git a/sys/amd64/include/xen/hypercall.h b/sys/amd64/include/xen/hypercall.h
> index a1b2a5c..499fb4d 100644
> --- a/sys/amd64/include/xen/hypercall.h
> +++ b/sys/amd64/include/xen/hypercall.h
> @@ -51,15 +51,8 @@
> #define CONFIG_XEN_COMPAT 0x030002
> #define __must_check
>
> -#ifdef XEN
> #define HYPERCALL_STR(name) \
> "call hypercall_page + ("STR(__HYPERVISOR_##name)" * 32)"
> -#else
> -#define HYPERCALL_STR(name) \
> - "mov $("STR(__HYPERVISOR_##name)" * 32),%%eax; "\
> - "add hypercall_stubs(%%rip),%%rax; " \
> - "call *%%rax"
> -#endif
>
> #define _hypercall0(type, name) \
> ({ \
> diff --git a/sys/conf/files b/sys/conf/files
> index f3e298c..6040447 100644
> --- a/sys/conf/files
> +++ b/sys/conf/files
> @@ -2508,8 +2508,8 @@ dev/xe/if_xe_pccard.c optional xe pccard
> dev/xen/balloon/balloon.c optional xen | xenhvm
> dev/xen/blkfront/blkfront.c optional xen | xenhvm
> dev/xen/blkback/blkback.c optional xen | xenhvm
> -dev/xen/console/console.c optional xen
> -dev/xen/console/xencons_ring.c optional xen
> +dev/xen/console/console.c optional xen | xenhvm
> +dev/xen/console/xencons_ring.c optional xen | xenhvm
> dev/xen/control/control.c optional xen | xenhvm
> dev/xen/netback/netback.c optional xen | xenhvm
> dev/xen/netfront/netfront.c optional xen | xenhvm
> diff --git a/sys/conf/files.amd64 b/sys/conf/files.amd64
> index 1914c48..bd52e8f 100644
> --- a/sys/conf/files.amd64
> +++ b/sys/conf/files.amd64
> @@ -554,5 +554,9 @@ x86/x86/mptable_pci.c optional mptable pci
> x86/x86/msi.c optional pci
> x86/x86/nexus.c standard
> x86/x86/tsc.c standard
> +x86/x86/delay.c standard
> x86/xen/hvm.c optional xenhvm
> x86/xen/xen_intr.c optional xen | xenhvm
> +x86/xen/mptable.c optional xenhvm
> +x86/xen/pvcpu.c optional xenhvm
> +x86/xen/pv.c optional xenhvm
> diff --git a/sys/conf/files.i386 b/sys/conf/files.i386
> index e259659..15a3aae 100644
> --- a/sys/conf/files.i386
> +++ b/sys/conf/files.i386
> @@ -577,5 +577,6 @@ x86/x86/mptable_pci.c optional apic native pci
> x86/x86/msi.c optional apic pci
> x86/x86/nexus.c standard
> x86/x86/tsc.c standard
> +x86/x86/delay.c standard
> x86/xen/hvm.c optional xenhvm
> x86/xen/xen_intr.c optional xen | xenhvm
> diff --git a/sys/dev/xen/console/console.c b/sys/dev/xen/console/console.c
> index 65a0e7d..86dc2a4 100644
> --- a/sys/dev/xen/console/console.c
> +++ b/sys/dev/xen/console/console.c
> @@ -69,11 +69,14 @@ struct mtx cn_mtx;
> static char wbuf[WBUF_SIZE];
> static char rbuf[RBUF_SIZE];
> static int rc, rp;
> -static unsigned int cnsl_evt_reg;
> +unsigned int cnsl_evt_reg;
> static unsigned int wc, wp; /* write_cons, write_prod */
> xen_intr_handle_t xen_intr_handle;
> device_t xencons_dev;
>
> +/* Virt address of the shared console page */
> +char *console_page;
> +
> #ifdef KDB
> static int xc_altbrk;
> #endif
> @@ -113,6 +116,9 @@ static struct ttydevsw xc_ttydevsw = {
> static void
> xc_cnprobe(struct consdev *cp)
> {
> + if (!xen_pv_domain())
> + return;
> +
> cp->cn_pri = CN_REMOTE;
> sprintf(cp->cn_name, "%s0", driver_name);
> }
> @@ -175,7 +181,7 @@ static void
> xc_cnputc(struct consdev *dev, int c)
> {
>
> - if (xen_start_info->flags & SIF_INITDOMAIN)
> + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN)
> xc_cnputc_dom0(dev, c);
> else
> xc_cnputc_domu(dev, c);
> @@ -206,8 +212,7 @@ xcons_putc(int c)
> xcons_force_flush();
> #endif
> }
> - if (cnsl_evt_reg)
> - __xencons_tx_flush();
> + __xencons_tx_flush();
>
> /* inform start path that we're pretty full */
> return ((wp - wc) >= WBUF_SIZE - 100) ? TRUE : FALSE;
> @@ -217,6 +222,10 @@ static void
> xc_identify(driver_t *driver, device_t parent)
> {
> device_t child;
> +
> + if (!xen_pv_domain())
> + return;
> +
> child = BUS_ADD_CHILD(parent, 0, driver_name, 0);
> device_set_driver(child, driver);
> device_set_desc(child, "Xen Console");
> @@ -245,7 +254,7 @@ xc_attach(device_t dev)
> cnsl_evt_reg = 1;
> callout_reset(&xc_callout, XC_POLLTIME, xc_timeout, xccons);
>
> - if (xen_start_info->flags & SIF_INITDOMAIN) {
> + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) {
> error = xen_intr_bind_virq(dev, VIRQ_CONSOLE, 0, NULL,
> xencons_priv_interrupt, NULL,
> INTR_TYPE_TTY, &xen_intr_handle);
> @@ -309,7 +318,7 @@ __xencons_tx_flush(void)
> sz = wp - wc;
> if (sz > (WBUF_SIZE - WBUF_MASK(wc)))
> sz = WBUF_SIZE - WBUF_MASK(wc);
> - if (xen_start_info->flags & SIF_INITDOMAIN) {
> + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN) {
> HYPERVISOR_console_io(CONSOLEIO_write, sz,
> &wbuf[WBUF_MASK(wc)]);
> wc += sz;
> } else {
> @@ -424,7 +433,7 @@ xcons_force_flush(void)
> {
> int sz;
>
> - if (xen_start_info->flags & SIF_INITDOMAIN)
> + if (HYPERVISOR_start_info->flags & SIF_INITDOMAIN)
> return;
>
> /* Spin until console data is flushed through to the domain controller. */
> diff --git a/sys/dev/xen/console/xencons_ring.c b/sys/dev/xen/console/xencons_ring.c
> index 3701551..3046498 100644
> --- a/sys/dev/xen/console/xencons_ring.c
> +++ b/sys/dev/xen/console/xencons_ring.c
> @@ -32,9 +32,9 @@ __FBSDID("$FreeBSD$");
>
> #define console_evtchn console.domU.evtchn
> xen_intr_handle_t console_handle;
> -extern char *console_page;
> extern struct mtx cn_mtx;
> extern device_t xencons_dev;
> +extern unsigned int cnsl_evt_reg;
>
> static inline struct xencons_interface *
> xencons_interface(void)
> @@ -60,6 +60,7 @@ xencons_ring_send(const char *data, unsigned len)
> struct xencons_interface *intf;
> XENCONS_RING_IDX cons, prod;
> int sent;
> + struct evtchn_send send = { .port = HYPERVISOR_start_info->console.domU.evtchn };
>
> intf = xencons_interface();
> cons = intf->out_cons;
> @@ -76,7 +77,11 @@ xencons_ring_send(const char *data, unsigned len)
> wmb();
> intf->out_prod = prod;
>
> - xen_intr_signal(console_handle);
> + if (cnsl_evt_reg)
> + xen_intr_signal(console_handle);
> + else
> + HYPERVISOR_event_channel_op(EVTCHNOP_send, &send);
> +
>
> return sent;
>
> @@ -125,11 +130,11 @@ xencons_ring_init(void)
> {
> int err;
>
> - if (!xen_start_info->console_evtchn)
> + if (!HYPERVISOR_start_info->console_evtchn)
> return 0;
>
> err = xen_intr_bind_local_port(xencons_dev,
> - xen_start_info->console_evtchn, NULL, xencons_handle_input, NULL,
> + HYPERVISOR_start_info->console_evtchn, NULL, xencons_handle_input, NULL,
> INTR_TYPE_MISC | INTR_MPSAFE, &console_handle);
> if (err) {
> return err;
> @@ -145,7 +150,7 @@ void
> xencons_suspend(void)
> {
>
> - if (!xen_start_info->console_evtchn)
> + if (!HYPERVISOR_start_info->console_evtchn)
> return;
>
> xen_intr_unbind(&console_handle);
> diff --git a/sys/dev/xen/control/control.c b/sys/dev/xen/control/control.c
> index a9f8d1b..35c923d 100644
> --- a/sys/dev/xen/control/control.c
> +++ b/sys/dev/xen/control/control.c
> @@ -317,21 +317,6 @@ xctrl_suspend()
> EVENTHANDLER_INVOKE(power_resume);
> }
>
> -static void
> -xen_pv_shutdown_final(void *arg, int howto)
> -{
> - /*
> - * Inform the hypervisor that shutdown is complete.
> - * This is not necessary in HVM domains since Xen
> - * emulates ACPI in that mode and FreeBSD's ACPI
> - * support will request this transition.
> - */
> - if (howto & (RB_HALT | RB_POWEROFF))
> - HYPERVISOR_shutdown(SHUTDOWN_poweroff);
> - else
> - HYPERVISOR_shutdown(SHUTDOWN_reboot);
> -}
> -
> #else
>
> /* HVM mode suspension. */
> @@ -447,6 +432,21 @@ xctrl_halt()
> shutdown_nice(RB_HALT);
> }
>
> +static void
> +xen_pv_shutdown_final(void *arg, int howto)
> +{
> + /*
> + * Inform the hypervisor that shutdown is complete.
> + * This is not necessary in HVM domains since Xen
> + * emulates ACPI in that mode and FreeBSD's ACPI
> + * support will request this transition.
> + */
> + if (howto & (RB_HALT | RB_POWEROFF))
> + HYPERVISOR_shutdown(SHUTDOWN_poweroff);
> + else
> + HYPERVISOR_shutdown(SHUTDOWN_reboot);
> +}
> +
> /*------------------------------ Event Reception -----------------------------*/
> static void
> xctrl_on_watch_event(struct xs_watch *watch, const char **vec, unsigned int len)
> @@ -529,10 +529,9 @@ xctrl_attach(device_t dev)
> xctrl->xctrl_watch.callback_data = (uintptr_t)xctrl;
> xs_register_watch(&xctrl->xctrl_watch);
>
> -#ifndef XENHVM
> - EVENTHANDLER_REGISTER(shutdown_final, xen_pv_shutdown_final, NULL,
> - SHUTDOWN_PRI_LAST);
> -#endif
> + if (xen_pv_domain())
> + EVENTHANDLER_REGISTER(shutdown_final, xen_pv_shutdown_final, NULL,
> + SHUTDOWN_PRI_LAST);
>
> return (0);
> }
> diff --git a/sys/dev/xen/timer/timer.c b/sys/dev/xen/timer/timer.c
> index 824c75b..13bd852 100644
> --- a/sys/dev/xen/timer/timer.c
> +++ b/sys/dev/xen/timer/timer.c
> @@ -59,6 +59,9 @@ __FBSDID("$FreeBSD$");
> #include <machine/_inttypes.h>
> #include <machine/smp.h>
>
> +/* For the declaration of clock_lock */
> +#include <isa/rtc.h>
> +
> #include "clock_if.h"
>
> static devclass_t xentimer_devclass;
> @@ -234,18 +237,16 @@ xen_fetch_vcpu_tinfo(struct vcpu_time_info *dst, struct vcpu_time_info *src)
> * it happens to be less than another CPU's previously determined value.
> */
> static uint64_t
> -xen_fetch_vcpu_time(void)
> +xen_fetch_vcpu_time(struct vcpu_info *vcpu)
> {
> struct vcpu_time_info dst;
> struct vcpu_time_info *src;
> uint32_t pre_version;
> uint64_t now;
> volatile uint64_t last;
> - struct vcpu_info *vcpu = DPCPU_GET(vcpu_info);
>
> src = &vcpu->time;
>
> - critical_enter();
> do {
> pre_version = xen_fetch_vcpu_tinfo(&dst, src);
> barrier();
> @@ -266,16 +267,19 @@ xen_fetch_vcpu_time(void)
> }
> } while (!atomic_cmpset_64(&xen_timer_last_time, last, now));
>
> - critical_exit();
> -
> return (now);
> }
>
> static uint32_t
> xentimer_get_timecount(struct timecounter *tc)
> {
> + uint32_t xen_time;
> +
> + critical_enter();
> + xen_time = (uint32_t)xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)) & UINT_MAX;
> + critical_exit();
>
> - return ((uint32_t)xen_fetch_vcpu_time() & UINT_MAX);
> + return (xen_time);
> }
>
> /**
> @@ -305,7 +309,12 @@ xen_fetch_wallclock(struct timespec *ts)
> static void
> xen_fetch_uptime(struct timespec *ts)
> {
> - uint64_t uptime = xen_fetch_vcpu_time();
> + uint64_t uptime;
> +
> + critical_enter();
> + uptime = xen_fetch_vcpu_time(DPCPU_GET(vcpu_info));
> + critical_exit();
> +
> ts->tv_sec = uptime / NSEC_IN_SEC;
> ts->tv_nsec = uptime % NSEC_IN_SEC;
> }
> @@ -354,7 +363,7 @@ xentimer_intr(void *arg)
> struct xentimer_softc *sc = (struct xentimer_softc *)arg;
> struct xentimer_pcpu_data *pcpu = DPCPU_PTR(xentimer_pcpu);
>
> - pcpu->last_processed = xen_fetch_vcpu_time();
> + pcpu->last_processed = xen_fetch_vcpu_time(DPCPU_GET(vcpu_info));
> if (pcpu->timer != 0 && sc->et.et_active)
> sc->et.et_event_cb(&sc->et, sc->et.et_arg);
>
> @@ -415,7 +424,9 @@ xentimer_et_start(struct eventtimer *et,
> do {
> if (++i == 60)
> panic("can't schedule timer");
> - next_time = xen_fetch_vcpu_time() + first_in_ns;
> + critical_enter();
> + next_time = xen_fetch_vcpu_time(DPCPU_GET(vcpu_info)) + first_in_ns;
> + critical_exit();
> error = xentimer_vcpu_start_timer(cpu, next_time);
> } while (error == -ETIME);
>
> @@ -573,6 +584,36 @@ xentimer_suspend(device_t dev)
> return (0);
> }
>
> +/*
> + * Xen delay early init
> + */
> +void xen_delay_init(void)
> +{
> + /* Init the clock lock */
> + mtx_init(&clock_lock, "clk", NULL, MTX_SPIN | MTX_NOPROFILE);
> +}
> +/*
> + * Xen PV DELAY function
> + *
> + * When running in PVH mode we don't have an emulated i8254, so
> + * make use of the Xen time info in order to code a simple DELAY
> + * function that can be used during early boot.
> + */
> +void xen_delay(int n)
> +{
> + uint64_t end_ns;
> + uint64_t current;
> +
> + end_ns = xen_fetch_vcpu_time(&HYPERVISOR_shared_info->vcpu_info[0]);
> + end_ns += n * NSEC_IN_USEC;
> +
> + for (;;) {
> + current = xen_fetch_vcpu_time(&HYPERVISOR_shared_info->vcpu_info[0]);
> + if (current >= end_ns)
> + break;
> + }
> +}
> +
> static device_method_t xentimer_methods[] = {
> DEVMETHOD(device_identify, xentimer_identify),
> DEVMETHOD(device_probe, xentimer_probe),
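
The xen_delay() loop added above is a busy-wait against a nanosecond clock: compute a deadline, then poll until the clock reaches it. A minimal userland sketch of that pattern follows. The clock is passed in as a callback so the loop can be exercised with a fake time source; `demo_delay`, `demo_fake_clock` and `DEMO_NSEC_IN_USEC` are hypothetical names, and the widening cast before the multiplication is a choice made in this sketch, not something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	DEMO_NSEC_IN_USEC	1000ULL

/* Clock callback: returns the current time in nanoseconds. */
typedef uint64_t (*demo_clock_fn)(void *arg);

/*
 * Spin until at least usec microseconds have passed according to now_ns.
 * Returns how many times the loop polled and found the deadline not reached.
 */
static uint64_t
demo_delay(demo_clock_fn now_ns, void *arg, int usec)
{
	uint64_t end_ns, polls;

	assert(now_ns != NULL && usec >= 0);
	/* Widen before multiplying so large delays cannot overflow an int. */
	end_ns = now_ns(arg) + (uint64_t)usec * DEMO_NSEC_IN_USEC;
	polls = 0;
	while (now_ns(arg) < end_ns)
		polls++;
	return (polls);
}

/* Fake time source for testing: advances 250 ns on every read. */
static uint64_t
demo_fake_clock(void *arg)
{
	uint64_t *t = arg;

	*t += 250;
	return (*t);
}
```

With the fake clock starting at zero, a 1 us delay sets the deadline at 1250 ns and the loop exits on the read that returns 1250.
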
> diff --git a/sys/i386/i386/locore.s b/sys/i386/i386/locore.s
> index 68cb430..bd136b1 100644
> --- a/sys/i386/i386/locore.s
> +++ b/sys/i386/i386/locore.s
> @@ -898,3 +898,12 @@ done_pde:
> #endif
>
> ret
> +
> +#ifdef XENHVM
> +/* Xen Hypercall page */
> + .text
> +.p2align PAGE_SHIFT, 0x90 /* Hypercall_page needs to be PAGE aligned */
> +
> +NON_GPROF_ENTRY(hypercall_page)
> + .skip 0x1000, 0x90 /* Fill with "nop"s */
> +#endif
> diff --git a/sys/i386/i386/machdep.c b/sys/i386/i386/machdep.c
> index c430316..8bd9a8e 100644
> --- a/sys/i386/i386/machdep.c
> +++ b/sys/i386/i386/machdep.c
> @@ -254,6 +254,15 @@ struct mtx icu_lock;
>
> struct mem_range_softc mem_range_softc;
>
> +void
> +DELAY(int n)
> +{
> + if (delay_tc(n))
> + return;
> +
> + i8254_delay(n);
> +}
> +
> static void
> cpu_startup(dummy)
> void *dummy;
> diff --git a/sys/i386/include/clock.h b/sys/i386/include/clock.h
> index d980ec7..287b2c8 100644
> --- a/sys/i386/include/clock.h
> +++ b/sys/i386/include/clock.h
> @@ -22,6 +22,12 @@ extern int tsc_is_invariant;
> extern int tsc_perf_stat;
>
> void i8254_init(void);
> +void i8254_delay(int);
> +#ifdef XENHVM
> +void xen_delay_init(void);
> +void xen_delay(int);
> +#endif
> +int delay_tc(int);
>
> /*
> * Driver to clock driver interface.
> diff --git a/sys/i386/include/xen/hypercall.h b/sys/i386/include/xen/hypercall.h
> index edc13f4..1c15b0f 100644
> --- a/sys/i386/include/xen/hypercall.h
> +++ b/sys/i386/include/xen/hypercall.h
> @@ -40,15 +40,8 @@
> #define CONFIG_XEN_COMPAT 0x030002
>
>
> -#if defined(XEN)
> #define HYPERCALL_STR(name) \
> "call hypercall_page + ("STR(__HYPERVISOR_##name)" * 32)"
> -#else
> -#define HYPERCALL_STR(name) \
> - "mov hypercall_stubs,%%eax; " \
> - "add $("STR(__HYPERVISOR_##name)" * 32),%%eax; " \
> - "call *%%eax"
> -#endif
>
> #define _hypercall0(type, name) \
> ({ \
> diff --git a/sys/i386/xen/xen_machdep.c b/sys/i386/xen/xen_machdep.c
> index 7049be6..1b1c74d 100644
> --- a/sys/i386/xen/xen_machdep.c
> +++ b/sys/i386/xen/xen_machdep.c
> @@ -89,6 +89,7 @@ IDTVEC(div), IDTVEC(dbg), IDTVEC(nmi), IDTVEC(bpt), IDTVEC(ofl),
>
> int xendebug_flags;
> start_info_t *xen_start_info;
> +start_info_t *HYPERVISOR_start_info;
> shared_info_t *HYPERVISOR_shared_info;
> xen_pfn_t *xen_machine_phys = machine_to_phys_mapping;
> xen_pfn_t *xen_phys_machine;
> @@ -744,7 +745,7 @@ void initvalues(start_info_t *startinfo);
> struct xenstore_domain_interface;
> extern struct xenstore_domain_interface *xen_store;
>
> -char *console_page;
> +extern char *console_page;
>
> void *
> bootmem_alloc(unsigned int size)
> @@ -927,6 +928,7 @@ initvalues(start_info_t *startinfo)
> HYPERVISOR_vm_assist(VMASST_CMD_enable,
> VMASST_TYPE_4gb_segments_notify);
> #endif
> xen_start_info = startinfo;
> + HYPERVISOR_start_info = startinfo;
> xen_phys_machine = (xen_pfn_t *)startinfo->mfn_list;
>
> IdlePTD = (pd_entry_t *)((uint8_t *)startinfo->pt_base + PAGE_SIZE);
> diff --git a/sys/x86/isa/clock.c b/sys/x86/isa/clock.c
> index a12e175..a5aed1c 100644
> --- a/sys/x86/isa/clock.c
> +++ b/sys/x86/isa/clock.c
> @@ -247,61 +247,13 @@ getit(void)
> return ((high << 8) | low);
> }
>
> -#ifndef DELAYDEBUG
> -static u_int
> -get_tsc(__unused struct timecounter *tc)
> -{
> -
> - return (rdtsc32());
> -}
> -
> -static __inline int
> -delay_tc(int n)
> -{
> - struct timecounter *tc;
> - timecounter_get_t *func;
> - uint64_t end, freq, now;
> - u_int last, mask, u;
> -
> - tc = timecounter;
> - freq = atomic_load_acq_64(&tsc_freq);
> - if (tsc_is_invariant && freq != 0) {
> - func = get_tsc;
> - mask = ~0u;
> - } else {
> - if (tc->tc_quality <= 0)
> - return (0);
> - func = tc->tc_get_timecount;
> - mask = tc->tc_counter_mask;
> - freq = tc->tc_frequency;
> - }
> - now = 0;
> - end = freq * n / 1000000;
> - if (func == get_tsc)
> - sched_pin();
> - last = func(tc) & mask;
> - do {
> - cpu_spinwait();
> - u = func(tc) & mask;
> - if (u < last)
> - now += mask - last + u + 1;
> - else
> - now += u - last;
> - last = u;
> - } while (now < end);
> - if (func == get_tsc)
> - sched_unpin();
> - return (1);
> -}
> -#endif
> -
> /*
> * Wait "n" microseconds.
> * Relies on timer 1 counting down from (i8254_freq / hz)
> * Note: timer had better have been programmed before this is first used!
> */
> void
> -DELAY(int n)
> +i8254_delay(int n)
> {
> int delta, prev_tick, tick, ticks_left;
> #ifdef DELAYDEBUG
> @@ -317,9 +269,6 @@ DELAY(int n)
> }
> if (state == 1)
> printf("DELAY(%d)...", n);
> -#else
> - if (delay_tc(n))
> - return;
> #endif
> /*
> * Read the counter first, so that the rest of the setup overhead is
> diff --git a/sys/x86/x86/delay.c b/sys/x86/x86/delay.c
> new file mode 100644
> index 0000000..7ea70b1
> --- /dev/null
> +++ b/sys/x86/x86/delay.c
> @@ -0,0 +1,95 @@
> +/*-
> + * Copyright (c) 1990 The Regents of the University of California.
> + * Copyright (c) 2010 Alexander Motin <mav@xxxxxxxxxxx>
> + * All rights reserved.
> + *
> + * This code is derived from software contributed to Berkeley by
> + * William Jolitz and Don Ahn.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in the
> + * documentation and/or other materials provided with the distribution.
> + * 4. Neither the name of the University nor the names of its contributors
> + * may be used to endorse or promote products derived from this software
> + * without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + *
> + * from: @(#)clock.c 7.2 (Berkeley) 5/12/91
> + */
> +
> +#include <sys/cdefs.h>
> +__FBSDID("$FreeBSD$");
> +
> +/* Generic x86 routines to handle delay */
> +
> +#include <sys/param.h>
> +#include <sys/systm.h>
> +#include <sys/timetc.h>
> +#include <sys/proc.h>
> +#include <sys/kernel.h>
> +#include <sys/sched.h>
> +
> +#include <machine/clock.h>
> +#include <machine/cpu.h>
> +
> +static u_int
> +get_tsc(__unused struct timecounter *tc)
> +{
> +
> + return (rdtsc32());
> +}
> +
> +int
> +delay_tc(int n)
> +{
> + struct timecounter *tc;
> + timecounter_get_t *func;
> + uint64_t end, freq, now;
> + u_int last, mask, u;
> +
> + tc = timecounter;
> + freq = atomic_load_acq_64(&tsc_freq);
> + if (tsc_is_invariant && freq != 0) {
> + func = get_tsc;
> + mask = ~0u;
> + } else {
> + if (tc->tc_quality <= 0)
> + return (0);
> + func = tc->tc_get_timecount;
> + mask = tc->tc_counter_mask;
> + freq = tc->tc_frequency;
> + }
> + now = 0;
> + end = freq * n / 1000000;
> + if (func == get_tsc)
> + sched_pin();
> + last = func(tc) & mask;
> + do {
> + cpu_spinwait();
> + u = func(tc) & mask;
> + if (u < last)
> + now += mask - last + u + 1;
> + else
> + now += u - last;
> + last = u;
> + } while (now < end);
> + if (func == get_tsc)
> + sched_unpin();
> + return (1);
> +}
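
The core of delay_tc() above is the wrap-aware accumulation of elapsed ticks from a counter that is only `mask` bits wide, plus the conversion of a microsecond delay into a tick budget. Both steps are isolated below as a small C sketch; `demo_ticks_elapsed` and `demo_usec_to_ticks` are hypothetical helper names that restate the arithmetic in the patch rather than copy it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Ticks elapsed between two samples of a free-running counter whose valid
 * bits are given by mask. At most one wrap between samples is assumed.
 */
static uint64_t
demo_ticks_elapsed(uint32_t last, uint32_t cur, uint32_t mask)
{
	assert(mask != 0);
	last &= mask;
	cur &= mask;
	if (cur < last)		/* counter wrapped past mask back to 0 */
		return ((uint64_t)mask - last + cur + 1);
	return ((uint64_t)cur - last);
}

/* Number of ticks that make up usec microseconds at freq_hz. */
static uint64_t
demo_usec_to_ticks(uint64_t freq_hz, int usec)
{
	assert(usec >= 0);
	return (freq_hz * (uint64_t)usec / 1000000);
}
```

For example, with a 16-bit mask a step from 0xFFF0 to 0x0010 counts as 0x20 ticks, not as a large negative jump.
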
> diff --git a/sys/x86/x86/local_apic.c b/sys/x86/x86/local_apic.c
> index 8c8eef6..d8d7701 100644
> --- a/sys/x86/x86/local_apic.c
> +++ b/sys/x86/x86/local_apic.c
> @@ -1368,9 +1368,13 @@ apic_setup_io(void *dummy __unused)
> if (retval != 0)
> printf("%s: Failed to setup I/O APICs: returned %d\n",
> best_enum->apic_name, retval);
> -#ifdef XEN
> - return;
> +
> +#if defined(XEN) || defined(XENHVM)
> + /* There's no lapic on PV Xen */
> + if (xen_pv_domain())
> + return;
> #endif
> +
> /*
> * Finish setting up the local APIC on the BSP once we know how to
> * properly program the LINT pins.
> diff --git a/sys/x86/xen/hvm.c b/sys/x86/xen/hvm.c
> index 72811dc..be15594 100644
> --- a/sys/x86/xen/hvm.c
> +++ b/sys/x86/xen/hvm.c
> @@ -35,15 +35,21 @@ __FBSDID("$FreeBSD$");
> #include <sys/proc.h>
> #include <sys/smp.h>
> #include <sys/systm.h>
> +#include <sys/lock.h>
> +#include <sys/mutex.h>
> +#include <sys/reboot.h>
>
> #include <vm/vm.h>
> #include <vm/pmap.h>
> +#include <vm/vm_kern.h>
> +#include <vm/vm_extern.h>
>
> #include <dev/pci/pcivar.h>
>
> #include <machine/cpufunc.h>
> #include <machine/cpu.h>
> #include <machine/smp.h>
> +#include <machine/stdarg.h>
>
> #include <x86/apicreg.h>
>
> @@ -52,6 +58,9 @@ __FBSDID("$FreeBSD$");
> #include <xen/gnttab.h>
> #include <xen/hypervisor.h>
> #include <xen/hvm.h>
> +#ifdef __amd64__
> +#include <xen/pv.h>
> +#endif
> #include <xen/xen_intr.h>
>
> #include <xen/interface/hvm/params.h>
> @@ -97,6 +106,11 @@ extern void pmap_lazyfix_action(void);
> /* Variables used by mp_machdep to perform the bitmap IPI */
> extern volatile u_int cpu_ipi_pending[MAXCPU];
>
> +#ifdef __amd64__
> +/* Native AP start used on PVHVM */
> +extern int native_start_all_aps(void);
> +#endif
> +
> /*---------------------------------- Macros ----------------------------------*/
> #define IPI_TO_IDX(ipi) ((ipi) - APIC_IPI_INTS)
>
> @@ -119,7 +133,10 @@ enum xen_domain_type xen_domain_type = XEN_NATIVE;
> struct cpu_ops xen_hvm_cpu_ops = {
> .ipi_vectored = lapic_ipi_vectored,
> .cpu_init = xen_hvm_cpu_init,
> - .cpu_resume = xen_hvm_cpu_resume
> + .cpu_resume = xen_hvm_cpu_resume,
> +#ifdef __amd64__
> + .start_all_aps = native_start_all_aps,
> +#endif
> };
>
> static MALLOC_DEFINE(M_XENHVM, "xen_hvm", "Xen HVM PV Support");
> @@ -157,8 +174,9 @@ DPCPU_DEFINE(xen_intr_handle_t, ipi_handle[nitems(xen_ipis)]);
>
> /*------------------ Hypervisor Access Shared Memory Regions -----------------*/
> /** Hypercall table accessed via HYPERVISOR_*_op() methods. */
> -char *hypercall_stubs;
> +extern char *hypercall_page;
> shared_info_t *HYPERVISOR_shared_info;
> +start_info_t *HYPERVISOR_start_info;
>
> #ifdef SMP
> /*---------------------------- XEN PV IPI Handlers ---------------------------*/
> @@ -522,7 +540,7 @@ xen_setup_cpus(void)
> {
> int i;
>
> - if (!xen_hvm_domain() || !xen_vector_callback_enabled)
> + if (!xen_vector_callback_enabled)
> return;
>
> #ifdef __amd64__
> @@ -558,7 +576,7 @@ xen_hvm_cpuid_base(void)
> * Allocate and fill in the hypcall page.
> */
> static int
> -xen_hvm_init_hypercall_stubs(void)
> +xen_hvm_init_hypercall_stubs(enum xen_hvm_init_type init_type)
> {
> uint32_t base, regs[4];
> int i;
> @@ -567,7 +585,7 @@ xen_hvm_init_hypercall_stubs(void)
> if (base == 0)
> return (ENXIO);
>
> - if (hypercall_stubs == NULL) {
> + if (init_type == XEN_HVM_INIT_COLD) {
> do_cpuid(base + 1, regs);
> printf("XEN: Hypervisor version %d.%d detected.\n",
> regs[0] >> 16, regs[0] & 0xffff);
> @@ -577,18 +595,9 @@ xen_hvm_init_hypercall_stubs(void)
> * Find the hypercall pages.
> */
> do_cpuid(base + 2, regs);
> -
> - if (hypercall_stubs == NULL) {
> - size_t call_region_size;
> -
> - call_region_size = regs[0] * PAGE_SIZE;
> - hypercall_stubs = malloc(call_region_size, M_XENHVM, M_NOWAIT);
> - if (hypercall_stubs == NULL)
> - panic("Unable to allocate Xen hypercall region");
> - }
>
> for (i = 0; i < regs[0]; i++)
> - wrmsr(regs[1], vtophys(hypercall_stubs + i * PAGE_SIZE) + i);
> + wrmsr(regs[1], vtophys(&hypercall_page + i * PAGE_SIZE) + i);
>
> return (0);
> }
> @@ -677,8 +686,6 @@ xen_hvm_disable_emulated_devices(void)
> if (inw(XEN_MAGIC_IOPORT) != XMI_MAGIC)
> return;
>
> - if (bootverbose)
> - printf("XEN: Disabling emulated block and network devices\n");
> outw(XEN_MAGIC_IOPORT, XMI_UNPLUG_IDE_DISKS|XMI_UNPLUG_NICS);
> }
>
> @@ -691,7 +698,12 @@ xen_hvm_init(enum xen_hvm_init_type init_type)
> if (init_type == XEN_HVM_INIT_CANCELLED_SUSPEND)
> return;
>
> - error = xen_hvm_init_hypercall_stubs();
> + if (xen_pv_domain()) {
> + /* hypercall page is already set in the PV case */
> + error = 0;
> + } else {
> + error = xen_hvm_init_hypercall_stubs(init_type);
> + }
>
> switch (init_type) {
> case XEN_HVM_INIT_COLD:
> @@ -701,6 +713,12 @@ xen_hvm_init(enum xen_hvm_init_type init_type)
> setup_xen_features();
> cpu_ops = xen_hvm_cpu_ops;
> vm_guest = VM_GUEST_XEN;
> +#ifdef __amd64__
> + if (xen_pv_domain())
> + cpu_ops.start_all_aps = xen_pv_start_all_aps;
> + else
> +#endif
> + printf("XEN: Disabling emulated block and network devices\n");
> break;
> case XEN_HVM_INIT_RESUME:
> if (error != 0)
> @@ -715,10 +733,13 @@ xen_hvm_init(enum xen_hvm_init_type init_type)
> }
>
> xen_vector_callback_enabled = 0;
> - xen_domain_type = XEN_HVM_DOMAIN;
> - xen_hvm_init_shared_info_page();
> xen_hvm_set_callback(NULL);
> - xen_hvm_disable_emulated_devices();
> +
> + if (!xen_pv_domain()) {
> + xen_domain_type = XEN_HVM_DOMAIN;
> + xen_hvm_init_shared_info_page();
> + xen_hvm_disable_emulated_devices();
> + }
> }
>
> void
> @@ -749,10 +770,11 @@ xen_set_vcpu_id(void)
> struct pcpu *pc;
> int i;
>
> - /* Set vcpu_id to acpi_id */
> + /* Set vcpu_id to acpi_id for PVHVM guests */
> CPU_FOREACH(i) {
> pc = pcpu_find(i);
> - pc->pc_vcpu_id = pc->pc_acpi_id;
> + if (xen_hvm_domain())
> + pc->pc_vcpu_id = pc->pc_acpi_id;
> if (bootverbose)
> printf("XEN: CPU %u has VCPU ID %u\n",
> i, pc->pc_vcpu_id);
> @@ -790,6 +812,31 @@ xen_hvm_cpu_init(void)
> DPCPU_SET(vcpu_info, vcpu_info);
> }
>
> +/*----------------------------- Debug functions ------------------------------*/
> +#define PRINTK_BUFSIZE 1024
> +static int
> +vprintk(const char *fmt, __va_list ap)
> +{
> + int retval, len;
> + static char buf[PRINTK_BUFSIZE];
> +
> + retval = vsnprintf(buf, PRINTK_BUFSIZE - 1, fmt, ap);
> + buf[PRINTK_BUFSIZE - 1] = 0;
> + len = strlen(buf);
> + retval = HYPERVISOR_console_io(CONSOLEIO_write, len, (char *)buf);
> + return (retval);
> +}
> +
> +void
> +xen_early_printf(const char *fmt, ...)
> +{
> + __va_list ap;
> +
> + va_start(ap, fmt);
> + vprintk(fmt, ap);
> + va_end(ap);
> +}
> +
> SYSINIT(xen_hvm_init, SI_SUB_HYPERVISOR, SI_ORDER_FIRST, xen_hvm_sysinit, NULL);
> #ifdef SMP
> SYSINIT(xen_setup_cpus, SI_SUB_SMP, SI_ORDER_FIRST, xen_setup_cpus, NULL);
> diff --git a/sys/x86/xen/mptable.c b/sys/x86/xen/mptable.c
> new file mode 100644
> index 0000000..8916314
> --- /dev/null
> +++ b/sys/x86/xen/mptable.c
> @@ -0,0 +1,136 @@
> +/*-
> + * Copyright (c) 2003 John Baldwin <jhb@xxxxxxxxxxx>
> + * Copyright (c) 2013 Roger Pau Monné <roger.pau@xxxxxxxxxx>
> + * All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in the
> + * documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of the author nor the names of any co-contributors
> + * may be used to endorse or promote products derived from this software
> + * without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +
> +#include <sys/cdefs.h>
> +__FBSDID("$FreeBSD$");
> +
> +#include <sys/param.h>
> +#include <sys/systm.h>
> +#include <sys/bus.h>
> +#include <sys/kernel.h>
> +#include <sys/smp.h>
> +#include <sys/pcpu.h>
> +#include <vm/vm.h>
> +#include <vm/pmap.h>
> +
> +#include <machine/intr_machdep.h>
> +#include <machine/apicvar.h>
> +
> +#include <machine/cpu.h>
> +#include <machine/smp.h>
> +
> +#include <xen/xen-os.h>
> +#include <xen/hypervisor.h>
> +
> +#include <xen/interface/vcpu.h>
> +
> +static int xenpv_probe(void);
> +static int xenpv_probe_cpus(void);
> +static int xenpv_setup_local(void);
> +static int xenpv_setup_io(void);
> +
> +static struct apic_enumerator xenpv_enumerator = {
> + "Xen PV",
> + xenpv_probe,
> + xenpv_probe_cpus,
> + xenpv_setup_local,
> + xenpv_setup_io
> +};
> +
> +/*
> + * Nothing to look for here; the enumerator is only registered on Xen PV.
> + */
> +static int
> +xenpv_probe(void)
> +{
> + return (-100);
> +}
> +
> +/*
> + * Ask Xen about each possible vCPU and add the ones that exist.
> + */
> +static int
> +xenpv_probe_cpus(void)
> +{
> + int i, ret;
> +
> + for (i = 0; i < MAXCPU; i++) {
> + ret = HYPERVISOR_vcpu_op(VCPUOP_is_up, i, NULL);
> + if (ret >= 0)
> + cpu_add((i * 2), (i == 0));
> + }
> +
> + return (0);
> +}
> +
> +/*
> + * Set the vCPU ID of the BSP; no local APIC is initialized here.
> + */
> +static int
> +xenpv_setup_local(void)
> +{
> + PCPU_SET(vcpu_id, 0);
> + return (0);
> +}
> +
> +/*
> + * No I/O APICs are enumerated and no interrupt sources are set up here.
> + */
> +static int
> +xenpv_setup_io(void)
> +{
> + return (0);
> +}
> +
> +static void
> +xenpv_register(void *dummy __unused)
> +{
> + if (xen_pv_domain()) {
> + apic_register_enumerator(&xenpv_enumerator);
> + }
> +}
> +SYSINIT(xenpv_register, SI_SUB_TUNABLES - 1, SI_ORDER_FIRST, xenpv_register, NULL);
> +
> +/*
> + * Setup per-CPU vCPU IDs.
> + */
> +static void
> +xenpv_set_ids(void *dummy)
> +{
> + struct pcpu *pc;
> + int i;
> +
> + CPU_FOREACH(i) {
> + pc = pcpu_find(i);
> + pc->pc_vcpu_id = i;
> + }
> + return;
> +}
> +SYSINIT(xenpv_set_ids, SI_SUB_CPU, SI_ORDER_MIDDLE, xenpv_set_ids, NULL);
> diff --git a/sys/x86/xen/pv.c b/sys/x86/xen/pv.c
> new file mode 100644
> index 0000000..6756dec
> --- /dev/null
> +++ b/sys/x86/xen/pv.c
> @@ -0,0 +1,247 @@
> +/*
> + * Copyright (c) 2004 Christian Limpach.
> + * Copyright (c) 2004-2006,2008 Kip Macy
> + * Copyright (c) 2013 Roger Pau Monné <roger.pau@xxxxxxxxxx>
> + * All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in the
> + * documentation and/or other materials provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +
> +#include <sys/cdefs.h>
> +__FBSDID("$FreeBSD$");
> +
> +#include <sys/param.h>
> +#include <sys/bus.h>
> +#include <sys/kernel.h>
> +#include <sys/malloc.h>
> +#include <sys/proc.h>
> +#include <sys/smp.h>
> +#include <sys/systm.h>
> +#include <sys/lock.h>
> +#include <sys/mutex.h>
> +#include <sys/reboot.h>
> +
> +#include <vm/vm.h>
> +#include <vm/pmap.h>
> +#include <vm/vm_kern.h>
> +#include <vm/vm_extern.h>
> +
> +#include <dev/pci/pcivar.h>
> +
> +#include <machine/cpufunc.h>
> +#include <machine/cpu.h>
> +#include <machine/smp.h>
> +#include <machine/tss.h>
> +#include <machine/sysarch.h>
> +#include <machine/clock.h>
> +
> +#include <x86/apicreg.h>
> +
> +#include <xen/xen-os.h>
> +#include <xen/features.h>
> +#include <xen/gnttab.h>
> +#include <xen/hypervisor.h>
> +#include <xen/hvm.h>
> +#include <xen/pv.h>
> +#include <xen/xen_intr.h>
> +
> +#include <xen/interface/hvm/params.h>
> +#include <xen/interface/vcpu.h>
> +
> +#define MAX_E820_ENTRIES 128
> +
> +/*--------------------------- Forward Declarations ---------------------------*/
> +static caddr_t xen_pv_parse_preload_data(u_int64_t);
> +static void xen_pv_fetch_e820_map(caddr_t, struct bios_smap **, u_int32_t *);
> +
> +/*---------------------------- Extern Declarations ---------------------------*/
> +/* Variables used by amd64 mp_machdep to start APs */
> +extern struct mtx ap_boot_mtx;
> +extern void *bootstacks[];
> +extern char *doublefault_stack;
> +extern char *nmi_stack;
> +extern void *dpcpu;
> +extern int bootAP;
> +extern char *bootSTK;
> +extern bool lapic_disabled;
> +
> +/*-------------------------------- Global Data -------------------------------*/
> +/* Xen init_ops implementation. */
> +struct init_ops xen_init_ops = {
> + .parse_preload_data = xen_pv_parse_preload_data,
> + .early_delay_init = xen_delay_init,
> + .early_delay = xen_delay,
> + .fetch_e820_map = xen_pv_fetch_e820_map,
> +};
> +
> +static struct
> +{
> + const char *ev;
> + int mask;
> +} howto_names[] = {
> + {"boot_askname", RB_ASKNAME},
> + {"boot_single", RB_SINGLE},
> + {"boot_nosync", RB_NOSYNC},
> + {"boot_halt", RB_HALT},
> + {"boot_serial", RB_SERIAL},
> + {"boot_cdrom", RB_CDROM},
> + {"boot_gdb", RB_GDB},
> + {"boot_gdb_pause", RB_RESERVED1},
> + {"boot_verbose", RB_VERBOSE},
> + {"boot_multicons", RB_MULTIPLE},
> + {NULL, 0}
> +};
> +
> +static struct bios_smap xen_smap[MAX_E820_ENTRIES];
> +
> +static int
> +start_xen_ap(int cpu)
> +{
> + struct vcpu_guest_context *ctxt;
> + int ms, cpus = mp_naps;
> +
> + ctxt = malloc(sizeof(*ctxt), M_TEMP, M_NOWAIT | M_ZERO);
> + if (ctxt == NULL)
> + panic("unable to allocate memory");
> +
> + ctxt->flags = VGCF_IN_KERNEL;
> + ctxt->user_regs.rip = (unsigned long) init_secondary;
> + ctxt->user_regs.rsp = (unsigned long) bootSTK;
> +
> + /* Set the CPU to use the same page tables and CR4 value */
> + ctxt->ctrlreg[3] = KPML4phys;
> + ctxt->ctrlreg[4] = rcr4();
> +
> + if (HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, ctxt))
> + panic("unable to initialize CPU#%d\n", cpu);
> +
> + free(ctxt, M_TEMP);
> +
> + /* Launch the vCPU */
> + if (HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL))
> + panic("unable to start AP#%d\n", cpu);
> +
> + /* Wait up to 5 seconds for it to start. */
> + for (ms = 0; ms < 5000; ms++) {
> + if (mp_naps > cpus)
> + return 1; /* return SUCCESS */
> + DELAY(1000);
> + }
> +
> + return 0;
> +}
> +
> +int
> +xen_pv_start_all_aps(void)
> +{
> + int cpu;
> +
> + mtx_init(&ap_boot_mtx, "ap boot", NULL, MTX_SPIN);
> + lapic_disabled = true;
> +
> + for (cpu = 1; cpu < mp_ncpus; cpu++) {
> +
> + /* allocate and set up an idle stack data page */
> + bootstacks[cpu] = (void *)kmem_malloc(kernel_arena,
> + KSTACK_PAGES * PAGE_SIZE, M_WAITOK | M_ZERO);
> + doublefault_stack = (char *)kmem_malloc(kernel_arena,
> + PAGE_SIZE, M_WAITOK | M_ZERO);
> + nmi_stack = (char *)kmem_malloc(kernel_arena, PAGE_SIZE,
> + M_WAITOK | M_ZERO);
> + dpcpu = (void *)kmem_malloc(kernel_arena, DPCPU_SIZE,
> + M_WAITOK | M_ZERO);
> +
> + bootSTK = (char *)bootstacks[cpu] + KSTACK_PAGES * PAGE_SIZE - 8;
> + bootAP = cpu;
> +
> + /* attempt to start the Application Processor */
> + if (!start_xen_ap(cpu))
> + panic("AP #%d failed to start!", cpu);
> +
> + CPU_SET(cpu, &all_cpus); /* record AP in CPU map */
> + }
> +
> + return mp_naps;
> +}
> +
> +/*
> + * Functions to convert the "extra" parameters passed by Xen
> + * into FreeBSD boot options (from the i386 Xen port).
> + */
> +static char *
> +xen_setbootenv(char *cmd_line)
> +{
> + char *cmd_line_next;
> +
> + /* Skip leading spaces */
> + for (; *cmd_line == ' '; cmd_line++);
> +
> + for (cmd_line_next = cmd_line; strsep(&cmd_line_next, ",") != NULL;);
> + return (cmd_line);
> +}
> +
> +static int
> +xen_boothowto(char *envp)
> +{
> + int i, howto = 0;
> +
> + /* get equivalents from the environment */
> + for (i = 0; howto_names[i].ev != NULL; i++)
> + if (getenv(howto_names[i].ev) != NULL)
> + howto |= howto_names[i].mask;
> + return (howto);
> +}
> +
> +static caddr_t
> +xen_pv_parse_preload_data(u_int64_t modulep)
> +{
> + /* Parse the extra boot information given by Xen */
> + if (HYPERVISOR_start_info->cmd_line)
> + kern_envp = xen_setbootenv(HYPERVISOR_start_info->cmd_line);
> + boothowto |= xen_boothowto(kern_envp);
> +
> + return (NULL);
> +}
> +
> +static void
> +xen_pv_fetch_e820_map(caddr_t kmdp, struct bios_smap **smap, u_int32_t *size)
> +{
> + struct xen_memory_map memmap;
> + int rc;
> +
> + /* Fetch the E820 map from Xen */
> + memmap.nr_entries = MAX_E820_ENTRIES;
> + set_xen_guest_handle(memmap.buffer, xen_smap);
> + rc = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap);
> + if (rc)
> + panic("unable to fetch Xen E820 memory map");
> +
> + *smap = xen_smap;
> + *size = memmap.nr_entries * sizeof(xen_smap[0]);
> +}
> +
> +void
> +xen_pv_set_init_ops(void)
> +{
> + /* Init ops for Xen PV */
> + init_ops = xen_init_ops;
> +}
> diff --git a/sys/x86/xen/pvcpu.c b/sys/x86/xen/pvcpu.c
> new file mode 100644
> index 0000000..00e063b
> --- /dev/null
> +++ b/sys/x86/xen/pvcpu.c
> @@ -0,0 +1,98 @@
> +/*
> + * Copyright (c) 2013 Roger Pau Monné <roger.pau@xxxxxxxxxx>
> + * All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in the
> + * documentation and/or other materials provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +
> +#include <sys/cdefs.h>
> +__FBSDID("$FreeBSD$");
> +
> +#include <sys/param.h>
> +#include <sys/systm.h>
> +#include <sys/bus.h>
> +#include <sys/kernel.h>
> +#include <sys/module.h>
> +#include <sys/pcpu.h>
> +#include <sys/smp.h>
> +
> +#include <xen/xen-os.h>
> +
> +static void
> +xenpvcpu_identify(driver_t *driver, device_t parent)
> +{
> + int i;
> +
> + if (!xen_pv_domain())
> + return;
> +
> + CPU_FOREACH(i)
> + BUS_ADD_CHILD(parent, 0, "pvcpu", i);
> +}
> +
> +static int
> +xenpvcpu_probe(device_t dev)
> +{
> + if (!xen_pv_domain())
> + return (ENXIO);
> +
> + device_set_desc(dev, "Xen PV CPU");
> + return (0);
> +}
> +
> +static int
> +xenpvcpu_attach(device_t dev)
> +{
> + struct pcpu *pc;
> + int cpu;
> +
> + cpu = device_get_unit(dev);
> + pc = pcpu_find(cpu);
> + pc->pc_device = dev;
> + return (0);
> +}
> +
> +static int
> +xenpvcpu_detach(device_t dev)
> +{
> +
> + return (0);
> +}
> +
> +static device_method_t xenpvcpu_methods[] = {
> + DEVMETHOD(device_identify, xenpvcpu_identify),
> + DEVMETHOD(device_probe, xenpvcpu_probe),
> + DEVMETHOD(device_attach, xenpvcpu_attach),
> + DEVMETHOD(device_detach, xenpvcpu_detach),
> + DEVMETHOD_END
> +};
> +
> +static driver_t xenpvcpu_driver = {
> + "pvcpu",
> + xenpvcpu_methods,
> + 0,
> +};
> +
> +devclass_t xenpvcpu_devclass;
> +
> +DRIVER_MODULE(xenpvcpu, nexus, xenpvcpu_driver, xenpvcpu_devclass, 0, 0);
> +MODULE_DEPEND(xenpvcpu, nexus, 1, 1, 1);
> diff --git a/sys/xen/gnttab.c b/sys/xen/gnttab.c
> index 03c32b7..909378a 100644
> --- a/sys/xen/gnttab.c
> +++ b/sys/xen/gnttab.c
> @@ -25,6 +25,7 @@ __FBSDID("$FreeBSD$");
> #include <sys/lock.h>
> #include <sys/malloc.h>
> #include <sys/mman.h>
> +#include <sys/limits.h>
>
> #include <xen/xen-os.h>
> #include <xen/hypervisor.h>
> @@ -607,6 +608,7 @@ gnttab_resume(void)
> {
> int error;
> unsigned int max_nr_gframes, nr_gframes;
> + void *alloc_mem;
>
> nr_gframes = nr_grant_frames;
> max_nr_gframes = max_nr_grant_frames();
> @@ -614,11 +616,20 @@ gnttab_resume(void)
> return (ENOSYS);
>
> if (!resume_frames) {
> - error = xenpci_alloc_space(PAGE_SIZE * max_nr_gframes,
> - &resume_frames);
> - if (error) {
> - printf("error mapping gnttab share frames\n");
> - return (error);
> + if (xen_pv_domain()) {
> + alloc_mem = contigmalloc(max_nr_gframes * PAGE_SIZE,
> + M_DEVBUF, M_NOWAIT, 0,
> + ULONG_MAX, PAGE_SIZE, 0);
> + KASSERT((alloc_mem != NULL),
> + ("unable to alloc memory for gnttab"));
> + resume_frames = vtophys(alloc_mem);
> + } else {
> + error = xenpci_alloc_space(PAGE_SIZE * max_nr_gframes,
> + &resume_frames);
> + if (error) {
> + printf("error mapping gnttab share frames\n");
> + return (error);
> + }
> }
> }
>
> diff --git a/sys/xen/interface/arch-x86/xen.h b/sys/xen/interface/arch-x86/xen.h
> index 1c186d7..6cc15d3 100644
> --- a/sys/xen/interface/arch-x86/xen.h
> +++ b/sys/xen/interface/arch-x86/xen.h
> @@ -147,7 +147,16 @@ struct vcpu_guest_context {
> struct cpu_user_regs user_regs; /* User-level CPU registers */
> struct trap_info trap_ctxt[256]; /* Virtual IDT */
> unsigned long ldt_base, ldt_ents; /* LDT (linear address, # ents) */
> - unsigned long gdt_frames[16], gdt_ents; /* GDT (machine frames, # ents) */
> + union {
> + struct {
> + /* PV: GDT (machine frames, # ents).*/
> + unsigned long gdt_frames[16], gdt_ents;
> + } pv;
> + struct {
> + /* PVH: GDTR addr and size */
> + unsigned long gdtaddr, gdtsz;
> + } pvh;
> + } u;
> unsigned long kernel_ss, kernel_sp; /* Virtual TSS (only SS1/SP1) */
> /* NB. User pagetable on x86/64 is placed in ctrlreg[1]. */
> unsigned long ctrlreg[8]; /* CR0-CR7 (control registers) */
> diff --git a/sys/xen/pv.h b/sys/xen/pv.h
> new file mode 100644
> index 0000000..bbb1048
> --- /dev/null
> +++ b/sys/xen/pv.h
> @@ -0,0 +1,29 @@
> +/*
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to
> + * deal in the Software without restriction, including without limitation the
> + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
> + * sell copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + *
> + * $FreeBSD$
> + */
> +
> +#ifndef __XEN_PV_H__
> +#define __XEN_PV_H__
> +
> +int xen_pv_start_all_aps(void);
> +void xen_pv_set_init_ops(void);
> +
> +#endif /* __XEN_PV_H__ */
> \ No newline at end of file
> diff --git a/sys/xen/xen-os.h b/sys/xen/xen-os.h
> index 95e8c6a..d3dccad 100644
> --- a/sys/xen/xen-os.h
> +++ b/sys/xen/xen-os.h
> @@ -53,6 +53,11 @@ void force_evtchn_callback(void);
> extern int gdtset;
>
> extern shared_info_t *HYPERVISOR_shared_info;
> +extern start_info_t *HYPERVISOR_start_info;
> +
> +/* XXX: we need to get rid of this and use HYPERVISOR_start_info directly */
> +extern struct xenstore_domain_interface *xen_store;
> +extern char *console_page;
>
> enum xen_domain_type {
> XEN_NATIVE, /* running on bare hardware */
> @@ -80,6 +85,9 @@ xen_hvm_domain(void)
> return (xen_domain_type == XEN_HVM_DOMAIN);
> }
>
> +/* Debug function, prints directly to hypervisor console */
> +void xen_early_printf(const char *, ...);
> +
> #ifndef xen_mb
> #define xen_mb() mb()
> #endif
> diff --git a/sys/xen/xenstore/xenstore.c b/sys/xen/xenstore/xenstore.c
> index d404862..b9885af 100644
> --- a/sys/xen/xenstore/xenstore.c
> +++ b/sys/xen/xenstore/xenstore.c
> @@ -1082,6 +1082,19 @@ xs_init_comms(void)
> static void
> xs_identify(driver_t *driver, device_t parent)
> {
> + const char *parent_name;
> +
> + if (!xen_domain())
> + return;
> +
> + /*
> + * On HVM domains we will get called twice, once from the nexus
> + * and again after the xenpci device is attached. We should
> + * only attach after the xenpci device has been added.
> + */
> + parent_name = device_get_name(parent);
> + if (xen_hvm_domain() && strncmp(parent_name, "xenpci", 6) != 0)
> + return;
>
> BUS_ADD_CHILD(parent, 0, "xenstore", 0);
> }
> @@ -1147,13 +1160,15 @@ xs_attach(device_t dev)
> /* Initialize the interface to xenstore. */
> struct proc *p;
>
> -#ifdef XENHVM
> - xs.evtchn = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN);
> - xs.gpfn = hvm_get_parameter(HVM_PARAM_STORE_PFN);
> - xen_store = pmap_mapdev(xs.gpfn * PAGE_SIZE, PAGE_SIZE);
> -#else
> - xs.evtchn = xen_start_info->store_evtchn;
> -#endif
> + if (xen_hvm_domain()) {
> + xs.evtchn = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN);
> + xs.gpfn = hvm_get_parameter(HVM_PARAM_STORE_PFN);
> + xen_store = pmap_mapdev(xs.gpfn * PAGE_SIZE, PAGE_SIZE);
> + } else if (xen_pv_domain()) {
> + xs.evtchn = HYPERVISOR_start_info->store_evtchn;
> + } else {
> + panic("Unknown domain type, cannot initialize xenstore\n");
> + }
>
> TAILQ_INIT(&xs.reply_list);
> TAILQ_INIT(&xs.watch_events);
> @@ -1263,9 +1278,8 @@ static devclass_t xenstore_devclass;
>
> #ifdef XENHVM
> DRIVER_MODULE(xenstore, xenpci, xenstore_driver, xenstore_devclass, 0, 0);
> -#else
> -DRIVER_MODULE(xenstore, nexus, xenstore_driver, xenstore_devclass, 0, 0);
> #endif
> +DRIVER_MODULE(xenstore, nexus, xenstore_driver, xenstore_devclass, 0, 0);
>
> /*------------------------------- Sysctl Data --------------------------------*/
> /* XXX Shouldn't the node be somewhere else? */
> --
> 1.7.7.5 (Apple Git-26)
>
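For anyone reading along: xen_setbootenv() in the quoted sys/x86/xen/pv.c skips leading spaces in the "extra" string handed over by Xen and then overwrites each comma with a NUL, so the buffer ends up holding back-to-back C strings. Below is a minimal userspace sketch of that splitting, not the kernel code itself. The names split_cmdline() and nth_field() are made up for this example, and a plain strchr() loop stands in for the strsep() loop in the patch, which should have the same effect on the buffer. Whether the result is terminated the way the kernel environment code expects is not something this sketch checks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace stand-in for the patch's xen_setbootenv() (hypothetical name):
 * skip leading spaces, then overwrite every ',' with '\0' in place.
 * Returns a pointer to the first field inside the caller's buffer.
 */
static char *
split_cmdline(char *cmd_line)
{
	char *p;

	assert(cmd_line != NULL);

	/* Skip leading spaces. */
	while (*cmd_line == ' ')
		cmd_line++;

	/* Replace each comma with a NUL and continue after it. */
	for (p = cmd_line; (p = strchr(p, ',')) != NULL; p++)
		*p = '\0';

	return (cmd_line);
}

/*
 * Hypothetical helper, not in the patch: return the n-th string of a
 * buffer produced by split_cmdline(). The caller must know that at
 * least n + 1 fields exist.
 */
static const char *
nth_field(const char *buf, size_t n)
{
	while (n-- > 0)
		buf += strlen(buf) + 1;
	return (buf);
}
```

With a buffer such as "  boot_verbose=1,vfs.root.mountfrom=ufs:/dev/ad0s1a", split_cmdline() returns a pointer to "boot_verbose=1", and the second setting starts right after that string's terminating NUL.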
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel
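A note on xen_boothowto() in the quoted patch: it walks a NULL-terminated table of environment variable names and ORs in the matching flag for every name that is set. The sketch below shows the same table-driven lookup in userspace under some loud assumptions. The DEMO_RB_* values are invented for illustration (the real RB_* flags live in <sys/reboot.h>), env_has() is a hypothetical stand-in for the kernel getenv(), and the environment is passed as a NULL-terminated array of "name=value" strings rather than the kernel's packed buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Invented flag values for this example only; see <sys/reboot.h> for RB_*. */
#define DEMO_RB_SINGLE	0x1
#define DEMO_RB_VERBOSE	0x2
#define DEMO_RB_SERIAL	0x4

struct demo_howto_name {
	const char *ev;		/* environment variable name */
	int mask;		/* flag to set when the variable exists */
};

static const struct demo_howto_name demo_howto_names[] = {
	{ "boot_single",	DEMO_RB_SINGLE },
	{ "boot_verbose",	DEMO_RB_VERBOSE },
	{ "boot_serial",	DEMO_RB_SERIAL },
	{ NULL,			0 }
};

/*
 * Hypothetical stand-in for the kernel getenv(): returns 1 if some
 * entry in env[] is exactly "name=<anything>", 0 otherwise.
 */
static int
env_has(const char *const *env, const char *name)
{
	size_t len = strlen(name);

	for (; *env != NULL; env++)
		if (strncmp(*env, name, len) == 0 && (*env)[len] == '=')
			return (1);
	return (0);
}

/* OR together the flags of every table entry whose variable is set. */
static int
demo_boothowto(const char *const *env)
{
	int i, howto = 0;

	for (i = 0; demo_howto_names[i].ev != NULL; i++)
		if (env_has(env, demo_howto_names[i].ev))
			howto |= demo_howto_names[i].mask;
	return (howto);
}
```

Because each table row is independent, a wrong flag in one row only affects that one variable, which makes the table easy to audit row by row against <sys/reboot.h>.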