
Re: Xen article for the FreeBSD Journal



Hello,

I've got a first draft from the editor, attached to this email.

I've found a couple of typos and some readability issues, and the
Driver domains diagram needs the devices aligned with the domains that
have them assigned.

I can forward any other feedback you have.

Regards, Roger.

On Thu, Nov 14, 2024 at 04:16:54PM +0100, Roger Pau Monné wrote:
> Hello,
> 
> I've been asked to write a Xen article for the FreeBSD Journal [0], a
> free FreeBSD magazine published by the FreeBSD Foundation.
> 
> The request was something about Xen for an upcoming
> virtualization-focused issue, but it could be a status report, new
> features in Xen, or even a tutorial about how to use Xen on FreeBSD.
> 
> I've decided to do an article about the Xen architecture, specific
> features that make Xen different from other hypervisors, plus some
> FreeBSD background about how Xen support is implemented in FreeBSD.
> 
> I feel like I'm missing a bit more ARM content to balance the x86
> stuff, but sadly I'm not sure what's worth highlighting there, so any
> recommendations are welcome.
> 
> I'm attaching an early draft of the article to this email in plain
> text.  Note the diagrams will be redone to match the style of the
> publication, and are currently hand drawings since I didn't want to
> waste time on something that will be redone anyway by the layout
> editor.
> 
> I've been told the article should be submitted by the middle of
> November, so I'm a bit short on time.  I would like to give them
> something on Monday so they also have time to provide feedback.
> 
> Thanks, Roger.
> 
> [0] https://freebsdfoundation.org/our-work/journal/

> Xen on FreeBSD
> ==============
> 
> Introduction
> 
> The Xen Hypervisor started at the University of Cambridge Computer
> Laboratory in the late 1990s under the project name Xenoservers. At that
> time, Xenoservers aimed to provide "a new distributed computing paradigm,
> termed 'global public computing,' which allows any user to run any code
> anywhere. Such platforms price computing resources, and ultimately charge
> users for resources consumed".
> 
> Using a hypervisor allows sharing the hardware resources of a physical
> machine among several OSes in a secure way. The hypervisor is the piece of
> software that manages all those OSes (usually called guests or virtual
> machines), and provides separation and isolation between them.  First
> released in 2003 as an open-source hypervisor under the GPLv2, Xen's design
> is OS agnostic, which makes it easy to add Xen support to new OSes. Since
> its first release more than 20 years ago, Xen has received broad support
> from a large community of individual developers and corporate contributors.
> 
> The Architecture
> 
> Hypervisors can be divided into two categories:
> 
>  * Type 1: those that run directly on bare metal and are in direct control of
>    the hardware.
>  * Type 2: hypervisors that are part of an operating system.
> 
> Common Type 1 hypervisors are VMware ESX/ESXi and Microsoft Hyper-V, while
> VMware Workstation and VirtualBox are clear examples of Type 2 hypervisors.
> Xen is a Type 1 hypervisor with a twist: its design resembles a microkernel
> in many ways.  Xen itself only takes control of the CPUs, the local and IO
> APICs, the MMU, the IOMMU and a timer. The rest is taken care of by the
> control domain (Dom0), a specialized guest granted elevated privileges by
> the hypervisor.  This allows Dom0 to manage all other hardware in the
> system, as well as all other guests running on the hypervisor. It is also
> important to realize that Xen contains almost no hardware drivers,
> preventing code duplication with the drivers already present in OSes.
> 
> (insert architecture.jpg)
> 
> When Xen was initially designed there were no hardware virtualization
> extensions on x86; the options for virtualization involved either full
> software emulation or binary translation.  Both options are very expensive
> in terms of performance, so Xen took a different approach.  Instead of
> trying to emulate the existing x86 interfaces, a new interface was provided
> to guests.  The purpose of this new interface was to avoid the overhead of
> emulating hardware interfaces in the hypervisor, and instead use an
> interface between the guest and Xen that is more natural for both to
> implement.
> 
> However this requires the guest to be aware that it's running under Xen,
> and to use a different set of interfaces compared to running natively.
> That set of interfaces was named ParaVirtualized, and hence guests that use
> those interfaces are usually referred to as PV guests.  The following
> interfaces are replaced with PV equivalents in PV guests:
> 
>  * Disk and network.
>  * Interrupts and timers.
>  * Kernel entry point.
>  * Page tables.
>  * Privileged instructions.
> 
> The main limitation of this approach is that it requires extensive changes
> to core parts of the guest OS kernel, and so far the only OSes with x86 Xen
> PV support are Linux and NetBSD.
> 
> With the addition of hardware virtualization extensions to x86 CPUs, Xen
> also gained support for running unmodified (non-PV) guests.  Such guests
> rely on hardware virtualization plus emulation of hardware devices.  On a
> Xen system this emulation is either done by the hypervisor itself (for
> performance critical devices) or offloaded to an external emulator running
> in user-space, by default QEMU.  These hardware virtualized guests, which
> are presented with a full PC-compatible environment, are called HVM guests
> in Xen terminology.
> 
> So far we have gone over two very different types of guests: on one side
> there are PV guests, which use PV interfaces in order to avoid emulation,
> and on the other side there are HVM guests, which rely on hardware support
> and software emulation in order to run unmodified.
> 
> Emulated IO devices used by HVM guests, such as disks or network cards,
> don't perform very well due to the amount of logic required to handle data
> transfers.  To avoid this overhead, Xen HVM guests also get the option to
> use PV interfaces for IO.  Some other PV interfaces are available to HVM
> guests (like a one-shot PV timer) in an attempt to further reduce the
> overhead of using emulated devices.
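> 
> As a hedged illustration (the disk path, bridge name and guest naming are
> hypothetical, not taken from this article), an xl guest configuration can
> expose PV IO devices to an HVM guest; an xvd* virtual device name requests
> a PV-only disk, while an hd* name also provides an emulated IDE device
> that guests without PV drivers can use:
> 
>     # Hypothetical xl configuration fragment for an HVM guest.
>     type = "hvm"
>     # "xvda" requests a PV-only block device; "hda" would additionally
>     # expose an emulated IDE disk for guests lacking PV drivers.
>     disk = [ 'phy:/dev/zvol/tank/guest0,xvda,w' ]
>     # The vif provides a PV network interface (netfront/netback) attached
>     # to a host bridge; guests with PV drivers will use it for networking.
>     vif  = [ 'bridge=bridge0' ]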
> 
> While HVM allows running every possible unmodified x86 guest, it also has a
> wide attack surface due to emulating all the devices required for a PC
> compatible environment.  To reduce the number of interfaces (and thus the
> attack surface) exposed to guests, a slightly modified version of HVM
> guests was created, named PVH.  PVH is a slimmed down version of HVM, where
> a lot of the emulated devices that would be present on HVM guests are not
> available.  For example, a PVH guest only gets an emulated local APIC and
> maybe an emulated IO APIC, but there's no emulated HPET, PIT or legacy PIC
> (8259).  PVH mode however requires modifications in the guest OS kernel so
> it's aware that it's running under Xen and that some devices are not
> available.  PVH mode also uses a specific kernel entry point that allows
> booting directly into the guest kernel, without the need to rely on
> emulated firmware (SeaBIOS or OVMF), thus greatly speeding up the boot
> process.  Note however that OVMF can also be run in PVH mode to chainload
> OS-specific bootloaders when startup speed is not a great concern and ease
> of use is preferred.  See the table below for a brief comparison of the
> different guest modes on x86.
> 
>                          |  PV           |  PVH                  |  HVM
> ==========================================================================================
> I/O devices              |  PV (xenbus)  |  PV (xenbus)          |  emulated + PV
> Legacy devices           |  NO           |  NO                   |  YES
> Privileged instructions  |  PV           |  hardware virtualized |  hardware virtualized
> System configuration     |  PV (xenbus)  |  ACPI + PV (xenbus)   |  ACPI + PV (xenbus)
> Kernel entry point       |  PV           |  PV + native*         |  native
> 
> * it's possible for PVH guests to re-use the native entry point when booted
>   with firmware, but that requires adding logic to the native entry point
>   to detect when booted in a PVH environment.  Not all OSes support this.
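> 
> As a small, hedged sketch (guest name, paths and sizes are made-up
> examples, not from this article), the guest mode is selected with the type
> option in an xl guest configuration:
> 
>     # Hypothetical xl configuration for a PVH guest; all values are examples.
>     name   = "freebsd-pvh"
>     type   = "pvh"     # "hvm" or "pv" select the other modes in the table
>     memory = 2048
>     vcpus  = 2
>     # PVH can boot a kernel directly through the PV entry point; an OVMF
>     # build for PVH can alternatively be configured to chainload a regular
>     # bootloader when ease of use matters more than boot time.
>     kernel = "/path/to/guest/kernel"
>     disk   = [ 'phy:/dev/zvol/tank/guest1,xvda,w' ]
>     vif    = [ 'bridge=bridge0' ]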
> 
> The PVH approach has also been adopted by other virtualization
> technologies, like Firecracker from AWS.  While Firecracker is based on
> KVM, it re-uses the Linux Xen PVH entry point and applies the same attack
> surface reduction by not exposing (and thus not emulating) legacy x86
> devices.
> 
> Moving on to the ARM architecture, the fact that the Xen port was developed
> once ARM already had hardware virtualization extensions led to a different
> approach compared to x86.  ARM has only one guest type, which would be the
> equivalent of PVH on x86.  The focus there is also on not exposing an
> excess of emulated devices, both to reduce complexity and to reduce the
> attack surface.
> 
> It's quite likely that the upcoming RISC-V and PowerPC ports will take the
> same approach of supporting only one guest type, more akin to HVM or PVH on
> x86.  Those platforms also have hardware virtualization extensions that
> remove the need for something similar to classic PV support.
> 
> Usages and unique features
> 
> The first commercial usages of Xen were strictly focused on server
> virtualization, either through on-premises deployments of Xen-based
> products or through cloud offerings.  However, due to its versatility Xen
> has now also extended into the client and embedded space.  Xen's small
> footprint and security focus make it suitable for a wide range of
> environments.
> 
> A great example of a client (desktop) usage of Xen is QubesOS, a
> Linux-based OS focused on security through isolation of different processes
> in virtual machines, all running on top of the Xen hypervisor, and even
> supporting the usage of Windows applications.  QubesOS relies heavily on
> some key Xen-specific features:
> 
>  * Driver domains: the drivers for network cards and USB devices run in
>    separate VMs, so that security issues arising from the usage of those
>    devices cannot compromise the entire system.  See the diagram about
>    driver domains and the configuration sketch after it.
>  * Stub domains: the QEMU instance that handles the emulation for each HVM
>    guest is not run in dom0, but rather in a separate PV or PVH domain.
>    This isolation prevents security issues in QEMU from compromising the
>    entire system.
>  * Limited memory sharing: by using the grant sharing interfaces, a domain
>    can decide which pages of memory are shared with which domains, thus
>    preventing other domains (even semi-privileged ones) from being able to
>    access all guest memory.
> 
> (insert driver_domains.jpg)
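> 
> As a hedged example of how this looks from the toolstack side (the domain
> name "netdom" and the bridge name are hypothetical), a guest's virtual
> network interface can be served by a driver domain instead of dom0 by
> naming the backend domain in the vif specification:
> 
>     # Hypothetical xl configuration fragment for a guest whose network
>     # backend runs in the "netdom" driver domain (which has the physical
>     # NIC assigned to it) rather than in dom0, so issues in the network
>     # stack or NIC driver stay contained in that domain.
>     vif = [ 'bridge=bridge0,backend=netdom' ]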
> 
> Since the addition of the ARM port there's been wide interest in using Xen
> for embedded deployments, from industrial to automotive.  Apart from the
> small footprint and security focus, there are some key features of Xen that
> make it appealing for such usages.  First, the amount of code in Xen is
> quite limited when compared to Type 2 hypervisors, so it's conceivable to
> attempt to safety-certify it.  There's currently an effort upstream to
> comply with the applicable parts of the MISRA C standard so Xen can be
> safety certified.
> 
> Some unique features that make it very appealing for embedded uses include:
> 
>  * Small code base: makes it possible to audit and safety certify; the code
>    base is also being adapted to comply with the MISRA C standard.
>  * cpupools: Xen has the ability to partition the CPUs into different
>    groups, and assign a different scheduler to each group.  Guests can then
>    be assigned to those groups, making it possible to have one set of
>    guests running under a realtime scheduler, like RTDS or ARINC653, while
>    a different set of guests runs under a general purpose scheduler like
>    credit2.  See the diagram about CPU pools and the sketch after it.
>  * CPU pinning: it's also possible to restrict which host CPUs get to
>    schedule which guest CPUs, so that, for example, a guest CPU can be
>    given exclusive use of a host CPU when running latency-sensitive
>    workloads.
>  * dom0less/hyperlaunch: a feature that originated on ARM and is currently
>    also being implemented for x86.  It allows multiple guests to be created
>    statically at boot time.  This is very useful for statically partitioned
>    systems, where the number of guests is fixed and known ahead of time.
>    In such setups the presence of an initial (privileged) domain is
>    optional, as some setups don't require further operations against the
>    initially created guests.
> 
> (insert cpupools.jpg)
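> 
> A hedged sketch of how cpupools and pinning are driven with the xl
> toolstack follows; the pool name, scheduler and CPU numbers are examples,
> not taken from this article:
> 
>     # Hypothetical cpupool configuration (e.g. /etc/xen/rt-pool.cfg):
>     #     name  = "rt-pool"
>     #     sched = "rtds"     # realtime scheduler; credit2 is the default
>     #     cpus  = "2-3"      # host CPUs to place in this pool
>     #
>     # Free the CPUs from the default pool, then create the new pool:
>     xl cpupool-cpu-remove Pool-0 2
>     xl cpupool-cpu-remove Pool-0 3
>     xl cpupool-create /etc/xen/rt-pool.cfg
>     #
>     # In the guest configuration, run the guest in that pool and pin its
>     # virtual CPUs to the pool's host CPUs:
>     #     pool = "rt-pool"
>     #     cpus = "2-3"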
> 
> FreeBSD Xen Support
> 
> FreeBSD Xen support was added quite late compared to other OSes.  For
> instance, NetBSD was the first OS to formally commit Xen PV support, as the
> Linux patches for full PV support didn't get merged until Linux 3.0 (around
> 2011).
> 
> FreeBSD had some initial support for PV, but that port was 32-bit only and
> not fully functional.  Development on it stopped, and it ended up being
> deleted from the tree once PVH support was implemented.  In the early 2010s
> FreeBSD saw the addition of PV optimizations when running as an HVM guest,
> which allowed FreeBSD to make use of PV devices for I/O, together with the
> usage of some additional PV interfaces for speedups, like the PV timer.
> 
> In early 2014 FreeBSD gained support to run as a PVHv1 guest, and shortly
> after as a PVHv1 initial domain.  Sadly the first implementation of PVH
> (also known as PVHv1) was wrongly designed, and had baked in too many PV
> related limitations.  PVHv1 was designed as an attempt to take a classic PV
> guest and run it inside an Intel VMX container.  This was fairly limiting,
> as the guest still had a bunch of restrictions inherited from classic PV,
> and it was also limited to Intel hardware only.
> 
> After finding out about those design limitations, work started on moving to
> a different implementation of PVH.  The new approach instead started with
> an HVM guest and stripped as much emulation as possible, including all the
> emulation done by QEMU.  Most of this work was in fact developed on
> FreeBSD, as that's my main development platform, and I did extensive work
> to implement what was later called PVHv2, and is now plain PVH.
> 
> FreeBSD x86 runs as both an HVM and a PVH guest, and also supports running
> as a PVH dom0 (initial domain).  In fact, x86 PVH support was merged
> earlier in FreeBSD than in Linux.  Running in PVH mode however still has
> some missing features compared to a classic PV dom0.  The biggest one is
> the lack of PCI passthrough support, a feature that requires changes in
> both FreeBSD and Xen.  There's an ongoing effort in Xen upstream to add PCI
> passthrough support for PVH dom0, but that's still being worked on, and
> when finished it will require changes to FreeBSD for the feature to be
> usable.
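> 
> For reference, and as a hedged sketch of the usual FreeBSD Xen dom0 setup
> (the memory size, vCPU count and console settings are just examples),
> booting FreeBSD as a PVH dom0 is configured from the loader once the Xen
> packages are installed:
> 
>     # /boot/loader.conf fragment (values are examples, not recommendations)
>     xen_kernel="/boot/xen"
>     xen_cmdline="dom0=pvh dom0_mem=4096M dom0_max_vcpus=4 console=com1,vga"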
> 
> On the ARM side, work is underway to get FreeBSD to run as an AArch64 Xen
> guest.  That required splitting the Xen code in FreeBSD to separate the
> architecture-specific bits from the generic ones.  Further work is being
> done to integrate Xen interrupt multiplexing with the native interrupt
> handling on ARM.
> 
> Recent developments in the Xen community
> 
> Apart from the ongoing effort mentioned before that attempts to bring feature
> parity between a PV and PVH dom0 on x86, there's a lot more going on in
> upstream Xen.  Since the last Xen release (4.19) PVH dom0 is a supported mode
> of operation, albeit with caveats due to some key features still missing.
> 
> The RISC-V and PowerPC ports are making progress toward a functional state;
> hopefully within a couple of releases they will reach a point where the
> initial domain can be booted and guests can be created.
> 
> At least on x86, a lot of time in recent years has been spent mitigating
> the flurry of hardware security vulnerabilities.  Since the original
> Meltdown and Spectre attacks were disclosed in early 2018, the number of
> hardware vulnerabilities has been increasing steadily.  This requires a lot
> of work and attention on the Xen side: the hypervisor itself needs to be
> fixed to not be vulnerable, but it's also quite likely that new controls
> need to be exposed to guests so they can protect themselves.  To mitigate
> the impact that future hardware vulnerabilities have on Xen, we are working
> on a new feature called Address Space Isolation (also known as Secret Free
> Xen), which aims to remove the direct map, plus all sensitive mappings,
> from being permanently mapped in the hypervisor address space.  This would
> make Xen not vulnerable to speculative execution attacks, thus making it
> possible to remove a lot of the mitigations applied on entry points into
> the hypervisor, and possibly avoiding the need to apply more mitigations
> for any future speculative issues.
> 
> Since the beginning of 2021, every Xen commit is build-tested on FreeBSD
> using the Cirrus CI testing system.  This has been a massive help in
> keeping Xen building on FreeBSD, as the usage of Clang plus the LLVM
> toolchain sometimes either created or exposed issues that wouldn't manifest
> when using the GNU toolchain.  We currently test that Xen builds on all the
> supported FreeBSD stable branches, plus the HEAD development branch.  Xen
> recently retired its custom testing system, called osstest, and now relies
> solely on GitLab CI, Cirrus CI and GitHub Actions to perform testing.  This
> allows for a more open and well-documented testing infrastructure, where
> it's easier for newcomers to contribute and add tests.  Future work in that
> area should include runtime testing on FreeBSD, even if initially using
> QEMU instead of a real hardware platform.
> 
> Recent releases also added toolstack support for exposing VirtIO devices to
> Xen guests.  There are ongoing changes to the VirtIO specification to use
> grants instead of guest memory addresses as the basis for memory sharing
> between the VirtIO frontends and backends.  There are also efforts to
> introduce a transport layer not based on memory sharing, as this is a
> requirement for some security environments.  Going forward this would allow
> Xen to use VirtIO devices while keeping the security and isolation that's
> guaranteed when using the native Xen PV IO devices.  The overall goal is to
> be able to reuse the VirtIO drivers as first-class interfaces on Xen
> deployments.
> 
> Safety certification and the adoption of MISRA C rules have also been among
> the main tasks for the past releases.  The last Xen release (4.19) has been
> extended to support 7 directives and 113 rules, out of a total of 18
> directives and 182 rules that make up the MISRA C specification.  Adoption
> is being done progressively, so that each rule or directive can be debated
> and agreed upon before being adopted.  Given that the Xen code base wasn't
> designed with MISRA compliance in mind, some of the rules will require
> either global or local per-instance deviations.
> 
> The future of Xen
> 
> Looking back at when x86 PVH support was first added to FreeBSD, it's been
> a long and not always easy road.  FreeBSD was an early adopter of PVH for
> dom0 mode, and a lot of Xen development has been done while using a FreeBSD
> PVH dom0.  It's also notable how FreeBSD has become a first-class Xen
> citizen in recent years, as there is now build testing of Xen on FreeBSD
> for each commit that goes into the Xen repository.
> 
> The port of FreeBSD to run as a Xen AArch64 guest has also gained some
> traction recently, and is certainly a feature to look forward to given the
> increasing presence of ARM-based platforms in server, client and embedded
> environments.
> 
> It's good to see Xen being used in so many different use-cases, many of
> them quite far from its original design purpose of server side (cloud)
> virtualization.  I can only look forward to the new deployments and
> use-cases Xen will be put to in the future.




Attachment: draft-1.pdf
Description: Adobe PDF document


 

