
Notes from the Xen Summit 2021: Design Session VirtIO Cross-Project BoF (Birds of a Feather) for Xen and Guest OS (Linux, Windows, FreeBSD) developers

Design Session notes for: VirtIO Cross-Project BoF (Birds of a Feather) for Xen and Guest OS (Linux, Windows, FreeBSD) developers
Xen Design & Developer Summit, 27th May 2021
Session Host: Juergen Gross
Notes by: Christopher Clark, with thanks to Rich Persaud

Apologies for the delay in posting these, solely my responsibility.
- Christopher

Session Context:

There are three separate recent approaches within the Xen community to enabling
use of VirtIO device drivers in guest virtual machines with the Xen hypervisor,
and a fourth older completed project.
(Placeholder names are assigned to each of these below for ease of reference.)

In addition, Linaro has an active project 'Stratos' pursuing:
    "Establish virtio as the standard interface between hypervisors,
    freeing a mobile, industrial or automotive platform to migrate between
    hypervisors and reuse the backend implementation."

    - https://linaro.atlassian.net/wiki/spaces/STR/overview
    - https://projects.linaro.org/projects/STR/summary
    - https://op-lists.linaro.org/mailman/listinfo/stratos-dev

* 'VirtIO-EPAM': enabling the existing VirtIO-MMIO transport on Xen, using
foreign mappings and an IOREQ server.
By EPAM and others, with a focus on Xen on Arm platforms.

Enables use of the existing standardized VirtIO-MMIO transport driver, which is
present in the mainline Linux kernel, using Xen's IOREQ emulation
infrastructure and privileged foreign mappings to establish shared memory for
access to guest data by the device model backend.

Status: Patches are in progress towards Xen on the xen-devel mailing list.
The presentation at Linaro Connect 2021 includes a working demonstration.

VirtIO on Xen hypervisor (Arm), Oleksandr Tyshchenko, EPAM, Linaro Connect 2021

* 'VirtIO-SuSE': introducing a new VirtIO transport driver that uses Xen grants.
By SUSE, presented at this Xen Summit.

A new VirtIO transport device driver is added to the guest kernel, to
translate guest physical addresses into grant references, enabling VirtIO data
path communication over mutually-negotiated shared memory regions between the
guest virtual machine and the device model backend. This improves isolation, as
the backend does not need privilege over the guest to perform foreign mappings.
Grant references are a Xen-specific interface. Design supports driver domains.
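The translation step above can be sketched as follows. This is an illustrative
model only, not the SUSE prototype code: the class and function names are
hypothetical, and the real mechanism is the Xen grant table interface used from
a kernel DMA-ops hook.

```python
# Illustrative sketch: how a grant-based VirtIO transport could translate a
# guest-physical buffer into grant references before placing descriptors on a
# virtqueue. All names here are hypothetical stand-ins.

class GrantTable:
    """Toy model of a guest's grant table: issues grant refs for page frames."""
    def __init__(self):
        self._refs = {}
        self._next_ref = 1

    def grant_access(self, backend_domid, gfn, readonly=False):
        ref = self._next_ref
        self._next_ref += 1
        self._refs[ref] = (backend_domid, gfn, readonly)
        return ref

def map_buffer_to_grants(gt, backend_domid, gpa, length, page_size=4096):
    """Translate a guest-physical buffer into (grant_ref, offset, len) triples,
    one per page touched, as a transport's DMA-map hook might do."""
    descs = []
    while length > 0:
        gfn = gpa // page_size
        offset = gpa % page_size
        chunk = min(length, page_size - offset)
        ref = gt.grant_access(backend_domid, gfn)
        descs.append((ref, offset, chunk))
        gpa += chunk
        length -= chunk
    return descs
```

The backend then maps (or grant-copies) only the referenced pages, rather than
holding foreign-mapping privilege over the whole guest.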

Status: A prototype is described in the presentation at this Xen Summit 2021.

VirtIO and Xen with Full Grant Support:

* VirtIO-Argo: introducing a new VirtIO transport driver that uses Argo for
interdomain communication, supporting isolation and Mandatory Access Control.
Design and analysis performed within the OpenXT and Xen communities.

A new VirtIO transport device driver is added to the guest kernel to transmit
data between the guest domain and the domain hosting the device model via Argo
rings: a Hypervisor-Mediated data eXchange protocol where the hypervisor
transfers the data, being trusted to strictly adhere to the delivery protocol.
Supports stronger isolation properties and enforcement of Mandatory Access
Control security policy over interdomain communication. Does not use shared
memory between domains.
Development of a Hypervisor-agnostic interface for Argo has been proposed
and discussed within the Xen community. Design supports driver domains.
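The Hypervisor-Mediated data eXchange model above can be sketched as follows.
This is an illustrative model only: the real interface is the Xen argo_op
hypercall, and every name below is a simplified hypothetical stand-in.

```python
# Illustrative sketch: Argo-style Hypervisor-Mediated data eXchange. The
# hypervisor (not the guests) copies data into a ring that the receiver has
# registered; sender and receiver never share memory.

class ArgoHypervisor:
    def __init__(self):
        self._rings = {}   # (domid, port) -> delivered messages

    def register_ring(self, domid, port):
        # Receiver registers a ring; the hypervisor retains the mapping so it
        # can copy messages in when a send hypercall is later invoked.
        self._rings[(domid, port)] = []

    def sendv(self, src_domid, dst_domid, dst_port, buffers):
        # The hypervisor gathers the sender's buffers and copies them into the
        # destination ring in one mediated, DMA-like operation. A Mandatory
        # Access Control policy check on (src, dst) would be enforced here.
        ring = self._rings.get((dst_domid, dst_port))
        if ring is None:
            raise ValueError("destination ring not registered")
        ring.append(b"".join(buffers))

    def receive(self, domid, port):
        return self._rings[(domid, port)].pop(0)
```

Because the hypervisor performs the copy and is trusted to adhere to the
delivery protocol, the receiver never operates on memory concurrently writable
by the sender.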

Status: Design and analysis published; funding required for development to
proceed.

VirtIO-Argo: Documentation at the OpenXT wiki:
VirtIO-Argo Development:
Minutes from the Argo HMX Transport for VirtIO topic call, 14th January 2021:

* 'VirtIO-Xen-GSoC': 2011 VirtIO on Xen student project
A Google Summer of Code project by Wei Liu investigated enabling VirtIO on Xen.

A working prototype was produced for both PV and HVM guests, using XenBus and
the Qemu VirtIO backends. PV guests require a guest kernel patch to translate
guest physical addresses to machine addresses in VirtIO rings.

Status: project completed.


Summary of the VirtIO Design Session:

xl toolstack guest config file syntax for VirtIO devices:
    - Recommend: mix of: device-specific config (eg. disk, net) plus support for
                 generic VirtIO device config (eg. esoteric devices)

VirtIO spec:
    - spec change needed? understood as not mandatory
    - v1.1 has a platform feature for DMA addr translation: enables transport
      driver to use any of grants, pre-shared memory, Argo in the data path
        - ie. does not force use of (guest) physical addrs in the ring
    - open question re: Linux and qemu acceptance of non-standardized driver

Guest access to new VirtIO transport
    - for Argo or grants, add new transport driver (eg. out of tree module)

Performance, standardization
    - different characteristics on different architectures; motivates
      development and deployment of different transports for separate cases
    - difficulty of ensuring correctness with shared memory vs. performance
      achievable with some configurations

Atomics in shared memory, Arm, "Fat Virtqueue" development
    - Arm and RISC-V have challenges with atomic inst in shared memory
    - VirtIO-SuSE (using grants) useful enough for consideration
    - Wind River preshared-memory memcpy approach (OpenAMP/kvmtool)
    - Linaro "Fat Virtqueue" under development: pass data within enlarged rings

Data copies with Argo and shared memory transports
    - discussion of copies made by different transports, different conditions
    - data copies performed to protect correctness in communication

Ecosystems, differentiation
    - standardization of VirtIO shifted economics for hypervisor developers
    - Mandatory Access Control with Xen technologies will be a Xen advantage

Detailed notes for the Design Session:

Rich (OpenXT): introducing session; to talk about:
    - how any solution will be ratified by the VirtIO/OASIS community
        - esp automotive use cases: a lot of attention paid to formal
          acceptance of any VirtIO solution

    - non-Linux guest VMs
        - Windows is used in OpenXT, Qubes
        - FreeBSD is popular for networking and storage

Roger (Citrix; x86 maintainer) [chat]: I wanted to ask whether the Linux kernel
/QEMU code could be accepted without any spec change?

--- topic: xl toolstack guest config file syntax for VirtIO devices

Juergen (SUSE; PV interfaces maintainer): shared window with VM config:
disk=[ 'file:/home/vm/upstream/image,hda,w', ]
vif=[ 'mac:00:16:3e:06:a7:21,bridge=br0', ]
vfb=[ 'vnclisten=localhost:0', ]
# device_model_args=[ '-drive', 'file=/home/vm/sle15sp3/image,if=virtio,index=1,media=disk,format=raw,cache=writeback' ]
device_model_args=[ '-drive', 'file=/home/vm/sle15sp3/image,if=none,id=drive-virtio-disk0,format=raw', '-device', 'virtio-blk-pci,scsi=off,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=0,disable-legacy=on' ]

Juergen: Request for input on xl config file syntax for configuring VirtIO
    - see device_model_args in example (force VirtIO block to non-legacy mode)
    - alt option: add a 'virtio' specifier to the disk line
        - see recent patch by Artem
    - need to consider generic devices: non-network, non-disk device classes

Q: Should VirtIO disk be added under 'disk', or a new special VirtIO disk?

Julien (Amazon; Arm maintainer):
    - want to avoid having one vif, one framebuffer, one gps, etc.
        - so instead have generic device options (eg. block, gps)
            - ie. similar to Qemu

Marek (Qubes; Python bindings maintainer):
    - libvirt bindings perspective: Preference for the other way
        - don't want duplicated eg. disk handling, for the VirtIO case
        - unified handling for a device class is more convenient for API users

Stefano (Xilinx; Arm maintainer):
    - Disk is a special case: config already has a line for disk,
      already have a way to specify different types - network may be similar
    - alt method is better for other devices, eg. virtio-rng
        - for those, device_model_arg is probably the best

Andy (Citrix, x86 maintainer):
    - a lot of this is PCI-based, even if non-DMA transport so:
        - need to preserve support for multi-IOREQ servers
        - feasible to have VirtIO backends that are not Qemu
    - so: don't tie the configuration to Qemu

OleksandrT (EPAM) [chat]: Initially I created a new "vdisk" property for
virtio disk, but was asked to reuse the existing "disk" configuration in libxl
if possible.

Juergen: ack; esp for Arm, mmio-based config: so we should be flexible.
Aiming to determine:
  1: Configuration format for VirtIO devices in guests
  2: Surfacing VirtIO devices in guest VMs:
        - tie it to the PCI specification or not?
        - challenge if guests do not have PCI detection

    - Responses:
        - PCI is fairly standard
        - issue with mmio: hotplugging devices
              - ranges to be configured => need number of devices in advance
              - hotplug already requires upfront work, due to grants, events
        - PCI and mmio are similar; choice depends on guest and ecosystem

George (Citrix; Xen Community Manager):
    - makes sense for a case to standardize on Argo?
    - Argo vs grant table vs shared memory : why duplicate the extra effort?
    - how much effort duplicated for multiple? is it pretty easy to swap them

Rich [chat]: Important for automotive, FuSA, security

Christopher (Apertus/Star Lab; Argo maintainer):
    - Options are not exclusive: VirtIO can use different transports per-device
    - Transports orthogonal eg. Argo devices enumerable via the VirtIO-Argo
      driver; Argo enables Mandatory Access Control over the data transport

--- topic: VirtIO spec change; addresses on VirtIO rings

Concern: requirement for a VirtIO spec change

- Discussion re: is spec change mandatory?
    - may be needed: VirtIO spec says pass guest physical addr on ring
    - may not be: translation step in the transport layer;
                  enables translation so don't need to change spec
    - is spec change needed for config negotiation, to announce feature?
    - response: no; transport driver does enumerations, so up to the driver

Stefano: "VirtIO" means different things: spec, transport, drivers
    - since transports are part of the spec: VirtIO-PCI, VirtIO-MMIO, ...,
      adding one (VirtIO-Argo) means adding to the spec
Daniel: no.

Juergen: VirtIO spec does not demand physical addresses on the ring pages
    - DMA addresses are allowed
    - eg. a special DMA mechanism, which happens to be grants, is fine.
        - ie. We have a special DMA engine.
Andy: Yes. DMA addresses != identity map onto physical or guest phys addresses

Daniel: there is a spec for the transports: if implementing a PCI transport, or
  a MMIO transport, has certain characteristics of how those should behave;
  so the driver-to-transport interface is fixed.
  It doesn't fix the transport back, which was the analysis that Christopher and
  I were looking at.
  Fixed from front-end perspective: interface between the driver and transport.
  If the transport presents as a PCI device transport, there's an interface that
  the driver expects that transport to provide.
     - It doesn't fix how it happens from the transport back.

George: Frontend VirtIO driver hands phys addr to frontend transport and the
       frontend transport can convert it to eg. grant tables to do what it wants

Daniel: right
    - plus more behaviours, esp for PCI: eg. structures expected to be there
      so the drivers can expect a common behaviour for any PCI transport
        - transport is translation layer, can handle dealing with actual
          physical backend mechanism that you're going over

Stefano: so plan is to add new transport: would that need to be in the spec?
Daniel: no

--- topic: Guest access to new VirtIO transport

George: VirtIO frontend drivers would work with the new transport.
    - a current VM image won't have transport implemented
        - have to update to a new version of kernel, or
        - a new version of image to get new transport?

Daniel: Well, that's one way; for Argo, can present as either a PCI transport or
a MMIO transport to the driver, so the driver understands how to interact with
the transport; and the transport handles moving data back. For Argo, we move
everything onto Argo rings.  
To gain that on Linux, can do an out of tree kernel module, and load that
without having to recompile the entire kernel (unless they don't have loadable
module support).
- responses:
    - needs the frontend transport in the guest;
    - so can't just use an existing image unmodified -- need the compiled
      kernel driver loaded to make the transport available
    - objective will be to add it to the upstream Linux kernel

Andy: might be able to get a frontend Windows driver somewhere useful

Stefano: fewer changes to the spec are better: it looks like none are needed,
fantastic. Second is changes to the code; these seem limited to the transport
driver, which is a lot better than having to change all the frontend drivers

    - Christopher: yes; point is to keep all of those and should be able to
                   plug them into backend drivers in the Qemu implementation.
    - Juergen: Yes. Actual backends unchanged and only lower levels of
               transport layer need some modifications.
        - ie. for all implementations of backend infrastructure:
            - Qemu
            - the kernel for the vhost backends
            - vhost-user, all the user daemons implementing the backend
    - Andy: sounds great overall

---- topic: Performance

George: whether to use Argo or grant tables or something else:
    - Andy: think the answer is that we need both
    - Juergen: Yes.
    - Andy: there are people that need to not have shared memory, and Argo is an
            option there, and there are people who will want shared memory
            because its faster.
    - ...: Is it?
    - Andy: Yes.

Damien Thenot [chat]: Shared memory would be interesting for DPU too

Stefano: Performance is a good metric;
- Able to convince third party proprietary vendor to add a driver?
    - some sort of shared memory useful as a last resort, fallback:
        - ie. easier to say "just add a memcpy" and to copy to the pre-shared
          memory region

George: Ian Pratt presentation said:
    - copying is faster than sharing memory
    - sharing memory is so difficult that no one is smart enough to do it, and
      we were all fools back in the 2000s to think we were smart enough to do it
      were all fools back in the 2000s to think we were smart enough to do it

Christopher [chat]: https://platformsecuritysummit.com/2018/speaker/pratt

Andy: Not using shared memory can get around a lot of common bugs, but if you
      know how to do shared memory, it is still faster than data copying in
      enough cases for it to be relevant, for us to care.
Marek: if you can avoid mapping and unmapping all the time
      eg. can use well known regions.

Andy: a lot of network performance in XenServer is from not mapping
    - dom0 constructs a scatter-gather over the guest memory
      (a granted area, so permissions work) - never actually touched
    - Can't do that with copying mechanism; would force a memcpy into a
      zero overhead transmit path

Stefano: measurements so far always demonstrated grant table is slower than any
         memcpy mechanisms
    - Julien: related to atomic instructions, ref: the previous talk?
    - Stefano: could be; all my latest measurements are on Arm.
    - Andy: Different architectures will make massive differences here

Jason Andryuk (OpenXT) [chat]: @Andrew, you use netfront/back but your dom0 nic
                               scatter/gathers over the guest's granted network
                               frames directly?
Andy [chat]: Yeah. A consequence is that dom0 never touches the mapping, so
             never sets the A bit, so you can skip the TLB flush on the unmap
             side and this is great for performance
Roger [chat]: Note you need to patch Xen to not do the flush on unmap if the A
              bit isn't set

---- topic: Atomics in shared memory, Arm, "Fat Virtqueue" development

Juergen: regarding Arm and shared pages: how does VirtIO and KVM on Arm work?

Julien: don't do atomics in shared page. Lesson learned a while ago.
- Xen has many uses of atomics in shared pages, eg. evtchn, etc.
  For VirtIO, would be nicer to avoid doing the same again.

Stefano: Could we require Arm 8.1? (Does 8.1 introduce the new atomics, or 8.2?)

Julien: Yes, but limited hardware available today
- latest may have it, but not Qemu
- looks like RISC-V may have the same issues

Demi Marie (Qubes OS) [chat]: RISC-V only has 32-bit atomics

Andy: VirtIO-Grant [ie. VirtIO-SuSE in these notes] may not be perfect for all
      future use cases and other options exist - but still enough cases that it
      is worth considering

Demi Marie [chat]: Long-term fix is probably to change the guest-visible ABI to
                   not require atomics in shared memory

Julien: how can it be used with other hypervisors?
- eg. for Argo, looking at something that can be used in other hypervisors
- not possible with grant table, because of the problems I mentioned.
- want to avoid having a Xen-specific driver, ie. tying VirtIO transport to Xen
  due to use of grant table

Stefano: two related activities, both based on shared memory:
    - one driven by Wind River: memcpy-based approach
        - presharing memory, and picking addresses off the shared region

Stefano [chat]: This is WindRiver KVMtools that does shared memory for Virtio

Stefano: second approach is from Linaro: "Fat Virtqueue"
        - increase the ring size and pick addresses within the ring only
            - So the shared memory region is only the ring, and the buffers are
              within the ring. Pointer points to things within the ring.
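The "Fat Virtqueue" idea described above can be sketched as follows. This is an
illustrative model only, not Linaro's implementation: the point is that the
enlarged ring is the only pre-shared region, and descriptors refer to offsets
inside it rather than to arbitrary guest memory.

```python
# Illustrative sketch: a "fat" virtqueue that carries buffer data inline in an
# enlarged, pre-shared ring. Descriptors are (offset, length) pairs pointing
# within the ring; no other memory is shared. Simplified hypothetical model.

class FatVirtqueue:
    def __init__(self, size):
        self.ring = bytearray(size)   # the single pre-shared region
        self._alloc = 0
        self.descriptors = []         # (offset, length), in-ring only

    def push(self, data):
        off = self._alloc
        if off + len(data) > len(self.ring):
            raise MemoryError("ring full")
        self.ring[off:off + len(data)] = data   # copy payload into the ring
        self._alloc += len(data)
        self.descriptors.append((off, len(data)))

    def pop(self):
        off, length = self.descriptors.pop(0)
        return bytes(self.ring[off:off + length])
```

The cost is an extra memcpy into the ring on each transfer; the benefit is that
no addresses outside the pre-shared region ever appear in descriptors.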

George: summarizing:
- on Arm copying might be faster, and there are other advantages as well;
- on x86, things may be different
- SuSE and Citrix may work on the grant table one, and
- Epam and Arm, etc. may work on the Argo one
and people can understand why they're choosing which one.

George: Can you also do driver domain things with Argo?
Daniel: Yes

--- topic: Data copies with Argo and shared memory transports

Stefano: with Argo is there a memcpy, or more than one memcpy? who would do it?

Christopher: hypervisor does the copy for the transport.
- guest hypercall op to register a ring, allows hypervisor to blit data in;
    - can think of as a DMA-style operation to deliver data to the guest

Juergen: So for each I/O data, there are three copies:
    - first, into the ring, then by the hypervisor to the other domain,
      and then out of the ring of the other domain to the target address, right?
Andy: That's correct if both the source and destination are buffering the I/O in
      and out of the transport.
Daniel: Right

Andy: Buffering is probably because the backend is normally shared memory.
If you've got a guarantee that it's not shared memory between frontend and
backend, then neither the front or the back VirtIO driver need to buffer in and
out of the transport

Juergen: ok, for a disk I/O: have user data in the guest, want to write to disk
Arbitrary data in memory, needs to be written to the Argo ring buffer, right?
That's copy one.
Andy: No; only destination has an Argo ring. Source just passes arbitrary
      addresses, like scatter-gather DMA.
Juergen: OK, so like grant copy
Andy: Yes.

Julien: With Argo, are the pages always mapped in Xen?
    - Christopher: Yes, for the rings the way that Argo works at the moment is,
                   yes, once you register a ring that mapping is retained so
                   the hypervisor can send to it when a hypercall is invoked.
    - Julien: OK

Demi Marie: is it possible for Argo to copy memory from guest one userspace to
            the ring in guest two?
    - re: disk I/O: from ZFS issue tracker: have to take copies because is not
      safe to operate on data shared with untrusted parties
    - So most cases you're going to have to have a copy.
        - Only exception: extremely carefully written data sources, destinations
            - careful to only make loads exactly once
            - or tools like everparse, or another tool that generates
              formally-verified binary parsers, immune to double-fetch bugs
        - so: prefer to just take the copy
            - because memcpy on modern processors is very, very fast
            - especially while it's stayed in cache

George: Andy was saying: some people definitely going to want Argo, because
it's copying instead of shared memory, and double-fetch issues with shared
memory is definitely an issue

Andy: Yes. A lot of reasons why using shared memory is hard to do correctly;
not impossible but it is hard.

--- topic: Configuration syntax (continued)

Juergen: choice: generic virtio device configuration, vs individual
         device-specific stuff, or just mix variants?

Roger [chat]: I think both

Christopher: good to have generic + allow specialization if you care

Marek: Yes; eg. for network devices, might want a different copying mechanism if
you can avoid copies for network. Same may not be doable for disk, so might want
different mechanisms for disk and network devices.

Juergen: Should VirtIO disk be labelled under the disk item of the guest config,
or a specific virtio disk item? Or virtio-device type = disk, or whatever.

- keep the old disk options, expand to support VirtIO
- add a generic VirtIO option for more esoteric devices that we don't support in
  any other way.
Disks and vifs need a type parameter to select VirtIO as backend.

Juergen: then have a transport layer specification:
(eg. don't care, or grants, or Argo, or whatever)
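Putting the points above together, a guest config might look something like the
following. This is a hypothetical illustration of the direction discussed, not
an agreed syntax: the 'specification=virtio', 'type=virtio' and 'virtio=[...]'
keys and the 'transport=' values are all placeholders.

```
disk=[ 'file:/home/vm/image,xvda,w,specification=virtio', ]
vif=[ 'mac=00:16:3e:06:a7:21,bridge=br0,type=virtio', ]
# generic entry for esoteric VirtIO devices, with a transport specifier:
virtio=[ 'type=virtio-rng,transport=grant', 'type=virtio-gps,transport=argo', ]
```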

George: We have the same thing for PCI (strict, relaxed, ...)
ie. Set a global default for xl, plus override in specific domain config

--- topic: Interface to adapt to platform transport

Juergen: spec bit is: VIRTIO_F_ACCESS_PLATFORM feature bit
- indicates "the device can be used on a platform where the device
  access to data memory is limited, and/or translated"
  Exactly what we want; works for grants, etc.
    - is a generic bit from newer virtio spec, not a Xen-specific bit
    - in the current impl, this assumes running on Xen and means grants,
      because it assumes that everything is done for addresses going into
      the ring
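The feature bit negotiation can be sketched as follows. The bit number (33) is
from the VirtIO v1.1 spec; the negotiation model is a deliberately simplified
illustration, not driver code.

```python
# Illustrative sketch: gating address translation on VIRTIO_F_ACCESS_PLATFORM.

VIRTIO_F_ACCESS_PLATFORM = 33   # device access to memory is limited/translated

def negotiate(device_features, driver_features):
    """Agreed feature set: bits offered by the device and accepted by the
    driver (simplified model of VirtIO feature negotiation)."""
    return device_features & driver_features

def needs_translation(features):
    # If the bit is negotiated, buffer addresses placed on the ring go through
    # a platform translation step (grants, pre-shared memory, Argo, ...)
    # rather than being raw guest-physical addresses.
    return bool(features & (1 << VIRTIO_F_ACCESS_PLATFORM))
```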

--- topic: Ecosystems, differentiation

Rich: On standardization of VirtIO: it commoditized the driver interface - now
all hypervisors implement VirtIO because it's too expensive to do otherwise.
- using Argo transport to add Mandatory Access Control (needed by many guests)
  is an advantage for Xen, since only with Xen will you get the strong
  guarantees from Argo. Especially with Hyperlaunch, and all the other
  integrity pieces.
    - Challenging for other hypervisors to do MAC transport for VirtIO.

End of session

Open Issues / Technical questions:

- Method of discovery of available transport drivers, negotiation of which to use
    - OASIS/VirtIO Standardization required? (eg. feature bits needed?)

- Upstreaming path for different pieces to: Linux kernel, Qemu, elsewhere
    - including enabling guest support for FreeBSD, Windows, etc.

- Development of VirtIO "Fat Virtqueues": how does this affect system design?

- Design for interaction with guest virtual IOMMU
    - ref: VirtIO-iommu

- VFIO for device model backends

- Support for complex VirtIO drivers: eg. framebuffers (shared memory regions)

Areas suitable for investment to support VirtIO on Xen:

- Development of the Hypervisor-agnostic interface for Argo
Ref: Xen and OpenXT community discussion minutes:

- Development of the VirtIO-Argo system
Project analysis and design documentation:

- VirtIO-focussed test cases for the Xen Test Framework
    - to provide coverage of the hypervisor interfaces used by VirtIO transports
    - extend the framework capabilities to enable new tests (eg. multi-VM cases)
    - development of support for Arm platforms with XTF

- Rust and Go languages toolstack development for Xen
    - Investment in the Xen toolstack is needed to support hypervisor feature
      development. Toolstack support is necessary for enabling VirtIO guests.
    - Work towards enabling Xen tools in Rust has started:
    - Work towards enabling Xen tools in Go has started:

- System image test integration with Yocto/Openembedded and Qemu
    - Xen and the Xen Test Framework can already be run within Qemu for x86
      in the standard meta-virtualization environment; Arm platform support
      needs to be completed, with coverage for VirtIO guests, and Continuous
      Integration enabled
    - to evaluate: VirtIO as a DISTRO_FEATURE or MACHINE_FEATURE

- Creation of documentation for Argo
    - implementation documentation exists:
    - user interface / client / kernel developer documentation is needed

- Development items for Argo:

- Performance profiling of VirtIO-on-Xen systems across hardware architectures

Additional References:

VirtIO Specification v1.1, OASIS, 11th April 2019:
VirtIO spec maintainers repository:
Organization for the Advancement of Structured Information Standards
ACPI Virtual I/O Translation Table (VIOT) DRAFT v9, December 2020:

Linaro: Stratos Project: Auto / Industrial Demonstrable Milestones

OpenXT: VirtIO and Argo:

OpenXT: VirtIO-Argo Development:

OpenXT: Analysis of Argo as a transport medium for VirtIO:

Virtio with Argo for Xen, Mandatory Access Control
Meeting of Virtualization Experts Group of Automotive Grade Linux, 18th Aug, 2020

VirtIO on Xen hypervisor (Arm), Oleksandr Tyshchenko, EPAM, Linaro Connect 2021

Xen wiki: Argo

OpenXT: Argo: Hypervisor-Mediated data eXchange: Development:

Xen wiki: Virtio On Xen

XCP-ng: IOREQ Server: Device Emulation in the Xen Hypervisor


Video for this Design Session:


