
Re: [openxt-dev] VirtIO-Argo initial development proposal



On Thu, Dec 17, 2020 at 4:13 AM Jean-Philippe Ouellet <jpo@xxxxxx> wrote:
>
> On Wed, Dec 16, 2020 at 2:37 PM Christopher Clark
> <christopher.w.clark@xxxxxxxxx> wrote:
> > Hi all,
> >
> > I have written a page for the OpenXT wiki describing a proposal for
> > initial development towards the VirtIO-Argo transport driver, and the
> > related system components to support it, destined for OpenXT and
> > upstream projects:
> >
> > https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1
> >
> > Please review ahead of tomorrow's OpenXT Community Call.
> >
> > I would draw your attention to the Comparison of Argo interface options 
> > section:
> >
> > https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1#Comparison-of-Argo-interface-options
> >
> > where further input to the table would be valuable;
> > and would also appreciate input on the IOREQ project section:
> >
> > https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1#Project:-IOREQ-for-VirtIO-Argo
> >
> > in particular, whether an IOREQ implementation to support the
> > provision of devices to the frontends can replace the need for any
> > userspace software to interact with an Argo kernel interface for the
> > VirtIO-Argo implementation.
> >
> > thanks,
> > Christopher
>
> Hi,
>
> Really excited to see this happening, and disappointed that I'm not
> able to contribute at this time. I don't think I'll be able to join
> the call, but wanted to share some initial thoughts from my
> middle-of-the-night review anyway.

Thanks for the review and positive feedback - appreciated.

> Super rough notes in raw unedited notes-to-self form:
>
> main point of feedback is: I love the desire to get a non-shared-mem
> transport backend for virtio standardized. It moves us closer to an
> HMX-only world. BUT: virtio is relevant to many hypervisors beyond
> Xen, not all of which have the same views on how policy enforcement
> should be done, namely some have a preference for capability-oriented
> models over type-enforcement / MAC models. It would be nice if any
> labeling encoded into the actual specs / guest-boundary protocols
> would be strictly a mechanism, and be policy-agnostic, in particular
> not making implicit assumptions about XSM / SELinux / similar. I don't
> have specific suggestions at this point, but would love to discuss.

That is an interesting point; thanks. It is more about the features
and specification of Argo itself and its interfaces than about using
it to implement a VirtIO transport, but it is good to consider. We
have an OpenXT wiki page for Argo development, and a related item
described there covers having the hypervisor and the remote guest
kernel provide message context about the communication source to the
receiver, to support policy decisions:

https://openxt.atlassian.net/wiki/spaces/DC/pages/737345538/Argo+Hypervisor-Mediated+data+eXchange+Development

> thoughts on how to handle device enumeration? hotplug notifications?
> - can't rely on xenstore
> - need some internal argo messaging for this?
> - name service w/ well-known names? starts to look like xenstore
> pretty quickly...

I don't think we have a firm decision on this yet. We have been
considering using ACPI tables and/or Device Tree for device
enumeration, which is viable for statically assigned devices; hotplug
is an additional case to design for. We'll also be looking at how the
existing VirtIO transports handle this.

Handling notifications on a well-known Argo port is a reasonable
direction to go and fits with applying XSM policy to govern Argo port
connectivity between domains.

https://openxt.atlassian.net/wiki/spaces/DC/pages/1333428225/Analysis+of+Argo+as+a+transport+medium+for+VirtIO#Argo:-Device-discovery-and-driver-registration-with-Virtio-Argo-transport
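
To make the well-known-port idea a bit more concrete, here is a purely
hypothetical shape for an enumeration/hotplug announcement message;
the port number, structure and field names below are illustrative only
and are not part of any current Argo or VirtIO-Argo specification:

  /* Hypothetical wire format for a backend announcing a VirtIO device
   * over a well-known Argo port; nothing here is specified yet. */
  #include <stdint.h>

  #define VIRTIO_ARGO_ANNOUNCE_PORT  0x10000u  /* placeholder value */

  enum virtio_argo_announce_op {
      VIRTIO_ARGO_DEV_ADD    = 1,  /* device available (incl. hotplug add) */
      VIRTIO_ARGO_DEV_REMOVE = 2,  /* device going away (hotplug remove)   */
  };

  struct virtio_argo_announce {
      uint32_t op;             /* enum virtio_argo_announce_op              */
      uint32_t virtio_dev_id;  /* VirtIO device type, e.g. 1 = net, 2 = blk */
      uint16_t backend_domid;  /* domain offering the backend device        */
      uint16_t pad;
      uint32_t config_port;    /* Argo port to contact for this device      */
  };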

> - granular disaggregation of backend device-model providers desirable

agreed

> how does resource accounting work? each side pays for their own delivery ring?
> - init in already-guest-mapped mem & simply register?

Yes: each domain registers rings backed by its own memory to receive
messages; a rough sketch of ring registration is included below.
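
This sketch is based on my reading of the public Argo interface
(xen/include/public/argo.h, Xen 4.12+) and the hypercall wrapper style
used in the OpenXT Argo Linux driver; treat the names as approximate
and check them against the headers for the Xen version in use:

  /* Sketch only: register a receive ring backed by this domain's own
   * pages, so no foreign memory is mapped or granted. */
  static int argo_register_rx_ring(xen_argo_port_t aport,
                                   xen_argo_gfn_t *gfn_list,
                                   unsigned int nr_frames,
                                   uint32_t ring_len)
  {
      xen_argo_register_ring_t reg = {
          .aport      = aport,              /* ring's Argo port           */
          .partner_id = XEN_ARGO_DOMID_ANY, /* or restrict to one sender  */
          .len        = ring_len,           /* usable ring size in bytes  */
      };

      /* gfn_list[] holds the guest frame numbers of pages this domain
       * allocated itself to back the ring. */
      return HYPERVISOR_argo_op(XEN_ARGO_OP_register_ring,
                                &reg, gfn_list, nr_frames, 0 /* flags */);
  }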

> - how does it compare to grant tables?

Grant tables are the Xen mechanism by which a VM instructs the
hypervisor to grant another VM permission to establish shared memory
mappings, or to copy data between domains. Argo is an alternative
inter-VM communication mechanism that does not share memory between
the VMs, and so provides different properties that support isolation
and access control.

There's a presentation with an overview of Argo from the 2019 Xen
Design and Developer Summit:
https://static.sched.com/hosted_files/xensummit19/92/Argo%20and%20HMX%20-%20OpenXT%20-%20Christopher%20Clark%20-%20Xen%20Summit%202019.pdf
https://www.youtube.com/watch?v=cnC0Tg3jqJQ&list=PLYyw7IQjL-zHmP6CuqwuqfXNK5QLmU7Ur&index=15

>   - do you need to go through linux driver to alloc (e.g. xengntalloc)
> or has way to share arbitrary otherwise not-special userspace pages
> (e.g. u2mfn, with all its issues (pinning, reloc, etc.))?

In the current Argo device driver implementations, userspace does not
have direct access to Argo message rings. Instead, the kernel provides
device nodes that userspace can use to send and receive data with
familiar I/O primitives.
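
Purely as an illustration of that model - the device node name below
is a placeholder and the addressing step is elided, so it does not
match the real driver's interface:

  /* Illustration only: userspace uses ordinary file I/O; the kernel
   * driver performs the Argo operations on its behalf. */
  #include <fcntl.h>
  #include <unistd.h>

  int main(void)
  {
      int fd = open("/dev/argo_stream", O_RDWR);  /* hypothetical node */
      if (fd < 0)
          return 1;
      /* ...binding/connecting to a (domid, Argo port) via the driver's
       * own mechanism would happen here... */
      write(fd, "ping", 4);  /* kernel copies data into the peer's ring */
      close(fd);
      return 0;
  }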

For the VirtIO-Argo transport, userspace would not need to be aware of
the use of Argo: the VirtIO virtual devices will present the same
VirtIO device interfaces to userspace as they do over any other
transport.

> ioreq is tangled with grant refs, evt chans, generic vmexit
> dispatcher, instruction decoder, etc. none of which seems desirable if
> trying to move towards world with strictly safer guest interfaces
> exposed (e.g. HMX-only)

ack

> - there's no io to trap/decode here, it's explicitly exclusively via
> hypercall to HMX, no?

Yes; as Roger noted in his reply in this thread, the interest in IOREQ
has been motivated by other recent VirtIO activity in the Xen
community and by the question of whether there is potential for
alignment with that work.

> - also, do we want argo sendv hypercall to be always blocking & synchronous?
>   - or perhaps async notify & background copy to other vm addr space?
>   - possibly better scaling?
>   - accounting of in-flight io requests to handle gets complicated
> (see recent XSA)
>   - PCI-like completion request semantics? (argo as cross-domain
> software dma engine w/ some basic protocol enforcement?)

I think an asynchronous delivery primitive for Argo is worth
exploring, given its potential for different performance
characteristics that could enable additional use cases.
It is likely beyond the scope of the initial VirtIO-Argo driver
development, but enabling VirtIO guest drivers to use Argo will allow
testing to determine which uses could benefit from further investment.
A purely illustrative sketch of what such an interface might look like
follows.
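
None of these structures or semantics exist in Argo today; this is
only a speculative sketch of the "completion request" idea above:

  /* Speculative only: an async sendv submission and its completion
   * record, loosely modelled on DMA-engine style semantics. */
  #include <stdint.h>

  struct argo_async_send {
      uint64_t cookie;        /* caller-chosen id, echoed on completion  */
      xen_argo_addr_t dst;    /* destination (domain_id, aport)          */
      xen_argo_iov_t *iovs;   /* data for the hypervisor to copy later   */
      uint32_t niov;
      uint32_t flags;
  };

  struct argo_completion {
      uint64_t cookie;        /* matches argo_async_send.cookie          */
      int32_t  status;        /* 0 on success, negative errno on failure */
      uint32_t bytes_copied;  /* useful for accounting in-flight I/O     */
  };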

> "port" v4v driver => argo:
> - yes please! something without all the confidence-inspiring
> DEBUG_{APPLE,ORANGE,BANANA} indicators of production-worthy code would
> be great ;)
> - seems like you may want to redo argo hypercall interface too?

The Xen community has plans to remove all uses of virtual addresses
from the hypervisor interface, and the Argo interface will need to be
updated as part of that work. In addition, work to incorporate further
features from v4v, and items on the OpenXT Argo development wiki page,
will involve some further changes to the interface.

> (at least the syscall interface...)

Yes: a new Argo Linux driver will likely have quite a different
userspace interface from the current one; it has been discussed in the
OpenXT community, and the notes from that discussion are here:

https://openxt.atlassian.net/wiki/spaces/DC/pages/775389197/New+Linux+Driver+for+Argo

There is motivation to support both a networking and a non-networking
interface, so that network-enabled guest OSes can use familiar
primitives and software, while non-network-enabled guests are still
able to use Argo communication.

>   - targeting synchronous blocking sendv()?
>   - or some async queue/completion thing too? (like PF_RING, but with
> *iov entries?)
>   - both could count as HMX, both could enforce no double-write racing
> games at dest ring, etc.

The initial focus is on building a modern, hopefully simple, driver
that unblocks our immediate use cases, allows us to retire the
existing driver, and is suitable for submission to and maintenance in
the upstream kernel.

> re v4vchar & doing similar for argo:
> - we may prefer "can write N bytes? -> yes/no" or "how many bytes can
> write? -> N" over "try to write N bytes -> only wrote M, EAGAIN"
> - the latter can be implemented over the former, but not the other way around
> - starts to matter when you want to be able to implement in userspace
> & provide backpressure to peer userspace without additional buffering
> & potential lying about durability of writes
> - breaks cross-domain EPIPE boundary correctness
> - Qubes ran into same issues when porting vchan from Xen to KVM
> initially via vsock

Thanks - that's helpful, and we will look at it when the driver work
proceeds. A quick sketch of the point about composing the two
primitives is below.
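
The argo_* function names here are hypothetical, not an existing Argo
API; the sketch just shows why the "how many bytes can I write?"
primitive is the more general one:

  #include <errno.h>
  #include <stddef.h>
  #include <sys/types.h>

  /* Hypothetical primitives: query the space currently available in
   * the destination ring, and write exactly len bytes once space is
   * known to exist. */
  size_t argo_writable_bytes(int fd);
  int    argo_write_exact(int fd, const void *buf, size_t len);

  /* "try to write N, get M back" semantics composed on the query: */
  ssize_t argo_try_write(int fd, const void *buf, size_t len)
  {
      size_t space = argo_writable_bytes(fd);
      size_t n = len < space ? len : space;

      if (n == 0)
          return -EAGAIN;            /* caller can apply backpressure    */
      if (argo_write_exact(fd, buf, n) < 0)
          return -EIO;
      return (ssize_t)n;             /* partial write, as with write(2)  */
  }

  /* The reverse composition does not work: a transport offering only
   * "try to write" cannot report available space without consuming it. */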

> some virtio drivers explicitly use shared mem for more than just
> communication rings:
> - e.g. virtio-fs, which can map pages as DAX-like fs backing to share page 
> cache
> - e.g. virtio-gpu, virtio-wayland, virtio-video, which deal in framebuffers
> - needs thought about how best to map semantics to (or at least
> interoperate cleanly & safely with) HMX-{only,mostly} world
>   - the performance of shared mem actually can meaningfully matter for
> e.g. large framebuffers in particular due to fundamental memory
> bandwidth constraints

This is an important point. Given the clear utility of these drivers,
it will be worth exploring what can be done to meet their performance
requirements and satisfy the semantics they need in order to function.
It may be the case that shared memory regions are necessary for some
classes of driver - some investigation is required.
Along the lines of the research that Rich included in his reply, it
would be interesting to see whether modern hardware provides
primitives that can support efficient cross-domain data transport and
could be used for this. Thanks for raising it.

Christopher



 

