Re: [openxt-dev] VirtIO-Argo initial development proposal
On Thu, Dec 17, 2020 at 4:13 AM Jean-Philippe Ouellet <jpo@xxxxxx> wrote:
>
> On Wed, Dec 16, 2020 at 2:37 PM Christopher Clark
> <christopher.w.clark@xxxxxxxxx> wrote:
> > Hi all,
> >
> > I have written a page for the OpenXT wiki describing a proposal for
> > initial development towards the VirtIO-Argo transport driver, and the
> > related system components to support it, destined for OpenXT and
> > upstream projects:
> >
> > https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1
> >
> > Please review ahead of tomorrow's OpenXT Community Call.
> >
> > I would draw your attention to the Comparison of Argo interface options
> > section:
> >
> > https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1#Comparison-of-Argo-interface-options
> >
> > where further input to the table would be valuable;
> > and would also appreciate input on the IOREQ project section:
> >
> > https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1#Project:-IOREQ-for-VirtIO-Argo
> >
> > in particular, whether an IOREQ implementation to support the
> > provision of devices to the frontends can replace the need for any
> > userspace software to interact with an Argo kernel interface for the
> > VirtIO-Argo implementation.
> >
> > thanks,
> > Christopher
>
> Hi,
>
> Really excited to see this happening, and disappointed that I'm not
> able to contribute at this time. I don't think I'll be able to join
> the call, but wanted to share some initial thoughts from my
> middle-of-the-night review anyway.

Thanks for the review and positive feedback - appreciated.

> Super rough notes in raw unedited notes-to-self form:
>
> main point of feedback is: I love the desire to get a non-shared-mem
> transport backend for virtio standardized. It moves us closer to an
> HMX-only world. BUT: virtio is relevant to many hypervisors beyond
> Xen, not all of which have the same views on how policy enforcement
> should be done, namely some have a preference for capability-oriented
> models over type-enforcement / MAC models. It would be nice if any
> labeling encoded into the actual specs / guest-boundary protocols
> would be strictly a mechanism, and be policy-agnostic, in particular
> not making implicit assumptions about XSM / SELinux / similar. I don't
> have specific suggestions at this point, but would love to discuss.

That is an interesting point; thanks. It is more about the features and
specification of Argo itself and its interfaces than about the use of
Argo to implement a VirtIO transport, but it is good to consider.

We have an OpenXT wiki page for Argo development, with a related item
described there about having the hypervisor and the remote guest kernel
provide message context about the communication source to the receiver,
to support policy decisions:

https://openxt.atlassian.net/wiki/spaces/DC/pages/737345538/Argo+Hypervisor-Mediated+data+eXchange+Development

> thoughts on how to handle device enumeration? hotplug notifications?
> - can't rely on xenstore
> - need some internal argo messaging for this?
> - name service w/ well-known names? starts to look like xenstore
> pretty quickly...

I don't think we have a firm decision on this. We have been considering
using ACPI tables and/or Device Tree for device enumeration, which is
viable for devices that are statically assigned; hotplug is an
additional case to design for. We'll be looking at the existing VirtIO
transports too.

Handling notifications on a well-known Argo port is a reasonable
direction to go and fits with applying XSM policy to govern Argo port
connectivity between domains.

https://openxt.atlassian.net/wiki/spaces/DC/pages/1333428225/Analysis+of+Argo+as+a+transport+medium+for+VirtIO#Argo:-Device-discovery-and-driver-registration-with-Virtio-Argo-transport

> - granular disaggregation of backend device-model providers desirable

agreed

> how does resource accounting work? each side pays for their own delivery ring?
> - init in already-guest-mapped mem & simply register?

Yes: rings are registered with a domain's own memory for receiving
messages.
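To make that concrete, here is a rough sketch of what guest-side ring
registration looks like. The struct layouts and op number below are
simplified approximations of the public Argo interface in
xen/include/public/argo.h (Xen 4.12 and later), not the exact ABI, and
the hypercall wrapper is a placeholder for the plumbing an Argo driver
would provide:

/* Illustrative only: simplified versions of the structures in
 * xen/include/public/argo.h -- consult the real header for the ABI. */
#include <stdint.h>
#include <string.h>

typedef uint32_t argo_port_t;

struct argo_register_ring {
    argo_port_t aport;      /* port this ring receives on */
    uint16_t partner_id;    /* domid allowed to send, or a wildcard */
    uint16_t pad;
    uint32_t len;           /* ring length in bytes */
};

struct argo_ring_header {
    uint32_t rx_ptr;        /* consumer offset, advanced by this domain */
    uint32_t tx_ptr;        /* producer offset, advanced by Xen */
    uint8_t  reserved[56];
    /* message data follows in the remainder of the registered pages */
};

/* Placeholder for the real hypercall wrapper (HYPERVISOR_argo_op);
 * an Argo guest driver supplies the actual mechanism. */
extern long argo_op(unsigned int cmd, void *arg1, void *arg2,
                    unsigned long arg3, unsigned long arg4);

#define ARGO_OP_register_ring 1   /* assumed op number; see the header */

/*
 * Register a receive ring backed by the guest's own memory: the guest
 * allocates and keeps ownership of the pages, passes their frame
 * numbers to Xen, and Xen copies inbound messages into the ring.
 * No memory is shared or mapped between the communicating domains.
 */
static long register_rx_ring(argo_port_t port, uint16_t partner,
                             struct argo_ring_header *ring, uint32_t len,
                             uint64_t *gfns, unsigned int npages)
{
    struct argo_register_ring reg = {
        .aport = port,
        .partner_id = partner,
        .len = len,
    };

    memset(ring, 0, sizeof(*ring));   /* rx_ptr/tx_ptr start at zero */

    return argo_op(ARGO_OP_register_ring, &reg, gfns, npages, 0);
}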
> - how does it compare to grant tables?

The grant tables are the Xen mechanism for a VM to instruct the
hypervisor to grant another VM permission to establish shared memory
mappings, or to copy data between domains. Argo is an alternative
mechanism for communicating between VMs that does not share memory
between them and provides different properties that are supportive of
isolation and access control.

There's a presentation with an overview of Argo from the 2019 Xen
Design and Developer Summit:

https://static.sched.com/hosted_files/xensummit19/92/Argo%20and%20HMX%20-%20OpenXT%20-%20Christopher%20Clark%20-%20Xen%20Summit%202019.pdf
https://www.youtube.com/watch?v=cnC0Tg3jqJQ&list=PLYyw7IQjL-zHmP6CuqwuqfXNK5QLmU7Ur&index=15

> - do you need to go through linux driver to alloc (e.g. xengntalloc)
> or has way to share arbitrary otherwise not-special userspace pages
> (e.g. u2mfn, with all its issues (pinning, reloc, etc.))?

In the current Argo device driver implementations, userspace does not
have direct access to Argo message rings. Instead, the kernel provides
devices that userspace can use to send and receive data via familiar
I/O primitives.

For the VirtIO-Argo transport, userspace would not need to be aware of
the use of Argo - the VirtIO virtual devices will present themselves to
userspace with the same VirtIO device interfaces as when they use any
other transport.

> ioreq is tangled with grant refs, evt chans, generic vmexit
> dispatcher, instruction decoder, etc. none of which seems desirable if
> trying to move towards world with strictly safer guest interfaces
> exposed (e.g. HMX-only)

ack

> - there's no io to trap/decode here, it's explicitly exclusively via
> hypercall to HMX, no?

Yes; as Roger noted in his reply in this thread, the interest in IOREQ
has been motivated by other recent VirtIO activity in the Xen Community,
and by whether some potential might exist for alignment with that work.

> - also, do we want argo sendv hypercall to be always blocking & synchronous?
> - or perhaps async notify & background copy to other vm addr space?
> - possibly better scaling?
> - accounting of in-flight io requests to handle gets complicated
> (see recent XSA)
> - PCI-like completion request semantics? (argo as cross-domain
> software dma engine w/ some basic protocol enforcement?)

I think implementation of an asynchronous delivery primitive for Argo
is worth exploring, given its potential to achieve different
performance characteristics that could enable it to support additional
use cases. It is likely beyond the scope of the initial VirtIO-Argo
driver development, but enabling VirtIO guest drivers to use Argo will
allow testing to determine which uses could benefit from further
investment.
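For reference, the current primitive is synchronous: as I understand the
present behaviour, Xen copies the source iovs into the destination ring
during the sendv hypercall, advances the ring's tx_ptr and signals the
receiver (via the Argo VIRQ), returning the byte count on success or
-EAGAIN when the ring lacks space, so the sender requests a notification
and retries. A simplified sketch, again approximating rather than
reproducing the public interface:

/* Illustrative only: simplified from xen/include/public/argo.h. */
#include <stdint.h>

struct argo_addr {
    uint32_t aport;
    uint16_t domain_id;
    uint16_t pad;
};

struct argo_send_addr {
    struct argo_addr src;
    struct argo_addr dst;
};

struct argo_iov {
    uint64_t iov_base;   /* guest virtual address of the source buffer */
    uint32_t iov_len;
    uint32_t pad;
};

/* Same placeholder hypercall wrapper as in the earlier sketch. */
extern long argo_op(unsigned int cmd, void *arg1, void *arg2,
                    unsigned long arg3, unsigned long arg4);

#define ARGO_OP_sendv 3   /* assumed op number; see the header */

/*
 * Synchronous send: the data transfer happens inside this hypercall,
 * so when it returns successfully the bytes are already in the
 * destination domain's ring and the receiver has been notified.
 */
static long argo_send(struct argo_send_addr *addrs,
                      struct argo_iov *iovs, unsigned int niov,
                      uint32_t message_type)
{
    return argo_op(ARGO_OP_sendv, addrs, iovs, niov, message_type);
}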
> "port" v4v driver => argo:
> - yes please! something without all the confidence-inspiring
> DEBUG_{APPLE,ORANGE,BANANA} indicators of production-worthy code would
> be great ;)
> - seems like you may want to redo argo hypercall interface too?

The Xen community has plans to remove all uses of virtual addresses
from the hypervisor interface, and the Argo interface will need to be
updated as part of that work. In addition, work to incorporate further
features from v4v, and some updates to Argo per items on the OpenXT
Argo development wiki page, will also involve some changes to the
interface.

> (at least the syscall interface...)

Yes: a new Argo Linux driver will likely have quite a different
interface to userspace from the current one; it's been discussed in the
OpenXT community and the notes from the discussion are here:

https://openxt.atlassian.net/wiki/spaces/DC/pages/775389197/New+Linux+Driver+for+Argo

There is motivation to support both a networking and a non-networking
interface, so that network-enabled guest OSes can use familiar
primitives and software, while non-network-enabled guests are still
able to use Argo communication.

> - targeting synchronous blocking sendv()?
> - or some async queue/completion thing too? (like PF_RING, but with
> *iov entries?)
> - both could count as HMX, both could enforce no double-write racing
> games at dest ring, etc.

The immediate focus is on building a modern, hopefully simple, driver
that unblocks the immediate use cases we have, allows us to retire the
existing driver, and is suitable for submission and maintenance
upstream in the kernel.

> re v4vchar & doing similar for argo:
> - we may prefer "can write N bytes? -> yes/no" or "how many bytes can
> write? -> N" over "try to write N bytes -> only wrote M, EAGAIN"
> - the latter can be implemented over the former, but not the other way around
> - starts to matter when you want to be able to implement in userspace
> & provide backpressure to peer userspace without additional buffering
> & potential lying about durability of writes
> - breaks cross-domain EPIPE boundary correctness
> - Qubes ran into same issues when porting vchan from Xen to KVM
> initially via vsock

Thanks - that's helpful; we will look at that when the driver work
proceeds.
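To restate the layering point for when that work happens: try-write
semantics can be built on a space-query primitive, but not the other
way around. A hypothetical sketch - the function names here are
invented for illustration and do not come from any existing driver:

/* Hypothetical interface, to illustrate the layering only. */
#include <stddef.h>
#include <errno.h>
#include <sys/types.h>

/* Primitive the transport would expose: how many bytes can be written
 * to the peer right now, without blocking or buffering? */
extern size_t argo_chan_space(void *chan);

/* Primitive that actually places bytes; the caller must not exceed
 * the space previously reported. */
extern void argo_chan_put(void *chan, const void *buf, size_t len);

/*
 * Try-write built on top of the space query: write as much as
 * currently fits and report the count, so backpressure reaches the
 * caller without the transport buffering data or overstating what has
 * been durably accepted by the peer.
 */
static ssize_t argo_chan_try_write(void *chan, const void *buf, size_t len)
{
    size_t space = argo_chan_space(chan);

    if (space == 0)
        return -EAGAIN;   /* nothing fits; caller waits for notification */

    if (len > space)
        len = space;      /* partial write; caller sees how much went */

    argo_chan_put(chan, buf, len);
    return (ssize_t)len;
}

/* The reverse layering is not possible: a try-write that may consume
 * an unknown prefix of the buffer cannot answer "can I write N bytes?"
 * without side effects or extra buffering. */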
> some virtio drivers explicitly use shared mem for more than just
> communication rings:
> - e.g. virtio-fs, which can map pages as DAX-like fs backing to share page
> cache
> - e.g. virtio-gpu, virtio-wayland, virtio-video, which deal in framebuffers
> - needs thought about how best to map semantics to (or at least
> interoperate cleanly & safely with) HMX-{only,mostly} world
> - the performance of shared mem actually can meaningfully matter for
> e.g. large framebuffers in particular due to fundamental memory
> bandwidth constraints

This is an important point, and given the clear utility of these
drivers it will be worth exploring what can be done to meet their
performance requirements and satisfy the semantics they need in order
to function. Shared memory regions may turn out to be necessary for
some classes of driver - some investigation is required.

Along the lines of the research that Rich included in his reply, it
would be interesting to see whether modern hardware provides primitives
that can support efficient cross-domain data transport that could be
used for this. Thanks for raising it.

Christopher