Re: [openxt-dev] VirtIO-Argo initial development proposal
- To: Jean-Philippe Ouellet <jpo@xxxxxx>
- From: Rich Persaud <persaur@xxxxxxxxx>
- Date: Wed, 23 Dec 2020 16:32:01 -0500
- Cc: Christopher Clark <christopher.w.clark@xxxxxxxxx>, openxt <openxt@xxxxxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>, roger.pau@xxxxxxxxxx, Julien Grall <jgrall@xxxxxxxxxx>, James McKenzie <james@xxxxxxxxxxx>, Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>, Paul Durrant <pdurrant@xxxxxxxxxxxx>
- Delivery-date: Wed, 23 Dec 2020 21:32:21 +0000
- List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
On Dec 17, 2020, at 07:13, Jean-Philippe Ouellet <jpo@xxxxxx> wrote:

On Wed, Dec 16, 2020 at 2:37 PM Christopher Clark <christopher.w.clark@xxxxxxxxx> wrote:

Hi all,
I have written a page for the OpenXT wiki describing a proposal for
initial development towards the VirtIO-Argo transport driver, and the
related system components to support it, destined for OpenXT and
upstream projects:
https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1
Please review ahead of tomorrow's OpenXT Community Call.
I would draw your attention to the Comparison of Argo interface options section:
https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1#Comparison-of-Argo-interface-options
where further input to the table would be valuable;
and would also appreciate input on the IOREQ project section:
https://openxt.atlassian.net/wiki/spaces/~cclark/pages/1696169985/VirtIO-Argo+Development+Phase+1#Project:-IOREQ-for-VirtIO-Argo
in particular, whether an IOREQ implementation to support the
provision of devices to the frontends can replace the need for any
userspace software to interact with an Argo kernel interface for the
VirtIO-Argo implementation.
thanks,
Christopher
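(As background for the question above about whether userspace needs to touch an Argo kernel interface at all: the guest-facing Argo interface is a single hypercall multiplexing a small set of operations -- register ring, unregister ring, sendv, notify. The sketch below paraphrases its shape from memory; field names, widths, and op constants should be checked against xen/include/public/argo.h before relying on them.)

    /*
     * Sketch of the Argo guest interface, paraphrased from memory of
     * xen/include/public/argo.h -- verify names and layouts against the
     * real Xen public header.
     */
    #include <stdint.h>

    typedef struct xen_argo_addr {
        uint32_t aport;       /* Argo port within the domain */
        uint16_t domain_id;
        uint16_t pad;
    } xen_argo_addr_t;

    typedef struct xen_argo_send_addr {
        xen_argo_addr_t src;
        xen_argo_addr_t dst;
    } xen_argo_send_addr_t;

    typedef struct xen_argo_iov {
        uint64_t iov_hnd;     /* guest pointer to the data to copy */
        uint32_t iov_len;
        uint32_t pad;
    } xen_argo_iov_t;

    /*
     * The single hypercall multiplexes the operations:
     *
     *   long HYPERVISOR_argo_op(unsigned int cmd,
     *                           void *arg1, void *arg2,
     *                           unsigned long arg3, unsigned long arg4);
     *
     *   cmd = register_ring / unregister_ring / sendv / notify
     *
     * A send is roughly:
     *
     *   HYPERVISOR_argo_op(XEN_ARGO_OP_sendv,
     *                      &send_addr,    -- (domain, port) src/dst pair
     *                      iovs,          -- array of xen_argo_iov_t
     *                      niov,          -- number of iov entries
     *                      message_type);
     *
     * The hypervisor copies the iov contents into a ring that the
     * receiving domain registered with register_ring, so no memory is
     * ever shared between sender and receiver.
     */

Whether the frontend transport driver can be the only component issuing these operations -- removing the need for a userspace-facing Argo device node on the data path -- is essentially the open question referenced above.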
Hi,

Really excited to see this happening, and disappointed that I'm not able to contribute at this time. I don't think I'll be able to join the call, but wanted to share some initial thoughts from my middle-of-the-night review anyway.

Super rough notes in raw unedited notes-to-self form:

Main point of feedback: I love the desire to get a non-shared-mem transport backend for virtio standardized. It moves us closer to an HMX-only world. BUT: virtio is relevant to many hypervisors beyond Xen, not all of which have the same views on how policy enforcement should be done; namely, some have a preference for capability-oriented models over type-enforcement / MAC models. It would be nice if any labeling encoded into the actual specs / guest-boundary protocols were strictly a mechanism and policy-agnostic, in particular not making implicit assumptions about XSM / SELinux / similar. I don't have specific suggestions at this point, but would love to discuss.

Thoughts on how to handle device enumeration? hotplug notifications?
- can't rely on xenstore
- need some internal argo messaging for this?
- name service w/ well-known names? starts to look like xenstore pretty quickly...
- granular disaggregation of backend device-model providers desirable

How does resource accounting work? each side pays for their own delivery ring?
- init in already-guest-mapped mem & simply register?
- how does it compare to grant tables?
  - do you need to go through a linux driver to alloc (e.g. xengntalloc), or is there a way to share arbitrary otherwise not-special userspace pages (e.g. u2mfn, with all its issues (pinning, reloc, etc.))?

IOREQ is tangled with grant refs, evt chans, the generic vmexit dispatcher, the instruction decoder, etc., none of which seems desirable if trying to move towards a world with strictly safer guest interfaces exposed (e.g. HMX-only)
- there's no IO to trap/decode here, it's explicitly exclusively via hypercall to HMX, no?
- also, do we want the argo sendv hypercall to be always blocking & synchronous?
  - or perhaps async notify & background copy to the other VM's address space?
    - possibly better scaling?
    - accounting of in-flight IO requests to handle gets complicated (see recent XSA)
  - PCI-like completion request semantics? (argo as a cross-domain software DMA engine w/ some basic protocol enforcement?)

"Port" v4v driver => argo:
- yes please! something without all the confidence-inspiring DEBUG_{APPLE,ORANGE,BANANA} indicators of production-worthy code would be great ;)
- seems like you may want to redo the argo hypercall interface too? (at least the syscall interface...)
  - targeting synchronous blocking sendv()?
  - or some async queue/completion thing too? (like PF_RING, but with *iov entries?)
  - both could count as HMX, both could enforce no double-write racing games at the dest ring, etc.

Re v4vchar & doing similar for argo:
- we may prefer "can write N bytes? -> yes/no" or "how many bytes can I write? -> N" over "try to write N bytes -> only wrote M, EAGAIN"
- the latter can be implemented over the former, but not the other way around (see the sketch after these notes)
- starts to matter when you want to be able to implement in userspace & provide backpressure to peer userspace without additional buffering & potential lying about the durability of writes
- breaks cross-domain EPIPE boundary correctness
- Qubes ran into the same issues when porting vchan from Xen to KVM, initially via vsock

Some virtio drivers explicitly use shared mem for more than just communication rings:
- e.g. virtio-fs, which can map pages as DAX-like fs backing to share the page cache
- e.g. virtio-gpu, virtio-wayland, virtio-video, which deal in framebuffers
- needs thought about how best to map those semantics to (or at least interoperate cleanly & safely with) an HMX-{only,mostly} world
  - the performance of shared mem actually can meaningfully matter, e.g. for large framebuffers in particular, due to fundamental memory bandwidth constraints

What is the mentioned PX hypervisor? Presumably short for PicoXen? Any public information?
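On the flow-control point in the notes above ("the latter can be implemented over the former, but not the other way around"), a minimal sketch; the names argo_tx_space() and argo_try_write() are hypothetical and do not correspond to any existing driver API:

    #include <errno.h>
    #include <stddef.h>
    #include <sys/types.h>

    struct conn;   /* connection to a peer ring; details irrelevant here */

    /* Query primitive: how many bytes would currently fit in the
     * destination ring without blocking?  (The interface style the note
     * above prefers.) */
    size_t argo_tx_space(struct conn *c);

    /* Copy len bytes into the ring; assumed to succeed whenever
     * len <= argo_tx_space(c). */
    void argo_copy_to_ring(struct conn *c, const void *buf, size_t len);

    /* The "try to write N -> wrote M, or EAGAIN" style layers cleanly on
     * top of the query primitive... */
    ssize_t argo_try_write(struct conn *c, const void *buf, size_t len)
    {
        size_t space = argo_tx_space(c);

        if (space == 0)
            return -EAGAIN;
        if (len > space)
            len = space;                /* short write */
        argo_copy_to_ring(c, buf, len);
        return (ssize_t)len;
    }

    /*
     * ...but the reverse does not work: given only a try_write(), the
     * only way to learn how much space the peer has is to actually
     * submit data, which either forces extra buffering below the caller
     * (lying about durability of writes) or consumes ring space just to
     * probe it.  A userspace backend that wants to propagate
     * backpressure to its own clients without extra buffering needs the
     * query form exposed.
     */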
On the PX question: not much at the moment, but there is prior public work. PX is an OSS L0 "Protection Hypervisor" in the Hardened Access Terminal (HAT) architecture presented by Daniel Smith at the 2020 Xen Summit: https://youtube.com/watch?v=Wt-SBhFnDZY&t=3m48s
PX is intended to build on lessons learned from the IBM Ultravisor, HP/Bromium AX, and AIS Bareflank L0 hypervisors.
In the long term, efficient hypervisor nesting will require close cooperation with silicon and firmware vendors; note that Intel is introducing TDX (Trust Domain Extensions).
There are also a couple of recent papers from Shanghai Jiao Tong University, on using hardware instructions to accelerate inter-domain HMX.
March 2019: https://ipads.se.sjtu.edu.cn/_media/publications/skybridge-eurosys19.pdf
> we present SkyBridge, a new communication facility designed and optimized for synchronous IPC in microkernels. SkyBridge requires no involvement of kernels during communication and allows a process to directly switch to the virtual address space of the target process and invoke the target function. SkyBridge retains the traditional virtual address space isolation and thus can be easily integrated into existing microkernels. The key idea of SkyBridge is to leverage a commodity hardware feature for virtualization (i.e., [Intel EPT] VMFUNC) to achieve efficient IPC. To leverage the hardware feature, SkyBridge inserts a tiny virtualization layer (Rootkernel) beneath the original microkernel (Subkernel). The Rootkernel is carefully designed to eliminate most virtualization overheads. SkyBridge also integrates a series of techniques to guarantee the security properties of IPC. We have implemented SkyBridge on three popular open-source microkernels (seL4, Fiasco.OC, and Google Zircon). The evaluation results show that SkyBridge improves the speed of IPC by 1.49x to 19.6x for microbenchmarks. For real-world applications (e.g., SQLite3 database), SkyBridge improves the throughput by 81.9%, 1.44x and 9.59x for the three microkernels on average.
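The hardware feature SkyBridge leverages is the EPTP-switching leaf of the VMFUNC instruction, which lets a guest move between hypervisor-preconfigured EPT views without a VM exit. A minimal sketch of the invocation (per the Intel SDM, EAX = 0 selects EPTP switching and ECX indexes the EPTP list installed by the hypervisor, i.e. SkyBridge's Rootkernel):

    #include <stdint.h>

    /*
     * Switch this vCPU to another pre-approved EPT view.  VMFUNC leaf 0
     * (EAX = 0) is EPTP switching; ECX selects an entry in the EPTP list
     * page configured by the hypervisor.  No VM exit occurs, which is
     * what makes the cross-address-space call path cheap.
     */
    static inline void eptp_switch(uint32_t eptp_index)
    {
        __asm__ volatile("vmfunc"
                         : /* no outputs */
                         : "a"(0), "c"(eptp_index)
                         : "memory");
    }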
July 2020: https://ipads.se.sjtu.edu.cn/_media/publications/guatc20.pdf
> a redesign of traditional microkernel OSes to harmonize the tension between messaging performance and isolation. UnderBridge moves the OS components of a microkernel between user space and kernel space at runtime while enforcing consistent isolation. It retrofits Intel Memory Protection Key for Userspace (PKU) in kernel space to achieve such isolation efficiently and design a fast IPC mechanism across those OS components. Thanks to PKU’s extremely low overhead, the inter-process communication (IPC) roundtrip cost in UnderBridge can be as low as 109 cycles. We have designed and implemented a new microkernel called ChCore based on UnderBridge and have also ported UnderBridge to three mainstream microkernels, i.e., seL4, Google Zircon, and Fiasco.OC. Evaluations show that UnderBridge speeds up the IPC by 3.0× compared with the state-of-the-art (e.g., SkyBridge) and improves the performance of IPC-intensive applications by up to 13.1× for the above three microkernels
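The PKU mechanism UnderBridge retrofits into kernel space is the same one Linux exposes to userspace via pkey_alloc()/pkey_mprotect() and the per-thread PKRU register; the sketch below shows that userspace form (requires MPK-capable hardware and glibc 2.27 or later). The low switching cost the paper cites comes from access rights being flipped by a register write rather than by changing page tables.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Allocate a protection key and a page tagged with it. */
        int pkey = pkey_alloc(0, 0);
        if (pkey < 0) { perror("pkey_alloc"); return 1; }

        char *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        if (page == MAP_FAILED) { perror("mmap"); return 1; }

        if (pkey_mprotect(page, 4096, PROT_READ | PROT_WRITE, pkey)) {
            perror("pkey_mprotect");
            return 1;
        }

        strcpy(page, "hello");

        /* Revoke and restore access by writing the thread-local PKRU
         * register (WRPKRU underneath): no syscall, no TLB shootdown. */
        pkey_set(pkey, PKEY_DISABLE_ACCESS);  /* loads/stores now fault */
        pkey_set(pkey, 0);                    /* accessible again       */

        printf("%s\n", page);
        pkey_free(pkey);
        return 0;
    }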
For those interested in Argo and VirtIO, there will be a conference call on Thursday, Jan 14th 2021, at 1600 UTC.
Rich