[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Enabling hypervisor agnosticism for VirtIO backends



On Wed, Aug 18, 2021 at 08:35:51AM +0000, Wei Chen wrote:
> Hi Akashi,
> 
> > -----Original Message-----
> > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>
> > Sent: 2021年8月18日 13:39
> > To: Wei Chen <Wei.Chen@xxxxxxx>
> > Cc: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>; Stefano Stabellini
> > <sstabellini@xxxxxxxxxx>; Alex Benn??e <alex.bennee@xxxxxxxxxx>; Stratos
> > Mailing List <stratos-dev@xxxxxxxxxxxxxxxxxxx>; virtio-dev@lists.oasis-
> > open.org; Arnd Bergmann <arnd.bergmann@xxxxxxxxxx>; Viresh Kumar
> > <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini
> > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; Jan Kiszka
> > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik <cvanscha@xxxxxxxxxxxxxxxx>;
> > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri <vatsa@xxxxxxxxxxxxxx>; Jean-
> > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu Poirier
> > <mathieu.poirier@xxxxxxxxxx>; Oleksandr Tyshchenko
> > <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis
> > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>; Julien
> > Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>; Paul Durrant
> > <paul@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx>
> > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
> >
> > On Tue, Aug 17, 2021 at 08:39:09AM +0000, Wei Chen wrote:
> > > Hi Akashi,
> > >
> > > > -----Original Message-----
> > > > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>
> > > > Sent: 2021年8月17日 16:08
> > > > To: Wei Chen <Wei.Chen@xxxxxxx>
> > > > Cc: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>; Stefano Stabellini
> > > > <sstabellini@xxxxxxxxxx>; Alex Benn??e <alex.bennee@xxxxxxxxxx>;
> > Stratos
> > > > Mailing List <stratos-dev@xxxxxxxxxxxxxxxxxxx>; virtio-
> > dev@lists.oasis-
> > > > open.org; Arnd Bergmann <arnd.bergmann@xxxxxxxxxx>; Viresh Kumar
> > > > <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini
> > > > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; Jan Kiszka
> > > > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik <cvanscha@xxxxxxxxxxxxxxxx>;
> > > > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri <vatsa@xxxxxxxxxxxxxx>; Jean-
> > > > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu Poirier
> > > > <mathieu.poirier@xxxxxxxxxx>; Oleksandr Tyshchenko
> > > > <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis
> > > > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>;
> > Julien
> > > > Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>; Paul Durrant
> > > > <paul@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx>
> > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
> > > >
> > > > Hi Wei, Oleksandr,
> > > >
> > > > On Mon, Aug 16, 2021 at 10:04:03AM +0000, Wei Chen wrote:
> > > > > Hi All,
> > > > >
> > > > > Thanks for Stefano to link my kvmtool for Xen proposal here.
> > > > > This proposal is still discussing in Xen and KVM communities.
> > > > > The main work is to decouple the kvmtool from KVM and make
> > > > > other hypervisors can reuse the virtual device implementations.
> > > > >
> > > > > In this case, we need to introduce an intermediate hypervisor
> > > > > layer for VMM abstraction, Which is, I think it's very close
> > > > > to stratos' virtio hypervisor agnosticism work.
> > > >
> > > > # My proposal[1] comes from my own idea and doesn't always represent
> > > > # Linaro's view on this subject nor reflect Alex's concerns.
> > Nevertheless,
> > > >
> > > > Your idea and my proposal seem to share the same background.
> > > > Both have the similar goal and currently start with, at first, Xen
> > > > and are based on kvm-tool. (Actually, my work is derived from
> > > > EPAM's virtio-disk, which is also based on kvm-tool.)
> > > >
> > > > In particular, the abstraction of hypervisor interfaces has a same
> > > > set of interfaces (for your "struct vmm_impl" and my "RPC interfaces").
> > > > This is not co-incident as we both share the same origin as I said
> > above.
> > > > And so we will also share the same issues. One of them is a way of
> > > > "sharing/mapping FE's memory". There is some trade-off between
> > > > the portability and the performance impact.
> > > > So we can discuss the topic here in this ML, too.
> > > > (See Alex's original email, too).
> > > >
> > > Yes, I agree.
> > >
> > > > On the other hand, my approach aims to create a "single-binary"
> > solution
> > > > in which the same binary of BE vm could run on any hypervisors.
> > > > Somehow similar to your "proposal-#2" in [2], but in my solution, all
> > > > the hypervisor-specific code would be put into another entity (VM),
> > > > named "virtio-proxy" and the abstracted operations are served via RPC.
> > > > (In this sense, BE is hypervisor-agnostic but might have OS
> > dependency.)
> > > > But I know that we need discuss if this is a requirement even
> > > > in Stratos project or not. (Maybe not)
> > > >
> > >
> > > Sorry, I haven't had time to finish reading your virtio-proxy completely
> > > (I will do it ASAP). But from your description, it seems we need a
> > > 3rd VM between FE and BE? My concern is that, if my assumption is right,
> > > will it increase the latency in data transport path? Even if we're
> > > using some lightweight guest like RTOS or Unikernel,
> >
> > Yes, you're right. But I'm afraid that it is a matter of degree.
> > As far as we execute 'mapping' operations at every fetch of payload,
> > we will see latency issue (even in your case) and if we have some solution
> > for it, we won't see it neither in my proposal :)
> >
> 
> Oleksandr has sent a proposal to Xen mailing list to reduce this kind
> of "mapping/unmapping" operations. So the latency caused by this behavior
> on Xen may eventually be eliminated, and Linux-KVM doesn't have that problem.

Obviously, I have not yet caught up there in the discussion.
Which patch specifically?

-Takahiro Akashi

> > > > Specifically speaking about kvm-tool, I have a concern about its
> > > > license term; Targeting different hypervisors and different OSs
> > > > (which I assume includes RTOS's), the resultant library should be
> > > > license permissive and GPL for kvm-tool might be an issue.
> > > > Any thoughts?
> > > >
> > >
> > > Yes. If user want to implement a FreeBSD device model, but the virtio
> > > library is GPL. Then GPL would be a problem. If we have another good
> > > candidate, I am open to it.
> >
> > I have some candidates, particularly for vq/vring, in my mind:
> > * Open-AMP, or
> > * corresponding Free-BSD code
> >
> 
> Interesting, I will look into them : )
> 
> Cheers,
> Wei Chen
> 
> > -Takahiro Akashi
> >
> >
> > > > -Takahiro Akashi
> > > >
> > > >
> > > > [1] https://op-lists.linaro.org/pipermail/stratos-dev/2021-
> > > > August/000548.html
> > > > [2] https://marc.info/?l=xen-devel&m=162373754705233&w=2
> > > >
> > > > >
> > > > > > From: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>
> > > > > > Sent: 2021年8月14日 23:38
> > > > > > To: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>; Stefano
> > Stabellini
> > > > <sstabellini@xxxxxxxxxx>
> > > > > > Cc: Alex Benn??e <alex.bennee@xxxxxxxxxx>; Stratos Mailing List
> > > > <stratos-dev@xxxxxxxxxxxxxxxxxxx>; virtio-dev@xxxxxxxxxxxxxxxxxxxx;
> > Arnd
> > > > Bergmann <arnd.bergmann@xxxxxxxxxx>; Viresh Kumar
> > > > <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini
> > > > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; Jan Kiszka
> > > > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik <cvanscha@xxxxxxxxxxxxxxxx>;
> > > > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri <vatsa@xxxxxxxxxxxxxx>; Jean-
> > > > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu Poirier
> > > > <mathieu.poirier@xxxxxxxxxx>; Wei Chen <Wei.Chen@xxxxxxx>; Oleksandr
> > > > Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis
> > > > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>;
> > Julien
> > > > Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>; Paul Durrant
> > > > <paul@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx>
> > > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
> > > > > >
> > > > > > Hello, all.
> > > > > >
> > > > > > Please see some comments below. And sorry for the possible format
> > > > issues.
> > > > > >
> > > > > > > On Wed, Aug 11, 2021 at 9:27 AM AKASHI Takahiro
> > > > <mailto:takahiro.akashi@xxxxxxxxxx> wrote:
> > > > > > > On Wed, Aug 04, 2021 at 12:20:01PM -0700, Stefano Stabellini
> > wrote:
> > > > > > > > CCing people working on Xen+VirtIO and IOREQs. Not trimming
> > the
> > > > original
> > > > > > > > email to let them read the full context.
> > > > > > > >
> > > > > > > > My comments below are related to a potential Xen
> > implementation,
> > > > not
> > > > > > > > because it is the only implementation that matters, but
> > because it
> > > > is
> > > > > > > > the one I know best.
> > > > > > >
> > > > > > > Please note that my proposal (and hence the working prototype)[1]
> > > > > > > is based on Xen's virtio implementation (i.e. IOREQ) and
> > > > particularly
> > > > > > > EPAM's virtio-disk application (backend server).
> > > > > > > It has been, I believe, well generalized but is still a bit
> > biased
> > > > > > > toward this original design.
> > > > > > >
> > > > > > > So I hope you like my approach :)
> > > > > > >
> > > > > > > [1] https://op-lists.linaro.org/pipermail/stratos-dev/2021-
> > > > August/000546.html
> > > > > > >
> > > > > > > Let me take this opportunity to explain a bit more about my
> > approach
> > > > below.
> > > > > > >
> > > > > > > > Also, please see this relevant email thread:
> > > > > > > > https://marc.info/?l=xen-devel&m=162373754705233&w=2
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, 4 Aug 2021, Alex Bennée wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > One of the goals of Project Stratos is to enable hypervisor
> > > > agnostic
> > > > > > > > > backends so we can enable as much re-use of code as possible
> > and
> > > > avoid
> > > > > > > > > repeating ourselves. This is the flip side of the front end
> > > > where
> > > > > > > > > multiple front-end implementations are required - one per OS,
> > > > assuming
> > > > > > > > > you don't just want Linux guests. The resultant guests are
> > > > trivially
> > > > > > > > > movable between hypervisors modulo any abstracted paravirt
> > type
> > > > > > > > > interfaces.
> > > > > > > > >
> > > > > > > > > In my original thumb nail sketch of a solution I envisioned
> > > > vhost-user
> > > > > > > > > daemons running in a broadly POSIX like environment. The
> > > > interface to
> > > > > > > > > the daemon is fairly simple requiring only some mapped
> > memory
> > > > and some
> > > > > > > > > sort of signalling for events (on Linux this is eventfd).
> > The
> > > > idea was a
> > > > > > > > > stub binary would be responsible for any hypervisor specific
> > > > setup and
> > > > > > > > > then launch a common binary to deal with the actual
> > virtqueue
> > > > requests
> > > > > > > > > themselves.
> > > > > > > > >
> > > > > > > > > Since that original sketch we've seen an expansion in the
> > sort
> > > > of ways
> > > > > > > > > backends could be created. There is interest in
> > encapsulating
> > > > backends
> > > > > > > > > in RTOSes or unikernels for solutions like SCMI. There
> > interest
> > > > in Rust
> > > > > > > > > has prompted ideas of using the trait interface to abstract
> > > > differences
> > > > > > > > > away as well as the idea of bare-metal Rust backends.
> > > > > > > > >
> > > > > > > > > We have a card (STR-12) called "Hypercall Standardisation"
> > which
> > > > > > > > > calls for a description of the APIs needed from the
> > hypervisor
> > > > side to
> > > > > > > > > support VirtIO guests and their backends. However we are
> > some
> > > > way off
> > > > > > > > > from that at the moment as I think we need to at least
> > > > demonstrate one
> > > > > > > > > portable backend before we start codifying requirements. To
> > that
> > > > end I
> > > > > > > > > want to think about what we need for a backend to function.
> > > > > > > > >
> > > > > > > > > Configuration
> > > > > > > > > =============
> > > > > > > > >
> > > > > > > > > In the type-2 setup this is typically fairly simple because
> > the
> > > > host
> > > > > > > > > system can orchestrate the various modules that make up the
> > > > complete
> > > > > > > > > system. In the type-1 case (or even type-2 with delegated
> > > > service VMs)
> > > > > > > > > we need some sort of mechanism to inform the backend VM
> > about
> > > > key
> > > > > > > > > details about the system:
> > > > > > > > >
> > > > > > > > >   - where virt queue memory is in it's address space
> > > > > > > > >   - how it's going to receive (interrupt) and trigger (kick)
> > > > events
> > > > > > > > >   - what (if any) resources the backend needs to connect to
> > > > > > > > >
> > > > > > > > > Obviously you can elide over configuration issues by having
> > > > static
> > > > > > > > > configurations and baking the assumptions into your guest
> > images
> > > > however
> > > > > > > > > this isn't scalable in the long term. The obvious solution
> > seems
> > > > to be
> > > > > > > > > extending a subset of Device Tree data to user space but
> > perhaps
> > > > there
> > > > > > > > > are other approaches?
> > > > > > > > >
> > > > > > > > > Before any virtio transactions can take place the
> > appropriate
> > > > memory
> > > > > > > > > mappings need to be made between the FE guest and the BE
> > guest.
> > > > > > > >
> > > > > > > > > Currently the whole of the FE guests address space needs to
> > be
> > > > visible
> > > > > > > > > to whatever is serving the virtio requests. I can envision 3
> > > > approaches:
> > > > > > > > >
> > > > > > > > >  * BE guest boots with memory already mapped
> > > > > > > > >
> > > > > > > > >  This would entail the guest OS knowing where in it's Guest
> > > > Physical
> > > > > > > > >  Address space is already taken up and avoiding clashing. I
> > > > would assume
> > > > > > > > >  in this case you would want a standard interface to
> > userspace
> > > > to then
> > > > > > > > >  make that address space visible to the backend daemon.
> > > > > > >
> > > > > > > Yet another way here is that we would have well known "shared
> > > > memory" between
> > > > > > > VMs. I think that Jailhouse's ivshmem gives us good insights on
> > this
> > > > matter
> > > > > > > and that it can even be an alternative for hypervisor-agnostic
> > > > solution.
> > > > > > >
> > > > > > > (Please note memory regions in ivshmem appear as a PCI device
> > and
> > > > can be
> > > > > > > mapped locally.)
> > > > > > >
> > > > > > > I want to add this shared memory aspect to my virtio-proxy, but
> > > > > > > the resultant solution would eventually look similar to ivshmem.
> > > > > > >
> > > > > > > > >  * BE guests boots with a hypervisor handle to memory
> > > > > > > > >
> > > > > > > > >  The BE guest is then free to map the FE's memory to where
> > it
> > > > wants in
> > > > > > > > >  the BE's guest physical address space.
> > > > > > > >
> > > > > > > > I cannot see how this could work for Xen. There is no "handle"
> > to
> > > > give
> > > > > > > > to the backend if the backend is not running in dom0. So for
> > Xen I
> > > > think
> > > > > > > > the memory has to be already mapped
> > > > > > >
> > > > > > > In Xen's IOREQ solution (virtio-blk), the following information
> > is
> > > > expected
> > > > > > > to be exposed to BE via Xenstore:
> > > > > > > (I know that this is a tentative approach though.)
> > > > > > >    - the start address of configuration space
> > > > > > >    - interrupt number
> > > > > > >    - file path for backing storage
> > > > > > >    - read-only flag
> > > > > > > And the BE server have to call a particular hypervisor interface
> > to
> > > > > > > map the configuration space.
> > > > > >
> > > > > > Yes, Xenstore was chosen as a simple way to pass configuration
> > info to
> > > > the backend running in a non-toolstack domain.
> > > > > > I remember, there was a wish to avoid using Xenstore in Virtio
> > backend
> > > > itself if possible, so for non-toolstack domain, this could done with
> > > > adjusting devd (daemon that listens for devices and launches backends)
> > > > > > to read backend configuration from the Xenstore anyway and pass it
> > to
> > > > the backend via command line arguments.
> > > > > >
> > > > >
> > > > > Yes, in current PoC code we're using xenstore to pass device
> > > > configuration.
> > > > > We also designed a static device configuration parse method for
> > Dom0less
> > > > or
> > > > > other scenarios don't have xentool. yes, it's from device model
> > command
> > > > line
> > > > > or a config file.
> > > > >
> > > > > > But, if ...
> > > > > >
> > > > > > >
> > > > > > > In my approach (virtio-proxy), all those Xen (or hypervisor)-
> > > > specific
> > > > > > > stuffs are contained in virtio-proxy, yet another VM, to hide
> > all
> > > > details.
> > > > > >
> > > > > > ... the solution how to overcome that is already found and proven
> > to
> > > > work then even better.
> > > > > >
> > > > > >
> > > > > >
> > > > > > > # My point is that a "handle" is not mandatory for executing
> > mapping.
> > > > > > >
> > > > > > > > and the mapping probably done by the
> > > > > > > > toolstack (also see below.) Or we would have to invent a new
> > Xen
> > > > > > > > hypervisor interface and Xen virtual machine privileges to
> > allow
> > > > this
> > > > > > > > kind of mapping.
> > > > > > >
> > > > > > > > If we run the backend in Dom0 that we have no problems of
> > course.
> > > > > > >
> > > > > > > One of difficulties on Xen that I found in my approach is that
> > > > calling
> > > > > > > such hypervisor intefaces (registering IOREQ, mapping memory) is
> > > > only
> > > > > > > allowed on BE servers themselvies and so we will have to extend
> > > > those
> > > > > > > interfaces.
> > > > > > > This, however, will raise some concern on security and privilege
> > > > distribution
> > > > > > > as Stefan suggested.
> > > > > >
> > > > > > We also faced policy related issues with Virtio backend running in
> > > > other than Dom0 domain in a "dummy" xsm mode. In our target system we
> > run
> > > > the backend in a driver
> > > > > > domain (we call it DomD) where the underlying H/W resides. We
> > trust it,
> > > > so we wrote policy rules (to be used in "flask" xsm mode) to provide
> > it
> > > > with a little bit more privileges than a simple DomU had.
> > > > > > Now it is permitted to issue device-model, resource and memory
> > > > mappings, etc calls.
> > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > To activate the mapping will
> > > > > > > > >  require some sort of hypercall to the hypervisor. I can see
> > two
> > > > options
> > > > > > > > >  at this point:
> > > > > > > > >
> > > > > > > > >   - expose the handle to userspace for daemon/helper to
> > trigger
> > > > the
> > > > > > > > >     mapping via existing hypercall interfaces. If using a
> > helper
> > > > you
> > > > > > > > >     would have a hypervisor specific one to avoid the daemon
> > > > having to
> > > > > > > > >     care too much about the details or push that complexity
> > into
> > > > a
> > > > > > > > >     compile time option for the daemon which would result in
> > > > different
> > > > > > > > >     binaries although a common source base.
> > > > > > > > >
> > > > > > > > >   - expose a new kernel ABI to abstract the hypercall
> > > > differences away
> > > > > > > > >     in the guest kernel. In this case the userspace would
> > > > essentially
> > > > > > > > >     ask for an abstract "map guest N memory to userspace
> > ptr"
> > > > and let
> > > > > > > > >     the kernel deal with the different hypercall interfaces.
> > > > This of
> > > > > > > > >     course assumes the majority of BE guests would be Linux
> > > > kernels and
> > > > > > > > >     leaves the bare-metal/unikernel approaches to their own
> > > > devices.
> > > > > > > > >
> > > > > > > > > Operation
> > > > > > > > > =========
> > > > > > > > >
> > > > > > > > > The core of the operation of VirtIO is fairly simple. Once
> > the
> > > > > > > > > vhost-user feature negotiation is done it's a case of
> > receiving
> > > > update
> > > > > > > > > events and parsing the resultant virt queue for data. The
> > vhost-
> > > > user
> > > > > > > > > specification handles a bunch of setup before that point,
> > mostly
> > > > to
> > > > > > > > > detail where the virt queues are set up FD's for memory and
> > > > event
> > > > > > > > > communication. This is where the envisioned stub process
> > would
> > > > be
> > > > > > > > > responsible for getting the daemon up and ready to run. This
> > is
> > > > > > > > > currently done inside a big VMM like QEMU but I suspect a
> > modern
> > > > > > > > > approach would be to use the rust-vmm vhost crate. It would
> > then
> > > > either
> > > > > > > > > communicate with the kernel's abstracted ABI or be re-
> > targeted
> > > > as a
> > > > > > > > > build option for the various hypervisors.
> > > > > > > >
> > > > > > > > One thing I mentioned before to Alex is that Xen doesn't have
> > VMMs
> > > > the
> > > > > > > > way they are typically envisioned and described in other
> > > > environments.
> > > > > > > > Instead, Xen has IOREQ servers. Each of them connects
> > > > independently to
> > > > > > > > Xen via the IOREQ interface. E.g. today multiple QEMUs could
> > be
> > > > used as
> > > > > > > > emulators for a single Xen VM, each of them connecting to Xen
> > > > > > > > independently via the IOREQ interface.
> > > > > > > >
> > > > > > > > The component responsible for starting a daemon and/or setting
> > up
> > > > shared
> > > > > > > > interfaces is the toolstack: the xl command and the
> > libxl/libxc
> > > > > > > > libraries.
> > > > > > >
> > > > > > > I think that VM configuration management (or orchestration in
> > > > Startos
> > > > > > > jargon?) is a subject to debate in parallel.
> > > > > > > Otherwise, is there any good assumption to avoid it right now?
> > > > > > >
> > > > > > > > Oleksandr and others I CCed have been working on ways for the
> > > > toolstack
> > > > > > > > to create virtio backends and setup memory mappings. They
> > might be
> > > > able
> > > > > > > > to provide more info on the subject. I do think we miss a way
> > to
> > > > provide
> > > > > > > > the configuration to the backend and anything else that the
> > > > backend
> > > > > > > > might require to start doing its job.
> > > > > >
> > > > > > Yes, some work has been done for the toolstack to handle Virtio
> > MMIO
> > > > devices in
> > > > > > general and Virtio block devices in particular. However, it has
> > not
> > > > been upstreaned yet.
> > > > > > Updated patches on review now:
> > > > > > https://lore.kernel.org/xen-devel/1621626361-29076-1-git-send-
> > email-
> > > > olekstysh@xxxxxxxxx/
> > > > > >
> > > > > > There is an additional (also important) activity to improve/fix
> > > > foreign memory mapping on Arm which I am also involved in.
> > > > > > The foreign memory mapping is proposed to be used for Virtio
> > backends
> > > > (device emulators) if there is a need to run guest OS completely
> > > > unmodified.
> > > > > > Of course, the more secure way would be to use grant memory
> > mapping.
> > > > Brietly, the main difference between them is that with foreign mapping
> > the
> > > > backend
> > > > > > can map any guest memory it wants to map, but with grant mapping
> > it is
> > > > allowed to map only what was previously granted by the frontend.
> > > > > >
> > > > > > So, there might be a problem if we want to pre-map some guest
> > memory
> > > > in advance or to cache mappings in the backend in order to improve
> > > > performance (because the mapping/unmapping guest pages every request
> > > > requires a lot of back and forth to Xen + P2M updates). In a nutshell,
> > > > currently, in order to map a guest page into the backend address space
> > we
> > > > need to steal a real physical page from the backend domain. So, with
> > the
> > > > said optimizations we might end up with no free memory in the backend
> > > > domain (see XSA-300). And what we try to achieve is to not waste a
> > real
> > > > domain memory at all by providing safe non-allocated-yet (so unused)
> > > > address space for the foreign (and grant) pages to be mapped into,
> > this
> > > > enabling work implies Xen and Linux (and likely DTB bindings) changes.
> > > > However, as it turned out, for this to work in a proper and safe way
> > some
> > > > prereq work needs to be done.
> > > > > > You can find the related Xen discussion at:
> > > > > > https://lore.kernel.org/xen-devel/1627489110-25633-1-git-send-
> > email-
> > > > olekstysh@xxxxxxxxx/
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > One question is how to best handle notification and kicks.
> > The
> > > > existing
> > > > > > > > > vhost-user framework uses eventfd to signal the daemon
> > (although
> > > > QEMU
> > > > > > > > > is quite capable of simulating them when you use TCG). Xen
> > has
> > > > it's own
> > > > > > > > > IOREQ mechanism. However latency is an important factor and
> > > > having
> > > > > > > > > events go through the stub would add quite a lot.
> > > > > > > >
> > > > > > > > Yeah I think, regardless of anything else, we want the
> > backends to
> > > > > > > > connect directly to the Xen hypervisor.
> > > > > > >
> > > > > > > In my approach,
> > > > > > >  a) BE -> FE: interrupts triggered by BE calling a hypervisor
> > > > interface
> > > > > > >               via virtio-proxy
> > > > > > >  b) FE -> BE: MMIO to config raises events (in event channels),
> > > > which is
> > > > > > >               converted to a callback to BE via virtio-proxy
> > > > > > >               (Xen's event channel is internnally implemented by
> > > > interrupts.)
> > > > > > >
> > > > > > > I don't know what "connect directly" means here, but sending
> > > > interrupts
> > > > > > > to the opposite side would be best efficient.
> > > > > > > Ivshmem, I suppose, takes this approach by utilizing PCI's msi-x
> > > > mechanism.
> > > > > >
> > > > > > Agree that MSI would be more efficient than SPI...
> > > > > > At the moment, in order to notify the frontend, the backend issues
> > a
> > > > specific device-model call to query Xen to inject a corresponding SPI
> > to
> > > > the guest.
> > > > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > > Could we consider the kernel internally converting IOREQ
> > > > messages from
> > > > > > > > > the Xen hypervisor to eventfd events? Would this scale with
> > > > other kernel
> > > > > > > > > hypercall interfaces?
> > > > > > > > >
> > > > > > > > > So any thoughts on what directions are worth experimenting
> > with?
> > > > > > > >
> > > > > > > > One option we should consider is for each backend to connect
> > to
> > > > Xen via
> > > > > > > > the IOREQ interface. We could generalize the IOREQ interface
> > and
> > > > make it
> > > > > > > > hypervisor agnostic. The interface is really trivial and easy
> > to
> > > > add.
> > > > > > >
> > > > > > > As I said above, my proposal does the same thing that you
> > mentioned
> > > > here :)
> > > > > > > The difference is that I do call hypervisor interfaces via
> > virtio-
> > > > proxy.
> > > > > > >
> > > > > > > > The only Xen-specific part is the notification mechanism,
> > which is
> > > > an
> > > > > > > > event channel. If we replaced the event channel with something
> > > > else the
> > > > > > > > interface would be generic. See:
> > > > > > > > https://gitlab.com/xen-project/xen/-
> > > > /blob/staging/xen/include/public/hvm/ioreq.h#L52
> > > > > > > >
> > > > > > > > I don't think that translating IOREQs to eventfd in the kernel
> > is
> > > > a
> > > > > > > > good idea: if feels like it would be extra complexity and that
> > the
> > > > > > > > kernel shouldn't be involved as this is a backend-hypervisor
> > > > interface.
> > > > > > >
> > > > > > > Given that we may want to implement BE as a bare-metal
> > application
> > > > > > > as I did on Zephyr, I don't think that the translation would not
> > be
> > > > > > > a big issue, especially on RTOS's.
> > > > > > > It will be some kind of abstraction layer of interrupt handling
> > > > > > > (or nothing but a callback mechanism).
> > > > > > >
> > > > > > > > Also, eventfd is very Linux-centric and we are trying to
> > design an
> > > > > > > > interface that could work well for RTOSes too. If we want to
> > do
> > > > > > > > something different, both OS-agnostic and hypervisor-agnostic,
> > > > perhaps
> > > > > > > > we could design a new interface. One that could be
> > implementable
> > > > in the
> > > > > > > > Xen hypervisor itself (like IOREQ) and of course any other
> > > > hypervisor
> > > > > > > > too.
> > > > > > > >
> > > > > > > >
> > > > > > > > There is also another problem. IOREQ is probably not be the
> > only
> > > > > > > > interface needed. Have a look at
> > > > > > > > https://marc.info/?l=xen-devel&m=162373754705233&w=2. Don't we
> > > > also need
> > > > > > > > an interface for the backend to inject interrupts into the
> > > > frontend? And
> > > > > > > > if the backend requires dynamic memory mappings of frontend
> > pages,
> > > > then
> > > > > > > > we would also need an interface to map/unmap domU pages.
> > > > > > >
> > > > > > > My proposal document might help here; All the interfaces
> > required
> > > > for
> > > > > > > virtio-proxy (or hypervisor-related interfaces) are listed as
> > > > > > > RPC protocols :)
> > > > > > >
> > > > > > > > These interfaces are a lot more problematic than IOREQ: IOREQ
> > is
> > > > tiny
> > > > > > > > and self-contained. It is easy to add anywhere. A new
> > interface to
> > > > > > > > inject interrupts or map pages is more difficult to manage
> > because
> > > > it
> > > > > > > > would require changes scattered across the various emulators.
> > > > > > >
> > > > > > > Exactly. I have no confident yet that my approach will also
> > apply
> > > > > > > to other hypervisors than Xen.
> > > > > > > Technically, yes, but whether people can accept it or not is a
> > > > different
> > > > > > > matter.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > -Takahiro Akashi
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Regards,
> > > > > >
> > > > > > Oleksandr Tyshchenko
> > > > > IMPORTANT NOTICE: The contents of this email and any attachments are
> > > > confidential and may also be privileged. If you are not the intended
> > > > recipient, please notify the sender immediately and do not disclose
> > the
> > > > contents to any other person, use it for any purpose, or store or copy
> > the
> > > > information in any medium. Thank you.
> > > IMPORTANT NOTICE: The contents of this email and any attachments are
> > confidential and may also be privileged. If you are not the intended
> > recipient, please notify the sender immediately and do not disclose the
> > contents to any other person, use it for any purpose, or store or copy the
> > information in any medium. Thank you.
> IMPORTANT NOTICE: The contents of this email and any attachments are 
> confidential and may also be privileged. If you are not the intended 
> recipient, please notify the sender immediately and do not disclose the 
> contents to any other person, use it for any purpose, or store or copy the 
> information in any medium. Thank you.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.