[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Enabling hypervisor agnosticism for VirtIO backends



Wei,

On Thu, Aug 26, 2021 at 12:10:19PM +0000, Wei Chen wrote:
> Hi Akashi,
> 
> > -----Original Message-----
> > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>
> > Sent: 2021年8月26日 17:41
> > To: Wei Chen <Wei.Chen@xxxxxxx>
> > Cc: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>; Stefano Stabellini
> > <sstabellini@xxxxxxxxxx>; Alex Benn??e <alex.bennee@xxxxxxxxxx>; Kaly Xin
> > <Kaly.Xin@xxxxxxx>; Stratos Mailing List <stratos-dev@xxxxxxxxxxxxxxxxxxx>;
> > virtio-dev@xxxxxxxxxxxxxxxxxxxx; Arnd Bergmann <arnd.bergmann@xxxxxxxxxx>;
> > Viresh Kumar <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini
> > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; Jan Kiszka
> > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik <cvanscha@xxxxxxxxxxxxxxxx>;
> > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri <vatsa@xxxxxxxxxxxxxx>; Jean-
> > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu Poirier
> > <mathieu.poirier@xxxxxxxxxx>; Oleksandr Tyshchenko
> > <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis
> > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>; Julien
> > Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>; Paul Durrant
> > <paul@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx>
> > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
> >
> > Hi Wei,
> >
> > On Fri, Aug 20, 2021 at 03:41:50PM +0900, AKASHI Takahiro wrote:
> > > On Wed, Aug 18, 2021 at 08:35:51AM +0000, Wei Chen wrote:
> > > > Hi Akashi,
> > > >
> > > > > -----Original Message-----
> > > > > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>
> > > > > Sent: 2021年8月18日 13:39
> > > > > To: Wei Chen <Wei.Chen@xxxxxxx>
> > > > > Cc: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>; Stefano Stabellini
> > > > > <sstabellini@xxxxxxxxxx>; Alex Benn??e <alex.bennee@xxxxxxxxxx>;
> > Stratos
> > > > > Mailing List <stratos-dev@xxxxxxxxxxxxxxxxxxx>; virtio-
> > dev@lists.oasis-
> > > > > open.org; Arnd Bergmann <arnd.bergmann@xxxxxxxxxx>; Viresh Kumar
> > > > > <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini
> > > > > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; Jan Kiszka
> > > > > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik
> > <cvanscha@xxxxxxxxxxxxxxxx>;
> > > > > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri <vatsa@xxxxxxxxxxxxxx>;
> > Jean-
> > > > > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu Poirier
> > > > > <mathieu.poirier@xxxxxxxxxx>; Oleksandr Tyshchenko
> > > > > <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis
> > > > > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>;
> > Julien
> > > > > Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>; Paul
> > Durrant
> > > > > <paul@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx>
> > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
> > > > >
> > > > > On Tue, Aug 17, 2021 at 08:39:09AM +0000, Wei Chen wrote:
> > > > > > Hi Akashi,
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>
> > > > > > > Sent: 2021年8月17日 16:08
> > > > > > > To: Wei Chen <Wei.Chen@xxxxxxx>
> > > > > > > Cc: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>; Stefano
> > Stabellini
> > > > > > > <sstabellini@xxxxxxxxxx>; Alex Benn??e <alex.bennee@xxxxxxxxxx>;
> > > > > Stratos
> > > > > > > Mailing List <stratos-dev@xxxxxxxxxxxxxxxxxxx>; virtio-
> > > > > dev@lists.oasis-
> > > > > > > open.org; Arnd Bergmann <arnd.bergmann@xxxxxxxxxx>; Viresh Kumar
> > > > > > > <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini
> > > > > > > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; Jan Kiszka
> > > > > > > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik
> > <cvanscha@xxxxxxxxxxxxxxxx>;
> > > > > > > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri <vatsa@xxxxxxxxxxxxxx>;
> > Jean-
> > > > > > > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu Poirier
> > > > > > > <mathieu.poirier@xxxxxxxxxx>; Oleksandr Tyshchenko
> > > > > > > <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis
> > > > > > > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev
> > <Artem_Mygaiev@xxxxxxxx>;
> > > > > Julien
> > > > > > > Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>; Paul
> > Durrant
> > > > > > > <paul@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx>
> > > > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
> > > > > > >
> > > > > > > Hi Wei, Oleksandr,
> > > > > > >
> > > > > > > On Mon, Aug 16, 2021 at 10:04:03AM +0000, Wei Chen wrote:
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > Thanks for Stefano to link my kvmtool for Xen proposal here.
> > > > > > > > This proposal is still discussing in Xen and KVM communities.
> > > > > > > > The main work is to decouple the kvmtool from KVM and make
> > > > > > > > other hypervisors can reuse the virtual device implementations.
> > > > > > > >
> > > > > > > > In this case, we need to introduce an intermediate hypervisor
> > > > > > > > layer for VMM abstraction, Which is, I think it's very close
> > > > > > > > to stratos' virtio hypervisor agnosticism work.
> > > > > > >
> > > > > > > # My proposal[1] comes from my own idea and doesn't always
> > represent
> > > > > > > # Linaro's view on this subject nor reflect Alex's concerns.
> > > > > Nevertheless,
> > > > > > >
> > > > > > > Your idea and my proposal seem to share the same background.
> > > > > > > Both have the similar goal and currently start with, at first,
> > Xen
> > > > > > > and are based on kvm-tool. (Actually, my work is derived from
> > > > > > > EPAM's virtio-disk, which is also based on kvm-tool.)
> > > > > > >
> > > > > > > In particular, the abstraction of hypervisor interfaces has a
> > same
> > > > > > > set of interfaces (for your "struct vmm_impl" and my "RPC
> > interfaces").
> > > > > > > This is not co-incident as we both share the same origin as I
> > said
> > > > > above.
> > > > > > > And so we will also share the same issues. One of them is a way
> > of
> > > > > > > "sharing/mapping FE's memory". There is some trade-off between
> > > > > > > the portability and the performance impact.
> > > > > > > So we can discuss the topic here in this ML, too.
> > > > > > > (See Alex's original email, too).
> > > > > > >
> > > > > > Yes, I agree.
> > > > > >
> > > > > > > On the other hand, my approach aims to create a "single-binary"
> > > > > solution
> > > > > > > in which the same binary of BE vm could run on any hypervisors.
> > > > > > > Somehow similar to your "proposal-#2" in [2], but in my solution,
> > all
> > > > > > > the hypervisor-specific code would be put into another entity
> > (VM),
> > > > > > > named "virtio-proxy" and the abstracted operations are served
> > via RPC.
> > > > > > > (In this sense, BE is hypervisor-agnostic but might have OS
> > > > > dependency.)
> > > > > > > But I know that we need discuss if this is a requirement even
> > > > > > > in Stratos project or not. (Maybe not)
> > > > > > >
> > > > > >
> > > > > > Sorry, I haven't had time to finish reading your virtio-proxy
> > completely
> > > > > > (I will do it ASAP). But from your description, it seems we need a
> > > > > > 3rd VM between FE and BE? My concern is that, if my assumption is
> > right,
> > > > > > will it increase the latency in data transport path? Even if we're
> > > > > > using some lightweight guest like RTOS or Unikernel,
> > > > >
> > > > > Yes, you're right. But I'm afraid that it is a matter of degree.
> > > > > As far as we execute 'mapping' operations at every fetch of payload,
> > > > > we will see latency issue (even in your case) and if we have some
> > solution
> > > > > for it, we won't see it neither in my proposal :)
> > > > >
> > > >
> > > > Oleksandr has sent a proposal to Xen mailing list to reduce this kind
> > > > of "mapping/unmapping" operations. So the latency caused by this
> > behavior
> > > > on Xen may eventually be eliminated, and Linux-KVM doesn't have that
> > problem.
> > >
> > > Obviously, I have not yet caught up there in the discussion.
> > > Which patch specifically?
> >
> > Can you give me the link to the discussion or patch, please?
> >
> 
> It's a RFC discussion. We have tested this RFC patch internally.
> https://lists.xenproject.org/archives/html/xen-devel/2021-07/msg01532.html

I'm afraid that I miss something here, but I don't know
why this proposed API will lead to eliminating 'mmap' in accessing
the queued payload at every request?

-Takahiro Akashi


> > Thanks,
> > -Takahiro Akashi
> >
> > > -Takahiro Akashi
> > >
> > > > > > > Specifically speaking about kvm-tool, I have a concern about its
> > > > > > > license term; Targeting different hypervisors and different OSs
> > > > > > > (which I assume includes RTOS's), the resultant library should
> > be
> > > > > > > license permissive and GPL for kvm-tool might be an issue.
> > > > > > > Any thoughts?
> > > > > > >
> > > > > >
> > > > > > Yes. If user want to implement a FreeBSD device model, but the
> > virtio
> > > > > > library is GPL. Then GPL would be a problem. If we have another
> > good
> > > > > > candidate, I am open to it.
> > > > >
> > > > > I have some candidates, particularly for vq/vring, in my mind:
> > > > > * Open-AMP, or
> > > > > * corresponding Free-BSD code
> > > > >
> > > >
> > > > Interesting, I will look into them : )
> > > >
> > > > Cheers,
> > > > Wei Chen
> > > >
> > > > > -Takahiro Akashi
> > > > >
> > > > >
> > > > > > > -Takahiro Akashi
> > > > > > >
> > > > > > >
> > > > > > > [1] https://op-lists.linaro.org/pipermail/stratos-dev/2021-
> > > > > > > August/000548.html
> > > > > > > [2] https://marc.info/?l=xen-devel&m=162373754705233&w=2
> > > > > > >
> > > > > > > >
> > > > > > > > > From: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>
> > > > > > > > > Sent: 2021年8月14日 23:38
> > > > > > > > > To: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>; Stefano
> > > > > Stabellini
> > > > > > > <sstabellini@xxxxxxxxxx>
> > > > > > > > > Cc: Alex Benn??e <alex.bennee@xxxxxxxxxx>; Stratos Mailing
> > List
> > > > > > > <stratos-dev@xxxxxxxxxxxxxxxxxxx>; virtio-dev@lists.oasis-
> > open.org;
> > > > > Arnd
> > > > > > > Bergmann <arnd.bergmann@xxxxxxxxxx>; Viresh Kumar
> > > > > > > <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini
> > > > > > > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; Jan Kiszka
> > > > > > > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik
> > <cvanscha@xxxxxxxxxxxxxxxx>;
> > > > > > > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri <vatsa@xxxxxxxxxxxxxx>;
> > Jean-
> > > > > > > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu Poirier
> > > > > > > <mathieu.poirier@xxxxxxxxxx>; Wei Chen <Wei.Chen@xxxxxxx>;
> > Oleksandr
> > > > > > > Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis
> > > > > > > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev
> > <Artem_Mygaiev@xxxxxxxx>;
> > > > > Julien
> > > > > > > Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>; Paul
> > Durrant
> > > > > > > <paul@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx>
> > > > > > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO
> > backends
> > > > > > > > >
> > > > > > > > > Hello, all.
> > > > > > > > >
> > > > > > > > > Please see some comments below. And sorry for the possible
> > format
> > > > > > > issues.
> > > > > > > > >
> > > > > > > > > > On Wed, Aug 11, 2021 at 9:27 AM AKASHI Takahiro
> > > > > > > <mailto:takahiro.akashi@xxxxxxxxxx> wrote:
> > > > > > > > > > On Wed, Aug 04, 2021 at 12:20:01PM -0700, Stefano
> > Stabellini
> > > > > wrote:
> > > > > > > > > > > CCing people working on Xen+VirtIO and IOREQs. Not
> > trimming
> > > > > the
> > > > > > > original
> > > > > > > > > > > email to let them read the full context.
> > > > > > > > > > >
> > > > > > > > > > > My comments below are related to a potential Xen
> > > > > implementation,
> > > > > > > not
> > > > > > > > > > > because it is the only implementation that matters, but
> > > > > because it
> > > > > > > is
> > > > > > > > > > > the one I know best.
> > > > > > > > > >
> > > > > > > > > > Please note that my proposal (and hence the working
> > prototype)[1]
> > > > > > > > > > is based on Xen's virtio implementation (i.e. IOREQ) and
> > > > > > > particularly
> > > > > > > > > > EPAM's virtio-disk application (backend server).
> > > > > > > > > > It has been, I believe, well generalized but is still a
> > bit
> > > > > biased
> > > > > > > > > > toward this original design.
> > > > > > > > > >
> > > > > > > > > > So I hope you like my approach :)
> > > > > > > > > >
> > > > > > > > > > [1] https://op-lists.linaro.org/pipermail/stratos-
> > dev/2021-
> > > > > > > August/000546.html
> > > > > > > > > >
> > > > > > > > > > Let me take this opportunity to explain a bit more about
> > my
> > > > > approach
> > > > > > > below.
> > > > > > > > > >
> > > > > > > > > > > Also, please see this relevant email thread:
> > > > > > > > > > > https://marc.info/?l=xen-devel&m=162373754705233&w=2
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Wed, 4 Aug 2021, Alex Bennée wrote:
> > > > > > > > > > > > Hi,
> > > > > > > > > > > >
> > > > > > > > > > > > One of the goals of Project Stratos is to enable
> > hypervisor
> > > > > > > agnostic
> > > > > > > > > > > > backends so we can enable as much re-use of code as
> > possible
> > > > > and
> > > > > > > avoid
> > > > > > > > > > > > repeating ourselves. This is the flip side of the
> > front end
> > > > > > > where
> > > > > > > > > > > > multiple front-end implementations are required - one
> > per OS,
> > > > > > > assuming
> > > > > > > > > > > > you don't just want Linux guests. The resultant guests
> > are
> > > > > > > trivially
> > > > > > > > > > > > movable between hypervisors modulo any abstracted
> > paravirt
> > > > > type
> > > > > > > > > > > > interfaces.
> > > > > > > > > > > >
> > > > > > > > > > > > In my original thumb nail sketch of a solution I
> > envisioned
> > > > > > > vhost-user
> > > > > > > > > > > > daemons running in a broadly POSIX like environment.
> > The
> > > > > > > interface to
> > > > > > > > > > > > the daemon is fairly simple requiring only some mapped
> > > > > memory
> > > > > > > and some
> > > > > > > > > > > > sort of signalling for events (on Linux this is
> > eventfd).
> > > > > The
> > > > > > > idea was a
> > > > > > > > > > > > stub binary would be responsible for any hypervisor
> > specific
> > > > > > > setup and
> > > > > > > > > > > > then launch a common binary to deal with the actual
> > > > > virtqueue
> > > > > > > requests
> > > > > > > > > > > > themselves.
> > > > > > > > > > > >
> > > > > > > > > > > > Since that original sketch we've seen an expansion in
> > the
> > > > > sort
> > > > > > > of ways
> > > > > > > > > > > > backends could be created. There is interest in
> > > > > encapsulating
> > > > > > > backends
> > > > > > > > > > > > in RTOSes or unikernels for solutions like SCMI. There
> > > > > interest
> > > > > > > in Rust
> > > > > > > > > > > > has prompted ideas of using the trait interface to
> > abstract
> > > > > > > differences
> > > > > > > > > > > > away as well as the idea of bare-metal Rust backends.
> > > > > > > > > > > >
> > > > > > > > > > > > We have a card (STR-12) called "Hypercall
> > Standardisation"
> > > > > which
> > > > > > > > > > > > calls for a description of the APIs needed from the
> > > > > hypervisor
> > > > > > > side to
> > > > > > > > > > > > support VirtIO guests and their backends. However we
> > are
> > > > > some
> > > > > > > way off
> > > > > > > > > > > > from that at the moment as I think we need to at least
> > > > > > > demonstrate one
> > > > > > > > > > > > portable backend before we start codifying
> > requirements. To
> > > > > that
> > > > > > > end I
> > > > > > > > > > > > want to think about what we need for a backend to
> > function.
> > > > > > > > > > > >
> > > > > > > > > > > > Configuration
> > > > > > > > > > > > =============
> > > > > > > > > > > >
> > > > > > > > > > > > In the type-2 setup this is typically fairly simple
> > because
> > > > > the
> > > > > > > host
> > > > > > > > > > > > system can orchestrate the various modules that make
> > up the
> > > > > > > complete
> > > > > > > > > > > > system. In the type-1 case (or even type-2 with
> > delegated
> > > > > > > service VMs)
> > > > > > > > > > > > we need some sort of mechanism to inform the backend
> > VM
> > > > > about
> > > > > > > key
> > > > > > > > > > > > details about the system:
> > > > > > > > > > > >
> > > > > > > > > > > >   - where virt queue memory is in it's address space
> > > > > > > > > > > >   - how it's going to receive (interrupt) and trigger
> > (kick)
> > > > > > > events
> > > > > > > > > > > >   - what (if any) resources the backend needs to
> > connect to
> > > > > > > > > > > >
> > > > > > > > > > > > Obviously you can elide over configuration issues by
> > having
> > > > > > > static
> > > > > > > > > > > > configurations and baking the assumptions into your
> > guest
> > > > > images
> > > > > > > however
> > > > > > > > > > > > this isn't scalable in the long term. The obvious
> > solution
> > > > > seems
> > > > > > > to be
> > > > > > > > > > > > extending a subset of Device Tree data to user space
> > but
> > > > > perhaps
> > > > > > > there
> > > > > > > > > > > > are other approaches?
> > > > > > > > > > > >
> > > > > > > > > > > > Before any virtio transactions can take place the
> > > > > appropriate
> > > > > > > memory
> > > > > > > > > > > > mappings need to be made between the FE guest and the
> > BE
> > > > > guest.
> > > > > > > > > > >
> > > > > > > > > > > > Currently the whole of the FE guests address space
> > needs to
> > > > > be
> > > > > > > visible
> > > > > > > > > > > > to whatever is serving the virtio requests. I can
> > envision 3
> > > > > > > approaches:
> > > > > > > > > > > >
> > > > > > > > > > > >  * BE guest boots with memory already mapped
> > > > > > > > > > > >
> > > > > > > > > > > >  This would entail the guest OS knowing where in it's
> > Guest
> > > > > > > Physical
> > > > > > > > > > > >  Address space is already taken up and avoiding
> > clashing. I
> > > > > > > would assume
> > > > > > > > > > > >  in this case you would want a standard interface to
> > > > > userspace
> > > > > > > to then
> > > > > > > > > > > >  make that address space visible to the backend daemon.
> > > > > > > > > >
> > > > > > > > > > Yet another way here is that we would have well known
> > "shared
> > > > > > > memory" between
> > > > > > > > > > VMs. I think that Jailhouse's ivshmem gives us good
> > insights on
> > > > > this
> > > > > > > matter
> > > > > > > > > > and that it can even be an alternative for hypervisor-
> > agnostic
> > > > > > > solution.
> > > > > > > > > >
> > > > > > > > > > (Please note memory regions in ivshmem appear as a PCI
> > device
> > > > > and
> > > > > > > can be
> > > > > > > > > > mapped locally.)
> > > > > > > > > >
> > > > > > > > > > I want to add this shared memory aspect to my virtio-proxy,
> > but
> > > > > > > > > > the resultant solution would eventually look similar to
> > ivshmem.
> > > > > > > > > >
> > > > > > > > > > > >  * BE guests boots with a hypervisor handle to memory
> > > > > > > > > > > >
> > > > > > > > > > > >  The BE guest is then free to map the FE's memory to
> > where
> > > > > it
> > > > > > > wants in
> > > > > > > > > > > >  the BE's guest physical address space.
> > > > > > > > > > >
> > > > > > > > > > > I cannot see how this could work for Xen. There is no
> > "handle"
> > > > > to
> > > > > > > give
> > > > > > > > > > > to the backend if the backend is not running in dom0. So
> > for
> > > > > Xen I
> > > > > > > think
> > > > > > > > > > > the memory has to be already mapped
> > > > > > > > > >
> > > > > > > > > > In Xen's IOREQ solution (virtio-blk), the following
> > information
> > > > > is
> > > > > > > expected
> > > > > > > > > > to be exposed to BE via Xenstore:
> > > > > > > > > > (I know that this is a tentative approach though.)
> > > > > > > > > >    - the start address of configuration space
> > > > > > > > > >    - interrupt number
> > > > > > > > > >    - file path for backing storage
> > > > > > > > > >    - read-only flag
> > > > > > > > > > And the BE server have to call a particular hypervisor
> > interface
> > > > > to
> > > > > > > > > > map the configuration space.
> > > > > > > > >
> > > > > > > > > Yes, Xenstore was chosen as a simple way to pass
> > configuration
> > > > > info to
> > > > > > > the backend running in a non-toolstack domain.
> > > > > > > > > I remember, there was a wish to avoid using Xenstore in
> > Virtio
> > > > > backend
> > > > > > > itself if possible, so for non-toolstack domain, this could done
> > with
> > > > > > > adjusting devd (daemon that listens for devices and launches
> > backends)
> > > > > > > > > to read backend configuration from the Xenstore anyway and
> > pass it
> > > > > to
> > > > > > > the backend via command line arguments.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Yes, in current PoC code we're using xenstore to pass device
> > > > > > > configuration.
> > > > > > > > We also designed a static device configuration parse method
> > for
> > > > > Dom0less
> > > > > > > or
> > > > > > > > other scenarios don't have xentool. yes, it's from device
> > model
> > > > > command
> > > > > > > line
> > > > > > > > or a config file.
> > > > > > > >
> > > > > > > > > But, if ...
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > In my approach (virtio-proxy), all those Xen (or
> > hypervisor)-
> > > > > > > specific
> > > > > > > > > > stuffs are contained in virtio-proxy, yet another VM, to
> > hide
> > > > > all
> > > > > > > details.
> > > > > > > > >
> > > > > > > > > ... the solution how to overcome that is already found and
> > proven
> > > > > to
> > > > > > > work then even better.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > # My point is that a "handle" is not mandatory for
> > executing
> > > > > mapping.
> > > > > > > > > >
> > > > > > > > > > > and the mapping probably done by the
> > > > > > > > > > > toolstack (also see below.) Or we would have to invent a
> > new
> > > > > Xen
> > > > > > > > > > > hypervisor interface and Xen virtual machine privileges
> > to
> > > > > allow
> > > > > > > this
> > > > > > > > > > > kind of mapping.
> > > > > > > > > >
> > > > > > > > > > > If we run the backend in Dom0 that we have no problems
> > of
> > > > > course.
> > > > > > > > > >
> > > > > > > > > > One of difficulties on Xen that I found in my approach is
> > that
> > > > > > > calling
> > > > > > > > > > such hypervisor intefaces (registering IOREQ, mapping
> > memory) is
> > > > > > > only
> > > > > > > > > > allowed on BE servers themselvies and so we will have to
> > extend
> > > > > > > those
> > > > > > > > > > interfaces.
> > > > > > > > > > This, however, will raise some concern on security and
> > privilege
> > > > > > > distribution
> > > > > > > > > > as Stefan suggested.
> > > > > > > > >
> > > > > > > > > We also faced policy related issues with Virtio backend
> > running in
> > > > > > > other than Dom0 domain in a "dummy" xsm mode. In our target
> > system we
> > > > > run
> > > > > > > the backend in a driver
> > > > > > > > > domain (we call it DomD) where the underlying H/W resides.
> > We
> > > > > trust it,
> > > > > > > so we wrote policy rules (to be used in "flask" xsm mode) to
> > provide
> > > > > it
> > > > > > > with a little bit more privileges than a simple DomU had.
> > > > > > > > > Now it is permitted to issue device-model, resource and
> > memory
> > > > > > > mappings, etc calls.
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > To activate the mapping will
> > > > > > > > > > > >  require some sort of hypercall to the hypervisor. I
> > can see
> > > > > two
> > > > > > > options
> > > > > > > > > > > >  at this point:
> > > > > > > > > > > >
> > > > > > > > > > > >   - expose the handle to userspace for daemon/helper
> > to
> > > > > trigger
> > > > > > > the
> > > > > > > > > > > >     mapping via existing hypercall interfaces. If
> > using a
> > > > > helper
> > > > > > > you
> > > > > > > > > > > >     would have a hypervisor specific one to avoid the
> > daemon
> > > > > > > having to
> > > > > > > > > > > >     care too much about the details or push that
> > complexity
> > > > > into
> > > > > > > a
> > > > > > > > > > > >     compile time option for the daemon which would
> > result in
> > > > > > > different
> > > > > > > > > > > >     binaries although a common source base.
> > > > > > > > > > > >
> > > > > > > > > > > >   - expose a new kernel ABI to abstract the hypercall
> > > > > > > differences away
> > > > > > > > > > > >     in the guest kernel. In this case the userspace
> > would
> > > > > > > essentially
> > > > > > > > > > > >     ask for an abstract "map guest N memory to
> > userspace
> > > > > ptr"
> > > > > > > and let
> > > > > > > > > > > >     the kernel deal with the different hypercall
> > interfaces.
> > > > > > > This of
> > > > > > > > > > > >     course assumes the majority of BE guests would be
> > Linux
> > > > > > > kernels and
> > > > > > > > > > > >     leaves the bare-metal/unikernel approaches to
> > their own
> > > > > > > devices.
> > > > > > > > > > > >
> > > > > > > > > > > > Operation
> > > > > > > > > > > > =========
> > > > > > > > > > > >
> > > > > > > > > > > > The core of the operation of VirtIO is fairly simple.
> > Once
> > > > > the
> > > > > > > > > > > > vhost-user feature negotiation is done it's a case of
> > > > > receiving
> > > > > > > update
> > > > > > > > > > > > events and parsing the resultant virt queue for data.
> > The
> > > > > vhost-
> > > > > > > user
> > > > > > > > > > > > specification handles a bunch of setup before that
> > point,
> > > > > mostly
> > > > > > > to
> > > > > > > > > > > > detail where the virt queues are set up FD's for
> > memory and
> > > > > > > event
> > > > > > > > > > > > communication. This is where the envisioned stub
> > process
> > > > > would
> > > > > > > be
> > > > > > > > > > > > responsible for getting the daemon up and ready to run.
> > This
> > > > > is
> > > > > > > > > > > > currently done inside a big VMM like QEMU but I
> > suspect a
> > > > > modern
> > > > > > > > > > > > approach would be to use the rust-vmm vhost crate. It
> > would
> > > > > then
> > > > > > > either
> > > > > > > > > > > > communicate with the kernel's abstracted ABI or be re-
> > > > > targeted
> > > > > > > as a
> > > > > > > > > > > > build option for the various hypervisors.
> > > > > > > > > > >
> > > > > > > > > > > One thing I mentioned before to Alex is that Xen doesn't
> > have
> > > > > VMMs
> > > > > > > the
> > > > > > > > > > > way they are typically envisioned and described in other
> > > > > > > environments.
> > > > > > > > > > > Instead, Xen has IOREQ servers. Each of them connects
> > > > > > > independently to
> > > > > > > > > > > Xen via the IOREQ interface. E.g. today multiple QEMUs
> > could
> > > > > be
> > > > > > > used as
> > > > > > > > > > > emulators for a single Xen VM, each of them connecting
> > to Xen
> > > > > > > > > > > independently via the IOREQ interface.
> > > > > > > > > > >
> > > > > > > > > > > The component responsible for starting a daemon and/or
> > setting
> > > > > up
> > > > > > > shared
> > > > > > > > > > > interfaces is the toolstack: the xl command and the
> > > > > libxl/libxc
> > > > > > > > > > > libraries.
> > > > > > > > > >
> > > > > > > > > > I think that VM configuration management (or orchestration
> > in
> > > > > > > Startos
> > > > > > > > > > jargon?) is a subject to debate in parallel.
> > > > > > > > > > Otherwise, is there any good assumption to avoid it right
> > now?
> > > > > > > > > >
> > > > > > > > > > > Oleksandr and others I CCed have been working on ways
> > for the
> > > > > > > toolstack
> > > > > > > > > > > to create virtio backends and setup memory mappings.
> > They
> > > > > might be
> > > > > > > able
> > > > > > > > > > > to provide more info on the subject. I do think we miss
> > a way
> > > > > to
> > > > > > > provide
> > > > > > > > > > > the configuration to the backend and anything else that
> > the
> > > > > > > backend
> > > > > > > > > > > might require to start doing its job.
> > > > > > > > >
> > > > > > > > > Yes, some work has been done for the toolstack to handle
> > Virtio
> > > > > MMIO
> > > > > > > devices in
> > > > > > > > > general and Virtio block devices in particular. However, it
> > has
> > > > > not
> > > > > > > been upstreaned yet.
> > > > > > > > > Updated patches on review now:
> > > > > > > > > https://lore.kernel.org/xen-devel/1621626361-29076-1-git-
> > send-
> > > > > email-
> > > > > > > olekstysh@xxxxxxxxx/
> > > > > > > > >
> > > > > > > > > There is an additional (also important) activity to
> > improve/fix
> > > > > > > foreign memory mapping on Arm which I am also involved in.
> > > > > > > > > The foreign memory mapping is proposed to be used for Virtio
> > > > > backends
> > > > > > > (device emulators) if there is a need to run guest OS completely
> > > > > > > unmodified.
> > > > > > > > > Of course, the more secure way would be to use grant memory
> > > > > mapping.
> > > > > > > Brietly, the main difference between them is that with foreign
> > mapping
> > > > > the
> > > > > > > backend
> > > > > > > > > can map any guest memory it wants to map, but with grant
> > mapping
> > > > > it is
> > > > > > > allowed to map only what was previously granted by the frontend.
> > > > > > > > >
> > > > > > > > > So, there might be a problem if we want to pre-map some
> > guest
> > > > > memory
> > > > > > > in advance or to cache mappings in the backend in order to
> > improve
> > > > > > > performance (because the mapping/unmapping guest pages every
> > request
> > > > > > > requires a lot of back and forth to Xen + P2M updates). In a
> > nutshell,
> > > > > > > currently, in order to map a guest page into the backend address
> > space
> > > > > we
> > > > > > > need to steal a real physical page from the backend domain. So,
> > with
> > > > > the
> > > > > > > said optimizations we might end up with no free memory in the
> > backend
> > > > > > > domain (see XSA-300). And what we try to achieve is to not waste
> > a
> > > > > real
> > > > > > > domain memory at all by providing safe non-allocated-yet (so
> > unused)
> > > > > > > address space for the foreign (and grant) pages to be mapped
> > into,
> > > > > this
> > > > > > > enabling work implies Xen and Linux (and likely DTB bindings)
> > changes.
> > > > > > > However, as it turned out, for this to work in a proper and safe
> > way
> > > > > some
> > > > > > > prereq work needs to be done.
> > > > > > > > > You can find the related Xen discussion at:
> > > > > > > > > https://lore.kernel.org/xen-devel/1627489110-25633-1-git-
> > send-
> > > > > email-
> > > > > > > olekstysh@xxxxxxxxx/
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > One question is how to best handle notification and
> > kicks.
> > > > > The
> > > > > > > existing
> > > > > > > > > > > > vhost-user framework uses eventfd to signal the daemon
> > > > > (although
> > > > > > > QEMU
> > > > > > > > > > > > is quite capable of simulating them when you use TCG).
> > Xen
> > > > > has
> > > > > > > it's own
> > > > > > > > > > > > IOREQ mechanism. However latency is an important
> > factor and
> > > > > > > having
> > > > > > > > > > > > events go through the stub would add quite a lot.
> > > > > > > > > > >
> > > > > > > > > > > Yeah I think, regardless of anything else, we want the
> > > > > backends to
> > > > > > > > > > > connect directly to the Xen hypervisor.
> > > > > > > > > >
> > > > > > > > > > In my approach,
> > > > > > > > > >  a) BE -> FE: interrupts triggered by BE calling a
> > hypervisor
> > > > > > > interface
> > > > > > > > > >               via virtio-proxy
> > > > > > > > > >  b) FE -> BE: MMIO to config raises events (in event
> > channels),
> > > > > > > which is
> > > > > > > > > >               converted to a callback to BE via virtio-
> > proxy
> > > > > > > > > >               (Xen's event channel is internnally
> > implemented by
> > > > > > > interrupts.)
> > > > > > > > > >
> > > > > > > > > > I don't know what "connect directly" means here, but
> > sending
> > > > > > > interrupts
> > > > > > > > > > to the opposite side would be best efficient.
> > > > > > > > > > Ivshmem, I suppose, takes this approach by utilizing PCI's
> > msi-x
> > > > > > > mechanism.
> > > > > > > > >
> > > > > > > > > Agree that MSI would be more efficient than SPI...
> > > > > > > > > At the moment, in order to notify the frontend, the backend
> > issues
> > > > > a
> > > > > > > specific device-model call to query Xen to inject a
> > corresponding SPI
> > > > > to
> > > > > > > the guest.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > Could we consider the kernel internally converting
> > IOREQ
> > > > > > > messages from
> > > > > > > > > > > > the Xen hypervisor to eventfd events? Would this scale
> > with
> > > > > > > other kernel
> > > > > > > > > > > > hypercall interfaces?
> > > > > > > > > > > >
> > > > > > > > > > > > So any thoughts on what directions are worth
> > experimenting
> > > > > with?
> > > > > > > > > > >
> > > > > > > > > > > One option we should consider is for each backend to
> > connect
> > > > > to
> > > > > > > Xen via
> > > > > > > > > > > the IOREQ interface. We could generalize the IOREQ
> > interface
> > > > > and
> > > > > > > make it
> > > > > > > > > > > hypervisor agnostic. The interface is really trivial and
> > easy
> > > > > to
> > > > > > > add.
> > > > > > > > > >
> > > > > > > > > > As I said above, my proposal does the same thing that you
> > > > > mentioned
> > > > > > > here :)
> > > > > > > > > > The difference is that I do call hypervisor interfaces via
> > > > > virtio-
> > > > > > > proxy.
> > > > > > > > > >
> > > > > > > > > > > The only Xen-specific part is the notification mechanism,
> > > > > which is
> > > > > > > an
> > > > > > > > > > > event channel. If we replaced the event channel with
> > something
> > > > > > > else the
> > > > > > > > > > > interface would be generic. See:
> > > > > > > > > > > https://gitlab.com/xen-project/xen/-
> > > > > > > /blob/staging/xen/include/public/hvm/ioreq.h#L52
> > > > > > > > > > >
> > > > > > > > > > > I don't think that translating IOREQs to eventfd in the
> > kernel
> > > > > is
> > > > > > > a
> > > > > > > > > > > good idea: if feels like it would be extra complexity
> > and that
> > > > > the
> > > > > > > > > > > kernel shouldn't be involved as this is a backend-
> > hypervisor
> > > > > > > interface.
> > > > > > > > > >
> > > > > > > > > > Given that we may want to implement BE as a bare-metal
> > > > > application
> > > > > > > > > > as I did on Zephyr, I don't think that the translation
> > would not
> > > > > be
> > > > > > > > > > a big issue, especially on RTOS's.
> > > > > > > > > > It will be some kind of abstraction layer of interrupt
> > handling
> > > > > > > > > > (or nothing but a callback mechanism).
> > > > > > > > > >
> > > > > > > > > > > Also, eventfd is very Linux-centric and we are trying to
> > > > > design an
> > > > > > > > > > > interface that could work well for RTOSes too. If we
> > want to
> > > > > do
> > > > > > > > > > > something different, both OS-agnostic and hypervisor-
> > agnostic,
> > > > > > > perhaps
> > > > > > > > > > > we could design a new interface. One that could be
> > > > > implementable
> > > > > > > in the
> > > > > > > > > > > Xen hypervisor itself (like IOREQ) and of course any
> > other
> > > > > > > hypervisor
> > > > > > > > > > > too.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > There is also another problem. IOREQ is probably not be
> > the
> > > > > only
> > > > > > > > > > > interface needed. Have a look at
> > > > > > > > > > > https://marc.info/?l=xen-devel&m=162373754705233&w=2.
> > Don't we
> > > > > > > also need
> > > > > > > > > > > an interface for the backend to inject interrupts into
> > the
> > > > > > > frontend? And
> > > > > > > > > > > if the backend requires dynamic memory mappings of
> > frontend
> > > > > pages,
> > > > > > > then
> > > > > > > > > > > we would also need an interface to map/unmap domU pages.
> > > > > > > > > >
> > > > > > > > > > My proposal document might help here; All the interfaces
> > > > > required
> > > > > > > for
> > > > > > > > > > virtio-proxy (or hypervisor-related interfaces) are listed
> > as
> > > > > > > > > > RPC protocols :)
> > > > > > > > > >
> > > > > > > > > > > These interfaces are a lot more problematic than IOREQ:
> > IOREQ
> > > > > is
> > > > > > > tiny
> > > > > > > > > > > and self-contained. It is easy to add anywhere. A new
> > > > > interface to
> > > > > > > > > > > inject interrupts or map pages is more difficult to
> > manage
> > > > > because
> > > > > > > it
> > > > > > > > > > > would require changes scattered across the various
> > emulators.
> > > > > > > > > >
> > > > > > > > > > Exactly. I have no confident yet that my approach will
> > also
> > > > > apply
> > > > > > > > > > to other hypervisors than Xen.
> > > > > > > > > > Technically, yes, but whether people can accept it or not
> > is a
> > > > > > > different
> > > > > > > > > > matter.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > -Takahiro Akashi
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Regards,
> > > > > > > > >
> > > > > > > > > Oleksandr Tyshchenko
> > > > > > > > IMPORTANT NOTICE: The contents of this email and any
> > attachments are
> > > > > > > confidential and may also be privileged. If you are not the
> > intended
> > > > > > > recipient, please notify the sender immediately and do not
> > disclose
> > > > > the
> > > > > > > contents to any other person, use it for any purpose, or store
> > or copy
> > > > > the
> > > > > > > information in any medium. Thank you.
> > > > > > IMPORTANT NOTICE: The contents of this email and any attachments
> > are
> > > > > confidential and may also be privileged. If you are not the intended
> > > > > recipient, please notify the sender immediately and do not disclose
> > the
> > > > > contents to any other person, use it for any purpose, or store or
> > copy the
> > > > > information in any medium. Thank you.
> > > > IMPORTANT NOTICE: The contents of this email and any attachments are
> > confidential and may also be privileged. If you are not the intended
> > recipient, please notify the sender immediately and do not disclose the
> > contents to any other person, use it for any purpose, or store or copy the
> > information in any medium. Thank you.
> IMPORTANT NOTICE: The contents of this email and any attachments are 
> confidential and may also be privileged. If you are not the intended 
> recipient, please notify the sender immediately and do not disclose the 
> contents to any other person, use it for any purpose, or store or copy the 
> information in any medium. Thank you.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.