
RE: Enabling hypervisor agnosticism for VirtIO backends



Hi Akashi, Oleksandr,

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@xxxxxxxxxxxxxxxxxxxx> On Behalf Of Wei
> Chen
> Sent: September 2, 2021 9:31
> To: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>
> Cc: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>; Stefano Stabellini
> <sstabellini@xxxxxxxxxx>; Alex Bennée <alex.bennee@xxxxxxxxxx>; Kaly Xin
> <Kaly.Xin@xxxxxxx>; Stratos Mailing List <stratos-dev@xxxxxxxxxxxxxxxxxxx>;
> virtio-dev@xxxxxxxxxxxxxxxxxxxx; Arnd Bergmann <arnd.bergmann@xxxxxxxxxx>;
> Viresh Kumar <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini
> <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; Jan Kiszka
> <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik <cvanscha@xxxxxxxxxxxxxxxx>;
> pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri <vatsa@xxxxxxxxxxxxxx>; Jean-
> Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu Poirier
> <mathieu.poirier@xxxxxxxxxx>; Oleksandr Tyshchenko
> <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis
> <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>; Julien
> Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>; Paul Durrant
> <paul@xxxxxxx>; nd <nd@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx>
> Subject: RE: Enabling hypervisor agnosticism for VirtIO backends
> 
> Hi Akashi,
> 
> > -----Original Message-----
> > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>
> > Sent: September 1, 2021 20:29
> > To: Wei Chen <Wei.Chen@xxxxxxx>
> > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
> >
> > Hi Wei,
> >
> > On Wed, Sep 01, 2021 at 11:12:58AM +0000, Wei Chen wrote:
> > > Hi Akashi,
> > >
> > > > -----Original Message-----
> > > > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>
> > > > Sent: August 31, 2021 14:18
> > > > To: Wei Chen <Wei.Chen@xxxxxxx>
> > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
> > > >
> > > > Wei,
> > > >
> > > > On Thu, Aug 26, 2021 at 12:10:19PM +0000, Wei Chen wrote:
> > > > > Hi Akashi,
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>
> > > > > > Sent: August 26, 2021 17:41
> > > > > > To: Wei Chen <Wei.Chen@xxxxxxx>
> > > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
> > > > > >
> > > > > > Hi Wei,
> > > > > >
> > > > > > On Fri, Aug 20, 2021 at 03:41:50PM +0900, AKASHI Takahiro wrote:
> > > > > > > On Wed, Aug 18, 2021 at 08:35:51AM +0000, Wei Chen wrote:
> > > > > > > > Hi Akashi,
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>
> > > > > > > > > Sent: August 18, 2021 13:39
> > > > > > > > > To: Wei Chen <Wei.Chen@xxxxxxx>
> > > > > > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
> > > > > > > > >
> > > > > > > > > On Tue, Aug 17, 2021 at 08:39:09AM +0000, Wei Chen wrote:
> > > > > > > > > > Hi Akashi,
> > > > > > > > > >
> > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>
> > > > > > > > > > > Sent: August 17, 2021 16:08
> > > > > > > > > > > To: Wei Chen <Wei.Chen@xxxxxxx>
> > > > > > > > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
> > > > > > > > > > >
> > > > > > > > > > > Hi Wei, Oleksandr,
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Aug 16, 2021 at 10:04:03AM +0000, Wei Chen wrote:
> > > > > > > > > > > > Hi All,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks to Stefano for linking my kvmtool-for-Xen proposal here.
> > > > > > > > > > > > This proposal is still being discussed in the Xen and KVM communities.
> > > > > > > > > > > > The main work is to decouple kvmtool from KVM so that
> > > > > > > > > > > > other hypervisors can reuse the virtual device implementations.
> > > > > > > > > > > >
> > > > > > > > > > > > In this case, we need to introduce an intermediate hypervisor
> > > > > > > > > > > > layer for VMM abstraction, which I think is very close
> > > > > > > > > > > > to Stratos' virtio hypervisor agnosticism work.
> > > > > > > > > > >
> > > > > > > > > > > # My proposal[1] comes from my own idea and doesn't always represent
> > > > > > > > > > > # Linaro's view on this subject nor reflect Alex's concerns. Nevertheless,
> > > > > > > > > > >
> > > > > > > > > > > Your idea and my proposal seem to share the same background.
> > > > > > > > > > > Both have a similar goal, both currently start with Xen,
> > > > > > > > > > > and both are based on kvm-tool. (Actually, my work is derived from
> > > > > > > > > > > EPAM's virtio-disk, which is also based on kvm-tool.)
> > > > > > > > > > >
> > > > > > > > > > > In particular, the abstraction of hypervisor interfaces ends up with the
> > > > > > > > > > > same set of interfaces (your "struct vmm_impl" and my "RPC interfaces").
> > > > > > > > > > > This is not a coincidence, as we both share the same origin, as I said above.
> > > > > > > > > > > And so we will also share the same issues. One of them is a way of
> > > > > > > > > > > "sharing/mapping FE's memory". There is some trade-off between
> > > > > > > > > > > portability and the performance impact.
> > > > > > > > > > > So we can discuss the topic here in this ML, too.
> > > > > > > > > > > (See Alex's original email, too.)
> > > > > > > > > > >
> > > > > > > > > > Yes, I agree.
> > > > > > > > > >
> > > > > > > > > > > On the other hand, my approach aims to create a "single-binary" solution
> > > > > > > > > > > in which the same binary of the BE VM could run on any hypervisor.
> > > > > > > > > > > It is somewhat similar to your "proposal-#2" in [2], but in my solution all
> > > > > > > > > > > the hypervisor-specific code would be put into another entity (VM),
> > > > > > > > > > > named "virtio-proxy", and the abstracted operations are served via RPC.
> > > > > > > > > > > (In this sense, BE is hypervisor-agnostic but might have an OS dependency.)
> > > > > > > > > > > But I know that we need to discuss whether this is a requirement
> > > > > > > > > > > in the Stratos project at all. (Maybe not.)
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Sorry, I haven't had time to finish reading your virtio-proxy proposal
> > > > > > > > > > completely (I will do it ASAP). But from your description, it seems we
> > > > > > > > > > need a 3rd VM between FE and BE? My concern is that, if my assumption is
> > > > > > > > > > right, it will increase the latency in the data transport path, even if
> > > > > > > > > > we're using some lightweight guest like an RTOS or a unikernel.
> > > > > > > > >
> > > > > > > > > Yes, you're right. But I'm afraid that it is a matter of degree.
> > > > > > > > > As long as we execute 'mapping' operations at every fetch of payload,
> > > > > > > > > we will see a latency issue (even in your case), and if we have some
> > > > > > > > > solution for it, we won't see it in my proposal either :)
> > > > > > > > >
> > > > > > > >
> > > > > > > > Oleksandr has sent a proposal to the Xen mailing list to reduce this kind
> > > > > > > > of "mapping/unmapping" operation. So the latency caused by this behavior
> > > > > > > > on Xen may eventually be eliminated, and Linux-KVM doesn't have that problem.
> > > > > > >
> > > > > > > Obviously, I have not yet caught up there in the discussion.
> > > > > > > Which patch specifically?
> > > > > >
> > > > > > Can you give me the link to the discussion or patch, please?
> > > > > >
> > > > >
> > > > > It's an RFC discussion. We have tested this RFC patch internally.
> > > > > https://lists.xenproject.org/archives/html/xen-devel/2021-07/msg01532.html
> > > >
> > > > I'm afraid I'm missing something here, but I don't see
> > > > why this proposed API would eliminate 'mmap' when accessing
> > > > the queued payload at every request.
> > > >
> > >
> > > This API gives the Xen device model (QEMU or kvmtool) the ability to map
> > > the whole guest RAM into the device model's address space. In this case,
> > > the device model doesn't need dynamic hypercalls to map/unmap payload memory.
> > > It can use a flat offset to access payload memory in its address
> > > space directly, just like a KVM device model does now.
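
To make the flat-offset idea above a bit more concrete, here is a minimal
sketch in Python. It only models the address arithmetic and is not the Xen
API from the RFC: the names GuestRegion, gpa_to_offset and read_payload,
the example addresses, and the bytearray standing in for the one-time
mapping are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class GuestRegion:
    """One contiguous block of FE guest RAM (illustrative fields only)."""
    gpa_base: int    # guest-physical address where the block starts
    size: int        # length of the block in bytes
    map_offset: int  # where the block sits inside the one-time mapping


def gpa_to_offset(regions: Sequence[GuestRegion], gpa: int, length: int) -> int:
    """Turn a guest-physical range into an offset inside the mapping."""
    if length <= 0:
        raise ValueError("length must be positive")
    for region in regions:
        region_end = region.gpa_base + region.size
        if region.gpa_base <= gpa and gpa + length <= region_end:
            return region.map_offset + (gpa - region.gpa_base)
    raise ValueError(f"{gpa:#x}+{length:#x} is not backed by guest RAM")


def read_payload(mapping: bytearray, regions: Sequence[GuestRegion],
                 gpa: int, length: int) -> bytes:
    """Read a payload with plain offset arithmetic, no per-request map/unmap."""
    offset = gpa_to_offset(regions, gpa, length)
    return bytes(mapping[offset:offset + length])


# Two made-up RAM banks packed back to back in an 8 KiB stand-in mapping.
regions = [
    GuestRegion(gpa_base=0x4000_0000, size=0x1000, map_offset=0x0000),
    GuestRegion(gpa_base=0x8000_0000, size=0x1000, map_offset=0x1000),
]
mapping = bytearray(0x2000)
mapping[0x1010:0x1014] = b"virt"   # pretend the FE wrote this payload

print(read_payload(mapping, regions, 0x8000_0010, 4))  # prints b'virt'
```

With dynamic mapping, every such read would be bracketed by map/unmap
hypercalls; in this model the per-request cost is a bounds check and an
addition.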
> >
> > Thank you. Quickly, let me make sure of one thing:
> > This API itself doesn't do any mapping operations, right?
> > So I suppose that the virtio BE guest is responsible for
> > 1) fetching the information about all the memory regions in FE,
> > 2) calling this API to allocate a big chunk of unused space in BE,
> > 3) creating grant/foreign mappings for FE onto this region(s)
> > in the initialization/configuration of emulated virtio devices.
> >
> > Is this the way this API is expected to be used?
> > Does Xen already have an interface for (1)?
> >
> 
> They are discussing a proper way to do it in that thread.
> Because this API is common, both x86 and Arm should be considered.
> 

Please ignore my reply above. I hadn't seen that Oleksandr had already
replied to this question. Sorry about that!

> > -Takahiro Akashi
> >
> > > Before this API, a device model that mapped the whole guest memory
> > > would severely consume the physical pages of Dom-0/Dom-D.
> > >
> > > > -Takahiro Akashi
> > > >
> > > >
> > > > > > Thanks,
> > > > > > -Takahiro Akashi
> > > > > >
> > > > > > > -Takahiro Akashi
> > > > > > >
> > > > > > > > > > > Specifically speaking about kvm-tool, I have a concern about its
> > > > > > > > > > > license terms. Since we target different hypervisors and different OSs
> > > > > > > > > > > (which I assume includes RTOSes), the resultant library should be
> > > > > > > > > > > permissively licensed, and the GPL for kvm-tool might be an issue.
> > > > > > > > > > > Any thoughts?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Yes. If a user wants to implement a FreeBSD device model but the virtio
> > > > > > > > > > library is GPL, then the GPL would be a problem. If we have another good
> > > > > > > > > > candidate, I am open to it.
> > > > > > > > >
> > > > > > > > > I have some candidates, particularly for vq/vring, in mind:
> > > > > > > > > * OpenAMP, or
> > > > > > > > > * the corresponding FreeBSD code
> > > > > > > > >
> > > > > > > >
> > > > > > > > Interesting, I will look into them : )
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Wei Chen
> > > > > > > >
> > > > > > > > > -Takahiro Akashi
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > > -Takahiro Akashi
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > [1] https://op-lists.linaro.org/pipermail/stratos-dev/2021-August/000548.html
> > > > > > > > > > > [2] https://marc.info/?l=xen-devel&m=162373754705233&w=2
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > From: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>
> > > > > > > > > > > > > Sent: August 14, 2021 23:38
> > > > > > > > > > > > > To: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>; Stefano Stabellini
> > > > > > > > > > > > > <sstabellini@xxxxxxxxxx>
> > > > > > > > > > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hello, all.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Please see some comments below. And sorry for the possible format issues.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Aug 11, 2021 at 9:27 AM AKASHI Takahiro
> > > > > > > > > > > > > > <mailto:takahiro.akashi@xxxxxxxxxx> wrote:
> > > > > > > > > > > > > > On Wed, Aug 04, 2021 at 12:20:01PM -0700, Stefano Stabellini wrote:
> > > > > > > > > > > > > > > CCing people working on Xen+VirtIO and IOREQs. Not trimming the original
> > > > > > > > > > > > > > > email to let them read the full context.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > My comments below are related to a potential Xen implementation, not
> > > > > > > > > > > > > > > because it is the only implementation that matters, but because it is
> > > > > > > > > > > > > > > the one I know best.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Please note that my proposal (and hence the working prototype)[1]
> > > > > > > > > > > > > > is based on Xen's virtio implementation (i.e. IOREQ) and particularly
> > > > > > > > > > > > > > EPAM's virtio-disk application (backend server).
> > > > > > > > > > > > > > It has been, I believe, well generalized but is still a bit biased
> > > > > > > > > > > > > > toward this original design.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So I hope you like my approach :)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1] https://op-lists.linaro.org/pipermail/stratos-dev/2021-August/000546.html
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Let me take this opportunity to explain a bit more about my approach below.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Also, please see this relevant email thread:
> > > > > > > > > > > > > > > https://marc.info/?l=xen-devel&m=162373754705233&w=2
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, 4 Aug 2021, Alex Bennée wrote:
> > > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > One of the goals of Project Stratos is to enable hypervisor agnostic
> > > > > > > > > > > > > > > > backends so we can enable as much re-use of code as possible and avoid
> > > > > > > > > > > > > > > > repeating ourselves. This is the flip side of the front end where
> > > > > > > > > > > > > > > > multiple front-end implementations are required - one per OS, assuming
> > > > > > > > > > > > > > > > you don't just want Linux guests. The resultant guests are trivially
> > > > > > > > > > > > > > > > movable between hypervisors modulo any abstracted paravirt type
> > > > > > > > > > > > > > > > interfaces.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > In my original thumbnail sketch of a solution I envisioned vhost-user
> > > > > > > > > > > > > > > > daemons running in a broadly POSIX like environment. The interface to
> > > > > > > > > > > > > > > > the daemon is fairly simple requiring only some mapped memory and some
> > > > > > > > > > > > > > > > sort of signalling for events (on Linux this is eventfd). The idea was a
> > > > > > > > > > > > > > > > stub binary would be responsible for any hypervisor specific setup and
> > > > > > > > > > > > > > > > then launch a common binary to deal with the actual virtqueue requests
> > > > > > > > > > > > > > > > themselves.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Since that original sketch we've seen an expansion in the sort of ways
> > > > > > > > > > > > > > > > backends could be created. There is interest in encapsulating backends
> > > > > > > > > > > > > > > > in RTOSes or unikernels for solutions like SCMI. The interest in Rust
> > > > > > > > > > > > > > > > has prompted ideas of using the trait interface to abstract differences
> > > > > > > > > > > > > > > > away as well as the idea of bare-metal Rust backends.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > We have a card (STR-12) called "Hypercall Standardisation" which
> > > > > > > > > > > > > > > > calls for a description of the APIs needed from the hypervisor side to
> > > > > > > > > > > > > > > > support VirtIO guests and their backends. However we are some way off
> > > > > > > > > > > > > > > > from that at the moment as I think we need to at least demonstrate one
> > > > > > > > > > > > > > > > portable backend before we start codifying requirements. To that end I
> > > > > > > > > > > > > > > > want to think about what we need for a backend to function.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Configuration
> > > > > > > > > > > > > > > > =============
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > In the type-2 setup this is typically fairly simple because the host
> > > > > > > > > > > > > > > > system can orchestrate the various modules that make up the complete
> > > > > > > > > > > > > > > > system. In the type-1 case (or even type-2 with delegated service VMs)
> > > > > > > > > > > > > > > > we need some sort of mechanism to inform the backend VM about key
> > > > > > > > > > > > > > > > details about the system:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >   - where virt queue memory is in its address space
> > > > > > > > > > > > > > > >   - how it's going to receive (interrupt) and trigger (kick) events
> > > > > > > > > > > > > > > >   - what (if any) resources the backend needs to connect to
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Obviously you can elide over configuration issues by having static
> > > > > > > > > > > > > > > > configurations and baking the assumptions into your guest images however
> > > > > > > > > > > > > > > > this isn't scalable in the long term. The obvious solution seems to be
> > > > > > > > > > > > > > > > extending a subset of Device Tree data to user space but perhaps there
> > > > > > > > > > > > > > > > are other approaches?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Before any virtio transactions can take place the appropriate memory
> > > > > > > > > > > > > > > > mappings need to be made between the FE guest and the BE guest.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Currently the whole of the FE guest's address space needs to be visible
> > > > > > > > > > > > > > > > to whatever is serving the virtio requests. I can envision 3 approaches:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >  * BE guest boots with memory already mapped
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >  This would entail the guest OS knowing where in its Guest Physical
> > > > > > > > > > > > > > > >  Address space is already taken up and avoiding clashing. I would assume
> > > > > > > > > > > > > > > >  in this case you would want a standard interface to userspace to then
> > > > > > > > > > > > > > > >  make that address space visible to the backend daemon.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Yet another way here is that we would have well-known "shared memory" between
> > > > > > > > > > > > > > VMs. I think that Jailhouse's ivshmem gives us good insights on this matter
> > > > > > > > > > > > > > and that it can even be an alternative for a hypervisor-agnostic solution.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > (Please note memory regions in ivshmem appear as a PCI device and can be
> > > > > > > > > > > > > > mapped locally.)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I want to add this shared memory aspect to my virtio-proxy, but
> > > > > > > > > > > > > > the resultant solution would eventually look similar to ivshmem.
> > > > > > > > > > > > > > > >  * BE guest boots with a hypervisor handle to memory
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >  The BE guest is then free to map the FE's memory to where it wants in
> > > > > > > > > > > > > > > >  the BE's guest physical address space.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I cannot see how this could work for Xen. There is no "handle" to give
> > > > > > > > > > > > > > > to the backend if the backend is not running in dom0. So for Xen I think
> > > > > > > > > > > > > > > the memory has to be already mapped
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In Xen's IOREQ solution (virtio-blk), the following information is expected
> > > > > > > > > > > > > > to be exposed to BE via Xenstore:
> > > > > > > > > > > > > > (I know that this is a tentative approach though.)
> > > > > > > > > > > > > >    - the start address of configuration space
> > > > > > > > > > > > > >    - interrupt number
> > > > > > > > > > > > > >    - file path for backing storage
> > > > > > > > > > > > > >    - read-only flag
> > > > > > > > > > > > > > And the BE server has to call a particular hypervisor interface to
> > > > > > > > > > > > > > map the configuration space.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, Xenstore was chosen as a simple way to pass configuration info to
> > > > > > > > > > > > > the backend running in a non-toolstack domain.
> > > > > > > > > > > > > I remember there was a wish to avoid using Xenstore in the Virtio backend
> > > > > > > > > > > > > itself if possible, so for a non-toolstack domain this could be done by
> > > > > > > > > > > > > adjusting devd (the daemon that listens for devices and launches backends)
> > > > > > > > > > > > > to read the backend configuration from Xenstore anyway and pass it to
> > > > > > > > > > > > > the backend via command line arguments.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, in the current PoC code we're using xenstore to pass device configuration.
> > > > > > > > > > > > We also designed a static device configuration parsing method for Dom0less or
> > > > > > > > > > > > other scenarios that don't have xentool. Yes, it comes from the device model
> > > > > > > > > > > > command line or a config file.
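
As an illustration of the static configuration path mentioned above (device
model command line or config file instead of Xenstore), here is a minimal
Python sketch. The option format "base=...,irq=...,path=...,ro=..." and the
names VirtioBlkConfig and parse_device_option are hypothetical and are not
taken from the PoC; the four fields simply mirror the items listed above as
being exposed via Xenstore for virtio-blk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtioBlkConfig:
    config_base: int    # start address of the configuration space
    irq: int            # interrupt number
    backing_path: str   # file path for backing storage
    read_only: bool     # read-only flag


_KNOWN_KEYS = {"base", "irq", "path", "ro"}
_REQUIRED_KEYS = {"base", "irq", "path"}


def parse_device_option(option: str) -> VirtioBlkConfig:
    """Parse a hypothetical 'key=value,key=value' device option string."""
    fields = {}
    for item in option.split(","):
        key, sep, value = item.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            raise ValueError(f"malformed item: {item!r}")
        if key not in _KNOWN_KEYS:
            raise ValueError(f"unknown key: {key!r}")
        fields[key] = value

    missing = _REQUIRED_KEYS - fields.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")

    ro = fields.get("ro", "0")   # the flag is optional and defaults to writable
    if ro not in ("0", "1"):
        raise ValueError("ro must be 0 or 1")

    return VirtioBlkConfig(
        config_base=int(fields["base"], 0),  # base 0 accepts 0x... and decimal
        irq=int(fields["irq"], 0),
        backing_path=fields["path"],
        read_only=(ro == "1"),
    )


cfg = parse_device_option("base=0x2000000,irq=33,path=/tmp/disk.img,ro=1")
print(cfg)
```

The same parsed structure could be filled from Xenstore reads when a
toolstack is present, so the backend itself would not need to know where
the values came from. This simple format does not handle paths that
contain commas.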
> > > > > > > > > > > >
> > > > > > > > > > > > > But, if ...
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In my approach (virtio-proxy), all those Xen (or
> > > > > > hypervisor)-
> > > > > > > > > > > specific
> > > > > > > > > > > > > > stuffs are contained in virtio-proxy, yet
> another
> > VM,
> > > > to
> > > > > > hide
> > > > > > > > > all
> > > > > > > > > > > details.
> > > > > > > > > > > > >
> > > > > > > > > > > > > ... the solution how to overcome that is already
> > found
> > > > and
> > > > > > proven
> > > > > > > > > to
> > > > > > > > > > > work then even better.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > # My point is that a "handle" is not mandatory
> for
> > > > > > executing
> > > > > > > > > mapping.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > and the mapping probably done by the
> > > > > > > > > > > > > > > toolstack (also see below.) Or we would have
> to
> > > > invent a
> > > > > > new
> > > > > > > > > Xen
> > > > > > > > > > > > > > > hypervisor interface and Xen virtual machine
> > > > privileges
> > > > > > to
> > > > > > > > > allow
> > > > > > > > > > > this
> > > > > > > > > > > > > > > kind of mapping.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > If we run the backend in Dom0 then we have no
> > > > problems
> > > > > > of
> > > > > > > > > course.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > One of the difficulties on Xen that I found in my
> > approach
> > > > is
> > > > > > that
> > > > > > > > > > > calling
> > > > > > > > > > > > > > > > such hypervisor interfaces (registering IOREQ,
> > mapping
> > > > > > memory) is
> > > > > > > > > > > only
> > > > > > > > > > > > > > > > allowed on BE servers themselves and so we will
> > have
> > > > to
> > > > > > extend
> > > > > > > > > > > those
> > > > > > > > > > > > > > interfaces.
> > > > > > > > > > > > > > This, however, will raise some concern on
> security
> > and
> > > > > > privilege
> > > > > > > > > > > distribution
> > > > > > > > > > > > > > as Stefan suggested.
> > > > > > > > > > > > >
> > > > > > > > > > > > > We also faced policy related issues with Virtio
> > backend
> > > > > > running in
> > > > > > > > > > > other than Dom0 domain in a "dummy" xsm mode. In our
> > target
> > > > > > system we
> > > > > > > > > run
> > > > > > > > > > > the backend in a driver
> > > > > > > > > > > > > domain (we call it DomD) where the underlying H/W
> > > > resides.
> > > > > > We
> > > > > > > > > trust it,
> > > > > > > > > > > so we wrote policy rules (to be used in "flask" xsm
> mode)
> > to
> > > > > > provide
> > > > > > > > > it
> > > > > > > > > > > with a little bit more privileges than a simple DomU
> had.
> > > > > > > > > > > > > Now it is permitted to issue device-model,
> resource
> > and
> > > > > > memory
> > > > > > > > > > > mappings, etc calls.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > To activate the mapping will
> > > > > > > > > > > > > > > >  require some sort of hypercall to the
> > hypervisor.
> > > > I
> > > > > > can see
> > > > > > > > > two
> > > > > > > > > > > options
> > > > > > > > > > > > > > > >  at this point:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >   - expose the handle to userspace for
> > > > daemon/helper
> > > > > > to
> > > > > > > > > trigger
> > > > > > > > > > > the
> > > > > > > > > > > > > > > >     mapping via existing hypercall
> interfaces.
> > If
> > > > > > using a
> > > > > > > > > helper
> > > > > > > > > > > you
> > > > > > > > > > > > > > > >     would have a hypervisor specific one to
> > avoid
> > > > the
> > > > > > daemon
> > > > > > > > > > > having to
> > > > > > > > > > > > > > > >     care too much about the details or push
> > that
> > > > > > complexity
> > > > > > > > > into
> > > > > > > > > > > a
> > > > > > > > > > > > > > > >     compile time option for the daemon which
> > would
> > > > > > result in
> > > > > > > > > > > different
> > > > > > > > > > > > > > > >     binaries although a common source base.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >   - expose a new kernel ABI to abstract the
> > > > hypercall
> > > > > > > > > > > differences away
> > > > > > > > > > > > > > > >     in the guest kernel. In this case the
> > > > userspace
> > > > > > would
> > > > > > > > > > > essentially
> > > > > > > > > > > > > > > >     ask for an abstract "map guest N memory
> to
> > > > > > userspace
> > > > > > > > > ptr"
> > > > > > > > > > > and let
> > > > > > > > > > > > > > > >     the kernel deal with the different
> > hypercall
> > > > > > interfaces.
> > > > > > > > > > > This of
> > > > > > > > > > > > > > > >     course assumes the majority of BE guests
> > would
> > > > be
> > > > > > Linux
> > > > > > > > > > > kernels and
> > > > > > > > > > > > > > > >     leaves the bare-metal/unikernel
> approaches
> > to
> > > > > > their own
> > > > > > > > > > > devices.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Operation
> > > > > > > > > > > > > > > > =========
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The core of the operation of VirtIO is
> fairly
> > > > simple.
> > > > > > Once
> > > > > > > > > the
> > > > > > > > > > > > > > > > vhost-user feature negotiation is done it's
> a
> > case
> > > > of
> > > > > > > > > receiving
> > > > > > > > > > > update
> > > > > > > > > > > > > > > > events and parsing the resultant virt queue
> > for
> > > > data.
> > > > > > The
> > > > > > > > > vhost-
> > > > > > > > > > > user
> > > > > > > > > > > > > > > > specification handles a bunch of setup
> before
> > that
> > > > > > point,
> > > > > > > > > mostly
> > > > > > > > > > > to
> > > > > > > > > > > > > > > > detail where the virt queues are set up FD's
> > for
> > > > > > memory and
> > > > > > > > > > > event
> > > > > > > > > > > > > > > > communication. This is where the envisioned
> > stub
> > > > > > process
> > > > > > > > > would
> > > > > > > > > > > be
> > > > > > > > > > > > > > > > responsible for getting the daemon up and
> > ready to
> > > > run.
> > > > > > This
> > > > > > > > > is
> > > > > > > > > > > > > > > > currently done inside a big VMM like QEMU
> but
> > I
> > > > > > suspect a
> > > > > > > > > modern
> > > > > > > > > > > > > > > > approach would be to use the rust-vmm vhost
> > crate.
> > > > It
> > > > > > would
> > > > > > > > > then
> > > > > > > > > > > either
> > > > > > > > > > > > > > > > communicate with the kernel's abstracted ABI
> > or be
> > > > re-
> > > > > > > > > targeted
> > > > > > > > > > > as a
> > > > > > > > > > > > > > > > build option for the various hypervisors.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > One thing I mentioned before to Alex is that
> Xen
> > > > doesn't
> > > > > > have
> > > > > > > > > VMMs
> > > > > > > > > > > the
> > > > > > > > > > > > > > > way they are typically envisioned and
> described
> > in
> > > > other
> > > > > > > > > > > environments.
> > > > > > > > > > > > > > > Instead, Xen has IOREQ servers. Each of them
> > > > connects
> > > > > > > > > > > independently to
> > > > > > > > > > > > > > > Xen via the IOREQ interface. E.g. today
> multiple
> > > > QEMUs
> > > > > > could
> > > > > > > > > be
> > > > > > > > > > > used as
> > > > > > > > > > > > > > > emulators for a single Xen VM, each of them
> > > > connecting
> > > > > > to Xen
> > > > > > > > > > > > > > > independently via the IOREQ interface.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The component responsible for starting a
> daemon
> > > > and/or
> > > > > > setting
> > > > > > > > > up
> > > > > > > > > > > shared
> > > > > > > > > > > > > > > interfaces is the toolstack: the xl command
> and
> > the
> > > > > > > > > libxl/libxc
> > > > > > > > > > > > > > > libraries.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think that VM configuration management (or
> > > > orchestration
> > > > > > in
> > > > > > > > > > > > Stratos
> > > > > > > > > > > > > > jargon?) is a subject to debate in parallel.
> > > > > > > > > > > > > > Otherwise, is there any good assumption to avoid
> > it
> > > > right
> > > > > > now?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Oleksandr and others I CCed have been working
> on
> > > > ways
> > > > > > for the
> > > > > > > > > > > toolstack
> > > > > > > > > > > > > > > to create virtio backends and setup memory
> > mappings.
> > > > > > They
> > > > > > > > > might be
> > > > > > > > > > > able
> > > > > > > > > > > > > > > to provide more info on the subject. I do
> think
> > we
> > > > miss
> > > > > > a way
> > > > > > > > > to
> > > > > > > > > > > provide
> > > > > > > > > > > > > > > the configuration to the backend and anything
> > else
> > > > that
> > > > > > the
> > > > > > > > > > > backend
> > > > > > > > > > > > > > > might require to start doing its job.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, some work has been done for the toolstack to
> > handle
> > > > > > Virtio
> > > > > > > > > MMIO
> > > > > > > > > > > devices in
> > > > > > > > > > > > > general and Virtio block devices in particular.
> > However,
> > > > it
> > > > > > has
> > > > > > > > > not
> > > > > > > > > > > > been upstreamed yet.
> > > > > > > > > > > > > Updated patches on review now:
> > > > > > > > > > > > > > > https://lore.kernel.org/xen-devel/1621626361-29076-1-git-send-email-olekstysh@xxxxxxxxx/
> > > > > > > > > > > > >
> > > > > > > > > > > > > There is an additional (also important) activity
> to
> > > > > > improve/fix
> > > > > > > > > > > foreign memory mapping on Arm which I am also involved
> > in.
> > > > > > > > > > > > > The foreign memory mapping is proposed to be used
> > for
> > > > Virtio
> > > > > > > > > backends
> > > > > > > > > > > (device emulators) if there is a need to run guest OS
> > > > completely
> > > > > > > > > > > unmodified.
> > > > > > > > > > > > > Of course, the more secure way would be to use
> grant
> > > > memory
> > > > > > > > > mapping.
> > > > > > > > > > > > Briefly, the main difference between them is that with
> > > > foreign
> > > > > > mapping
> > > > > > > > > the
> > > > > > > > > > > backend
> > > > > > > > > > > > > can map any guest memory it wants to map, but with
> > grant
> > > > > > mapping
> > > > > > > > > it is
> > > > > > > > > > > allowed to map only what was previously granted by the
> > > > frontend.
> > > > > > > > > > > > >
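The difference between the two mapping schemes can be shown with a toy
model. This is only a sketch of the access rule described above; it does
not use any Xen interface, and the Guest class, foreign_map() and
grant_map() are hypothetical names made up for the illustration:

```python
class Guest:
    """Toy frontend domain: a set of page frame numbers plus a grant list."""

    def __init__(self, pages):
        self.pages = set(pages)
        self.granted = set()

    def grant(self, pfn):
        """The frontend explicitly grants one of its own pages to the backend."""
        if pfn not in self.pages:
            raise ValueError("guest does not own this page")
        self.granted.add(pfn)


def foreign_map(guest, pfn):
    """Foreign mapping: the backend may map any page the guest owns."""
    return pfn in guest.pages


def grant_map(guest, pfn):
    """Grant mapping: the backend may map only pages granted beforehand."""
    return pfn in guest.granted


if __name__ == "__main__":
    guest = Guest(range(4))
    guest.grant(1)
    print(foreign_map(guest, 3), grant_map(guest, 3), grant_map(guest, 1))
```

In this model the frontend controls what is reachable under grant
mapping, whereas under foreign mapping the guest has no say, which is why
the latter suits completely unmodified guests.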
> > > > > > > > > > > > > So, there might be a problem if we want to pre-map
> > some
> > > > > > guest
> > > > > > > > > memory
> > > > > > > > > > > in advance or to cache mappings in the backend in
> order
> > to
> > > > > > improve
> > > > > > > > > > > performance (because the mapping/unmapping guest pages
> > every
> > > > > > request
> > > > > > > > > > > requires a lot of back and forth to Xen + P2M updates).
> > In a
> > > > > > nutshell,
> > > > > > > > > > > currently, in order to map a guest page into the
> backend
> > > > address
> > > > > > space
> > > > > > > > > we
> > > > > > > > > > > need to steal a real physical page from the backend
> > domain.
> > > > So,
> > > > > > with
> > > > > > > > > the
> > > > > > > > > > > said optimizations we might end up with no free memory
> > in
> > > > the
> > > > > > backend
> > > > > > > > > > > domain (see XSA-300). And what we try to achieve is to
> > not
> > > > waste
> > > > > > a
> > > > > > > > > real
> > > > > > > > > > > domain memory at all by providing safe non-allocated-
> yet
> > (so
> > > > > > unused)
> > > > > > > > > > > address space for the foreign (and grant) pages to be
> > mapped
> > > > > > into,
> > > > > > > > > this
> > > > > > > > > > > enabling work implies Xen and Linux (and likely DTB
> > bindings)
> > > > > > changes.
> > > > > > > > > > > However, as it turned out, for this to work in a
> proper
> > and
> > > > safe
> > > > > > way
> > > > > > > > > some
> > > > > > > > > > > prereq work needs to be done.
> > > > > > > > > > > > > You can find the related Xen discussion at:
> > > > > > > > > > > > > > > https://lore.kernel.org/xen-devel/1627489110-25633-1-git-send-email-olekstysh@xxxxxxxxx/
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > One question is how to best handle
> > notification
> > > > and
> > > > > > kicks.
> > > > > > > > > The
> > > > > > > > > > > existing
> > > > > > > > > > > > > > > > vhost-user framework uses eventfd to signal
> > the
> > > > daemon
> > > > > > > > > (although
> > > > > > > > > > > QEMU
> > > > > > > > > > > > > > > > is quite capable of simulating them when you
> > use
> > > > TCG).
> > > > > > Xen
> > > > > > > > > has
> > > > > > > > > > > > its own
> > > > > > > > > > > > > > > > IOREQ mechanism. However latency is an
> > important
> > > > > > factor and
> > > > > > > > > > > having
> > > > > > > > > > > > > > > > events go through the stub would add quite a
> > lot.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Yeah I think, regardless of anything else, we
> > want
> > > > the
> > > > > > > > > backends to
> > > > > > > > > > > > > > > connect directly to the Xen hypervisor.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In my approach,
> > > > > > > > > > > > > >  a) BE -> FE: interrupts triggered by BE calling
> a
> > > > > > hypervisor
> > > > > > > > > > > interface
> > > > > > > > > > > > > >               via virtio-proxy
> > > > > > > > > > > > > >  b) FE -> BE: MMIO to config raises events (in
> > event
> > > > > > channels),
> > > > > > > > > > > which is
> > > > > > > > > > > > > >               converted to a callback to BE via
> > > > virtio-
> > > > > > proxy
> > > > > > > > > > > > > >               (Xen's event channel is
> internally
> > > > > > implemented by
> > > > > > > > > > > interrupts.)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I don't know what "connect directly" means here,
> > but
> > > > > > sending
> > > > > > > > > > > interrupts
> > > > > > > > > > > > > > > > to the opposite side would be most efficient.
> > > > > > > > > > > > > > Ivshmem, I suppose, takes this approach by
> > utilizing
> > > > PCI's
> > > > > > msi-x
> > > > > > > > > > > mechanism.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Agree that MSI would be more efficient than SPI...
> > > > > > > > > > > > > At the moment, in order to notify the frontend,
> the
> > > > backend
> > > > > > issues
> > > > > > > > > a
> > > > > > > > > > > specific device-model call to query Xen to inject a
> > > > > > corresponding SPI
> > > > > > > > > to
> > > > > > > > > > > the guest.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Could we consider the kernel internally
> > converting
> > > > > > IOREQ
> > > > > > > > > > > messages from
> > > > > > > > > > > > > > > > the Xen hypervisor to eventfd events? Would
> > this
> > > > scale
> > > > > > with
> > > > > > > > > > > other kernel
> > > > > > > > > > > > > > > > hypercall interfaces?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > So any thoughts on what directions are worth
> > > > > > experimenting
> > > > > > > > > with?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > One option we should consider is for each
> > backend to
> > > > > > connect
> > > > > > > > > to
> > > > > > > > > > > Xen via
> > > > > > > > > > > > > > > the IOREQ interface. We could generalize the
> > IOREQ
> > > > > > interface
> > > > > > > > > and
> > > > > > > > > > > make it
> > > > > > > > > > > > > > > hypervisor agnostic. The interface is really
> > trivial
> > > > and
> > > > > > easy
> > > > > > > > > to
> > > > > > > > > > > add.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > As I said above, my proposal does the same thing
> > that
> > > > you
> > > > > > > > > mentioned
> > > > > > > > > > > here :)
> > > > > > > > > > > > > > The difference is that I do call hypervisor
> > interfaces
> > > > via
> > > > > > > > > virtio-
> > > > > > > > > > > proxy.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The only Xen-specific part is the notification
> > > > mechanism,
> > > > > > > > > which is
> > > > > > > > > > > an
> > > > > > > > > > > > > > > event channel. If we replaced the event
> channel
> > with
> > > > > > something
> > > > > > > > > > > else the
> > > > > > > > > > > > > > > interface would be generic. See:
> > > > > > > > > > > > > > > > > https://gitlab.com/xen-project/xen/-/blob/staging/xen/include/public/hvm/ioreq.h#L52
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I don't think that translating IOREQs to
> eventfd
> > in
> > > > the
> > > > > > kernel
> > > > > > > > > is
> > > > > > > > > > > a
> > > > > > > > > > > > > > > > > good idea: it feels like it would be extra
> > > > complexity
> > > > > > and that
> > > > > > > > > the
> > > > > > > > > > > > > > > kernel shouldn't be involved as this is a
> > backend-
> > > > > > hypervisor
> > > > > > > > > > > interface.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Given that we may want to implement BE as a
> bare-
> > metal
> > > > > > > > > application
> > > > > > > > > > > > > > as I did on Zephyr, I don't think that the
> > translation
> > > > > > > would
> > > > > > > > > be
> > > > > > > > > > > > > > a big issue, especially on RTOS's.
> > > > > > > > > > > > > > It will be some kind of abstraction layer of
> > interrupt
> > > > > > handling
> > > > > > > > > > > > > > (or nothing but a callback mechanism).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Also, eventfd is very Linux-centric and we are
> > > > trying to
> > > > > > > > > design an
> > > > > > > > > > > > > > > interface that could work well for RTOSes too.
> > If we
> > > > > > want to
> > > > > > > > > do
> > > > > > > > > > > > > > > something different, both OS-agnostic and
> > > > hypervisor-
> > > > > > agnostic,
> > > > > > > > > > > perhaps
> > > > > > > > > > > > > > > we could design a new interface. One that
> could
> > be
> > > > > > > > > implementable
> > > > > > > > > > > in the
> > > > > > > > > > > > > > > Xen hypervisor itself (like IOREQ) and of
> course
> > any
> > > > > > other
> > > > > > > > > > > hypervisor
> > > > > > > > > > > > > > > too.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > There is also another problem. IOREQ will
> probably
> > not
> > > > be
> > > > > > the
> > > > > > > > > only
> > > > > > > > > > > > > > > interface needed. Have a look at
> > > > > > > > > > > > > > > > > https://marc.info/?l=xen-devel&m=162373754705233&w=2.
> > > > > > Don't we
> > > > > > > > > > > also need
> > > > > > > > > > > > > > > an interface for the backend to inject
> > interrupts
> > > > into
> > > > > > the
> > > > > > > > > > > frontend? And
> > > > > > > > > > > > > > > if the backend requires dynamic memory
> mappings
> > of
> > > > > > frontend
> > > > > > > > > pages,
> > > > > > > > > > > then
> > > > > > > > > > > > > > > we would also need an interface to map/unmap
> > domU
> > > > pages.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > My proposal document might help here; All the
> > > > interfaces
> > > > > > > > > required
> > > > > > > > > > > for
> > > > > > > > > > > > > > virtio-proxy (or hypervisor-related interfaces)
> > are
> > > > listed
> > > > > > as
> > > > > > > > > > > > > > RPC protocols :)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > These interfaces are a lot more problematic
> than
> > > > IOREQ:
> > > > > > IOREQ
> > > > > > > > > is
> > > > > > > > > > > tiny
> > > > > > > > > > > > > > > and self-contained. It is easy to add anywhere.
> > A
> > > > new
> > > > > > > > > interface to
> > > > > > > > > > > > > > > inject interrupts or map pages is more
> difficult
> > to
> > > > > > manage
> > > > > > > > > because
> > > > > > > > > > > it
> > > > > > > > > > > > > > > would require changes scattered across the
> > various
> > > > > > emulators.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Exactly. I have no confidence yet that my
> approach
> > will
> > > > > > also
> > > > > > > > > apply
> > > > > > > > > > > > > > to other hypervisors than Xen.
> > > > > > > > > > > > > > Technically, yes, but whether people can accept
> it
> > or
> > > > not
> > > > > > is a
> > > > > > > > > > > different
> > > > > > > > > > > > > > matter.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > -Takahiro Akashi
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Regards,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Oleksandr Tyshchenko
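
The quoted discussion names three kinds of interface a backend would need
from any hypervisor: request notification (IOREQ-like), interrupt
injection into the frontend, and map/unmap of frontend pages. A minimal
sketch of how these could sit behind one hypervisor-agnostic abstraction
follows. Every name in it (HypervisorOps, FakeHypervisor, serve) is a
hypothetical illustration, not an existing API, and the fake
implementation stands in for real hypercalls:

```python
from abc import ABC, abstractmethod


class HypervisorOps(ABC):
    """Hypothetical hypervisor-agnostic operations a VirtIO backend needs."""

    @abstractmethod
    def wait_for_request(self):
        """Return the next I/O request, or None when there is none."""

    @abstractmethod
    def inject_interrupt(self, irq):
        """Notify the frontend."""

    @abstractmethod
    def map_guest_pages(self, pfns):
        """Map frontend pages and return an opaque handle."""

    @abstractmethod
    def unmap_guest_pages(self, handle):
        """Release a mapping created by map_guest_pages()."""


class FakeHypervisor(HypervisorOps):
    """In-memory stand-in used only to exercise the interface."""

    def __init__(self, requests):
        self.requests = list(requests)
        self.injected = []
        self.mappings = {}
        self._next_handle = 0

    def wait_for_request(self):
        return self.requests.pop(0) if self.requests else None

    def inject_interrupt(self, irq):
        self.injected.append(irq)

    def map_guest_pages(self, pfns):
        handle = self._next_handle
        self._next_handle += 1
        self.mappings[handle] = list(pfns)
        return handle

    def unmap_guest_pages(self, handle):
        del self.mappings[handle]


def serve(ops, irq):
    """Drain pending requests: map, (process), unmap, notify. Returns the count."""
    handled = 0
    while True:
        req = ops.wait_for_request()
        if req is None:
            return handled
        handle = ops.map_guest_pages(req["pfns"])
        # A real backend would process the virtqueue buffers here.
        ops.unmap_guest_pages(handle)
        ops.inject_interrupt(irq)
        handled += 1
```

A Xen implementation of such a class would wrap IOREQ, event channels and
foreign or grant mapping, while the serve() loop would stay unchanged;
whether that split is acceptable for other hypervisors is exactly the open
question in the thread.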


 

