[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] RE: Enabling hypervisor agnosticism for VirtIO backends
Hi Akashi, Oleksandr, > -----Original Message----- > From: Xen-devel <xen-devel-bounces@xxxxxxxxxxxxxxxxxxxx> On Behalf Of Wei > Chen > Sent: 2021年9月2日 9:31 > To: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx> > Cc: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>; Stefano Stabellini > <sstabellini@xxxxxxxxxx>; Alex Benn??e <alex.bennee@xxxxxxxxxx>; Kaly Xin > <Kaly.Xin@xxxxxxx>; Stratos Mailing List <stratos-dev@xxxxxxxxxxxxxxxxxxx>; > virtio-dev@xxxxxxxxxxxxxxxxxxxx; Arnd Bergmann <arnd.bergmann@xxxxxxxxxx>; > Viresh Kumar <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; Jan Kiszka > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik <cvanscha@xxxxxxxxxxxxxxxx>; > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri <vatsa@xxxxxxxxxxxxxx>; Jean- > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu Poirier > <mathieu.poirier@xxxxxxxxxx>; Oleksandr Tyshchenko > <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>; Julien > Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>; Paul Durrant > <paul@xxxxxxx>; nd <nd@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx> > Subject: RE: Enabling hypervisor agnosticism for VirtIO backends > > Hi Akashi, > > > -----Original Message----- > > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx> > > Sent: 2021年9月1日 20:29 > > To: Wei Chen <Wei.Chen@xxxxxxx> > > Cc: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>; Stefano Stabellini > > <sstabellini@xxxxxxxxxx>; Alex Benn??e <alex.bennee@xxxxxxxxxx>; Kaly > Xin > > <Kaly.Xin@xxxxxxx>; Stratos Mailing List <stratos-dev@op- > lists.linaro.org>; > > virtio-dev@xxxxxxxxxxxxxxxxxxxx; Arnd Bergmann > <arnd.bergmann@xxxxxxxxxx>; > > Viresh Kumar <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini > > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; Jan Kiszka > > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik <cvanscha@xxxxxxxxxxxxxxxx>; > > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri <vatsa@xxxxxxxxxxxxxx>; Jean- > > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu Poirier > > <mathieu.poirier@xxxxxxxxxx>; Oleksandr Tyshchenko > > <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis > > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>; > Julien > > Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>; Paul Durrant > > <paul@xxxxxxx>; nd <nd@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx> > > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends > > > > Hi Wei, > > > > On Wed, Sep 01, 2021 at 11:12:58AM +0000, Wei Chen wrote: > > > Hi Akashi, > > > > > > > -----Original Message----- > > > > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx> > > > > Sent: 2021年8月31日 14:18 > > > > To: Wei Chen <Wei.Chen@xxxxxxx> > > > > Cc: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>; Stefano Stabellini > > > > <sstabellini@xxxxxxxxxx>; Alex Benn??e <alex.bennee@xxxxxxxxxx>; > Kaly > > Xin > > > > <Kaly.Xin@xxxxxxx>; Stratos Mailing List <stratos-dev@op- > > lists.linaro.org>; > > > > virtio-dev@xxxxxxxxxxxxxxxxxxxx; Arnd Bergmann > > <arnd.bergmann@xxxxxxxxxx>; > > > > Viresh Kumar <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini > > > > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; Jan Kiszka > > > > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik > <cvanscha@xxxxxxxxxxxxxxxx>; > > > > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri <vatsa@xxxxxxxxxxxxxx>; > Jean- > > > > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu Poirier > > > > <mathieu.poirier@xxxxxxxxxx>; Oleksandr Tyshchenko > > > > <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis > > > > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>; > > Julien > > > > Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>; Paul > Durrant > > > > <paul@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx> > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends > > > > > > > > Wei, > > > > > > > > On Thu, Aug 26, 2021 at 12:10:19PM +0000, Wei Chen wrote: > > > > > Hi Akashi, > > > > > > > > > > > -----Original Message----- > > > > > > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx> > > > > > > Sent: 2021年8月26日 17:41 > > > > > > To: Wei Chen <Wei.Chen@xxxxxxx> > > > > > > Cc: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>; Stefano > Stabellini > > > > > > <sstabellini@xxxxxxxxxx>; Alex Benn??e <alex.bennee@xxxxxxxxxx>; > > Kaly > > > > Xin > > > > > > <Kaly.Xin@xxxxxxx>; Stratos Mailing List <stratos-dev@op- > > > > lists.linaro.org>; > > > > > > virtio-dev@xxxxxxxxxxxxxxxxxxxx; Arnd Bergmann > > > > <arnd.bergmann@xxxxxxxxxx>; > > > > > > Viresh Kumar <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini > > > > > > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; Jan Kiszka > > > > > > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik > > <cvanscha@xxxxxxxxxxxxxxxx>; > > > > > > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri <vatsa@xxxxxxxxxxxxxx>; > > Jean- > > > > > > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu Poirier > > > > > > <mathieu.poirier@xxxxxxxxxx>; Oleksandr Tyshchenko > > > > > > <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis > > > > > > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev > <Artem_Mygaiev@xxxxxxxx>; > > > > Julien > > > > > > Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>; Paul > > Durrant > > > > > > <paul@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx> > > > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends > > > > > > > > > > > > Hi Wei, > > > > > > > > > > > > On Fri, Aug 20, 2021 at 03:41:50PM +0900, AKASHI Takahiro wrote: > > > > > > > On Wed, Aug 18, 2021 at 08:35:51AM +0000, Wei Chen wrote: > > > > > > > > Hi Akashi, > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx> > > > > > > > > > Sent: 2021年8月18日 13:39 > > > > > > > > > To: Wei Chen <Wei.Chen@xxxxxxx> > > > > > > > > > Cc: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>; Stefano > > > > Stabellini > > > > > > > > > <sstabellini@xxxxxxxxxx>; Alex Benn??e > > <alex.bennee@xxxxxxxxxx>; > > > > > > Stratos > > > > > > > > > Mailing List <stratos-dev@xxxxxxxxxxxxxxxxxxx>; virtio- > > > > > > dev@lists.oasis- > > > > > > > > > open.org; Arnd Bergmann <arnd.bergmann@xxxxxxxxxx>; Viresh > > Kumar > > > > > > > > > <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini > > > > > > > > > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; Jan > > Kiszka > > > > > > > > > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik > > > > > > <cvanscha@xxxxxxxxxxxxxxxx>; > > > > > > > > > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri > > <vatsa@xxxxxxxxxxxxxx>; > > > > > > Jean- > > > > > > > > > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu > Poirier > > > > > > > > > <mathieu.poirier@xxxxxxxxxx>; Oleksandr Tyshchenko > > > > > > > > > <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis > > > > > > > > > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev > > > > <Artem_Mygaiev@xxxxxxxx>; > > > > > > Julien > > > > > > > > > Grall <julien@xxxxxxx>; Juergen Gross <jgross@xxxxxxxx>; > > Paul > > > > > > Durrant > > > > > > > > > <paul@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx> > > > > > > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO > > backends > > > > > > > > > > > > > > > > > > On Tue, Aug 17, 2021 at 08:39:09AM +0000, Wei Chen wrote: > > > > > > > > > > Hi Akashi, > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > > > From: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx> > > > > > > > > > > > Sent: 2021年8月17日 16:08 > > > > > > > > > > > To: Wei Chen <Wei.Chen@xxxxxxx> > > > > > > > > > > > Cc: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>; > Stefano > > > > > > Stabellini > > > > > > > > > > > <sstabellini@xxxxxxxxxx>; Alex Benn??e > > > > <alex.bennee@xxxxxxxxxx>; > > > > > > > > > Stratos > > > > > > > > > > > Mailing List <stratos-dev@xxxxxxxxxxxxxxxxxxx>; > virtio- > > > > > > > > > dev@lists.oasis- > > > > > > > > > > > open.org; Arnd Bergmann <arnd.bergmann@xxxxxxxxxx>; > > Viresh > > > > Kumar > > > > > > > > > > > <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini > > > > > > > > > > > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; > > Jan > > > > Kiszka > > > > > > > > > > > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik > > > > > > <cvanscha@xxxxxxxxxxxxxxxx>; > > > > > > > > > > > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri > > > > <vatsa@xxxxxxxxxxxxxx>; > > > > > > Jean- > > > > > > > > > > > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu > > Poirier > > > > > > > > > > > <mathieu.poirier@xxxxxxxxxx>; Oleksandr Tyshchenko > > > > > > > > > > > <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand Marquis > > > > > > > > > > > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev > > > > > > <Artem_Mygaiev@xxxxxxxx>; > > > > > > > > > Julien > > > > > > > > > > > Grall <julien@xxxxxxx>; Juergen Gross > <jgross@xxxxxxxx>; > > > > Paul > > > > > > Durrant > > > > > > > > > > > <paul@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx> > > > > > > > > > > > Subject: Re: Enabling hypervisor agnosticism for > VirtIO > > > > backends > > > > > > > > > > > > > > > > > > > > > > Hi Wei, Oleksandr, > > > > > > > > > > > > > > > > > > > > > > On Mon, Aug 16, 2021 at 10:04:03AM +0000, Wei Chen > wrote: > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > > > > > > > > > > > Thanks for Stefano to link my kvmtool for Xen > proposal > > > > here. > > > > > > > > > > > > This proposal is still discussing in Xen and KVM > > > > communities. > > > > > > > > > > > > The main work is to decouple the kvmtool from KVM > and > > make > > > > > > > > > > > > other hypervisors can reuse the virtual device > > > > implementations. > > > > > > > > > > > > > > > > > > > > > > > > In this case, we need to introduce an intermediate > > > > hypervisor > > > > > > > > > > > > layer for VMM abstraction, Which is, I think it's > very > > > > close > > > > > > > > > > > > to stratos' virtio hypervisor agnosticism work. > > > > > > > > > > > > > > > > > > > > > > # My proposal[1] comes from my own idea and doesn't > > always > > > > > > represent > > > > > > > > > > > # Linaro's view on this subject nor reflect Alex's > > concerns. > > > > > > > > > Nevertheless, > > > > > > > > > > > > > > > > > > > > > > Your idea and my proposal seem to share the same > > background. > > > > > > > > > > > Both have the similar goal and currently start with, > at > > > > first, > > > > > > Xen > > > > > > > > > > > and are based on kvm-tool. (Actually, my work is > derived > > > > from > > > > > > > > > > > EPAM's virtio-disk, which is also based on kvm-tool.) > > > > > > > > > > > > > > > > > > > > > > In particular, the abstraction of hypervisor > interfaces > > has > > > > a > > > > > > same > > > > > > > > > > > set of interfaces (for your "struct vmm_impl" and my > > "RPC > > > > > > interfaces"). > > > > > > > > > > > This is not co-incident as we both share the same > origin > > as > > > > I > > > > > > said > > > > > > > > > above. > > > > > > > > > > > And so we will also share the same issues. One of them > > is a > > > > way > > > > > > of > > > > > > > > > > > "sharing/mapping FE's memory". There is some trade-off > > > > between > > > > > > > > > > > the portability and the performance impact. > > > > > > > > > > > So we can discuss the topic here in this ML, too. > > > > > > > > > > > (See Alex's original email, too). > > > > > > > > > > > > > > > > > > > > > Yes, I agree. > > > > > > > > > > > > > > > > > > > > > On the other hand, my approach aims to create a > "single- > > > > binary" > > > > > > > > > solution > > > > > > > > > > > in which the same binary of BE vm could run on any > > > > hypervisors. > > > > > > > > > > > Somehow similar to your "proposal-#2" in [2], but in > my > > > > solution, > > > > > > all > > > > > > > > > > > the hypervisor-specific code would be put into another > > > > entity > > > > > > (VM), > > > > > > > > > > > named "virtio-proxy" and the abstracted operations are > > > > served > > > > > > via RPC. > > > > > > > > > > > (In this sense, BE is hypervisor-agnostic but might > have > > OS > > > > > > > > > dependency.) > > > > > > > > > > > But I know that we need discuss if this is a > requirement > > > > even > > > > > > > > > > > in Stratos project or not. (Maybe not) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sorry, I haven't had time to finish reading your virtio- > > proxy > > > > > > completely > > > > > > > > > > (I will do it ASAP). But from your description, it seems > > we > > > > need a > > > > > > > > > > 3rd VM between FE and BE? My concern is that, if my > > assumption > > > > is > > > > > > right, > > > > > > > > > > will it increase the latency in data transport path? > Even > > if > > > > we're > > > > > > > > > > using some lightweight guest like RTOS or Unikernel, > > > > > > > > > > > > > > > > > > Yes, you're right. But I'm afraid that it is a matter of > > degree. > > > > > > > > > As far as we execute 'mapping' operations at every fetch > of > > > > payload, > > > > > > > > > we will see latency issue (even in your case) and if we > have > > > > some > > > > > > solution > > > > > > > > > for it, we won't see it neither in my proposal :) > > > > > > > > > > > > > > > > > > > > > > > > > Oleksandr has sent a proposal to Xen mailing list to reduce > > this > > > > kind > > > > > > > > of "mapping/unmapping" operations. So the latency caused by > > this > > > > > > behavior > > > > > > > > on Xen may eventually be eliminated, and Linux-KVM doesn't > > have > > > > that > > > > > > problem. > > > > > > > > > > > > > > Obviously, I have not yet caught up there in the discussion. > > > > > > > Which patch specifically? > > > > > > > > > > > > Can you give me the link to the discussion or patch, please? > > > > > > > > > > > > > > > > It's a RFC discussion. We have tested this RFC patch internally. > > > > > https://lists.xenproject.org/archives/html/xen-devel/2021- > > > > 07/msg01532.html > > > > > > > > I'm afraid that I miss something here, but I don't know > > > > why this proposed API will lead to eliminating 'mmap' in accessing > > > > the queued payload at every request? > > > > > > > > > > This API give Xen device model (QEMU or kvmtool) the ability to map > > > whole guest RAM in device model's address space. In this case, device > > > model doesn't need dynamic hypercall to map/unmap payload memory. > > > It can use a flat offset to access payload memory in its address > > > space directly. Just Like KVM device model does now. > > > > Thank you. Quickly, let me make sure one thing: > > This API itself doesn't do any mapping operations, right? > > So I suppose that virtio BE guest is responsible to > > 1) fetch the information about all the memory regions in FE, > > 2) call this API to allocate a big chunk of unused space in BE, > > 3) create grant/foreign mappings for FE onto this region(S) > > in the initialization/configuration of emulated virtio devices. > > > > Is this the way this API is expected to be used? > > Does Xen already has an interface for (1)? > > > > They are discussing in that thread to find a proper way to do it. > Because this API is common, both x86 and Arm should be considered. > Please ignore my above reply. I hadn't seen Oleksandr had replied this question. Sorry about it! > > -Takahiro Akashi > > > > > Before this API, When device model to map whole guest memory, will > > > severely consume the physical pages of Dom-0/Dom-D. > > > > > > > -Takahiro Akashi > > > > > > > > > > > > > > Thanks, > > > > > > -Takahiro Akashi > > > > > > > > > > > > > -Takahiro Akashi > > > > > > > > > > > > > > > > > > Specifically speaking about kvm-tool, I have a concern > > about > > > > its > > > > > > > > > > > license term; Targeting different hypervisors and > > different > > > > OSs > > > > > > > > > > > (which I assume includes RTOS's), the resultant > library > > > > should > > > > > > be > > > > > > > > > > > license permissive and GPL for kvm-tool might be an > > issue. > > > > > > > > > > > Any thoughts? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes. If user want to implement a FreeBSD device model, > but > > the > > > > > > virtio > > > > > > > > > > library is GPL. Then GPL would be a problem. If we have > > > > another > > > > > > good > > > > > > > > > > candidate, I am open to it. > > > > > > > > > > > > > > > > > > I have some candidates, particularly for vq/vring, in my > > mind: > > > > > > > > > * Open-AMP, or > > > > > > > > > * corresponding Free-BSD code > > > > > > > > > > > > > > > > > > > > > > > > > Interesting, I will look into them : ) > > > > > > > > > > > > > > > > Cheers, > > > > > > > > Wei Chen > > > > > > > > > > > > > > > > > -Takahiro Akashi > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -Takahiro Akashi > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > [1] https://op-lists.linaro.org/pipermail/stratos- > > dev/2021- > > > > > > > > > > > August/000548.html > > > > > > > > > > > [2] https://marc.info/?l=xen- > devel&m=162373754705233&w=2 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx> > > > > > > > > > > > > > Sent: 2021年8月14日 23:38 > > > > > > > > > > > > > To: AKASHI Takahiro <takahiro.akashi@xxxxxxxxxx>; > > > > Stefano > > > > > > > > > Stabellini > > > > > > > > > > > <sstabellini@xxxxxxxxxx> > > > > > > > > > > > > > Cc: Alex Benn??e <alex.bennee@xxxxxxxxxx>; Stratos > > > > Mailing > > > > > > List > > > > > > > > > > > <stratos-dev@xxxxxxxxxxxxxxxxxxx>; virtio- > > dev@lists.oasis- > > > > > > open.org; > > > > > > > > > Arnd > > > > > > > > > > > Bergmann <arnd.bergmann@xxxxxxxxxx>; Viresh Kumar > > > > > > > > > > > <viresh.kumar@xxxxxxxxxx>; Stefano Stabellini > > > > > > > > > > > <stefano.stabellini@xxxxxxxxxx>; stefanha@xxxxxxxxxx; > > Jan > > > > Kiszka > > > > > > > > > > > <jan.kiszka@xxxxxxxxxxx>; Carl van Schaik > > > > > > <cvanscha@xxxxxxxxxxxxxxxx>; > > > > > > > > > > > pratikp@xxxxxxxxxxx; Srivatsa Vaddagiri > > > > <vatsa@xxxxxxxxxxxxxx>; > > > > > > Jean- > > > > > > > > > > > Philippe Brucker <jean-philippe@xxxxxxxxxx>; Mathieu > > Poirier > > > > > > > > > > > <mathieu.poirier@xxxxxxxxxx>; Wei Chen > > <Wei.Chen@xxxxxxx>; > > > > > > Oleksandr > > > > > > > > > > > Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>; Bertrand > > Marquis > > > > > > > > > > > <Bertrand.Marquis@xxxxxxx>; Artem Mygaiev > > > > > > <Artem_Mygaiev@xxxxxxxx>; > > > > > > > > > Julien > > > > > > > > > > > Grall <julien@xxxxxxx>; Juergen Gross > <jgross@xxxxxxxx>; > > > > Paul > > > > > > Durrant > > > > > > > > > > > <paul@xxxxxxx>; Xen Devel <xen-devel@xxxxxxxxxxxxx> > > > > > > > > > > > > > Subject: Re: Enabling hypervisor agnosticism for > > VirtIO > > > > > > backends > > > > > > > > > > > > > > > > > > > > > > > > > > Hello, all. > > > > > > > > > > > > > > > > > > > > > > > > > > Please see some comments below. And sorry for the > > > > possible > > > > > > format > > > > > > > > > > > issues. > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Aug 11, 2021 at 9:27 AM AKASHI Takahiro > > > > > > > > > > > <mailto:takahiro.akashi@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > On Wed, Aug 04, 2021 at 12:20:01PM -0700, > Stefano > > > > > > Stabellini > > > > > > > > > wrote: > > > > > > > > > > > > > > > CCing people working on Xen+VirtIO and IOREQs. > > Not > > > > > > trimming > > > > > > > > > the > > > > > > > > > > > original > > > > > > > > > > > > > > > email to let them read the full context. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > My comments below are related to a potential > Xen > > > > > > > > > implementation, > > > > > > > > > > > not > > > > > > > > > > > > > > > because it is the only implementation that > > matters, > > > > but > > > > > > > > > because it > > > > > > > > > > > is > > > > > > > > > > > > > > > the one I know best. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Please note that my proposal (and hence the > > working > > > > > > prototype)[1] > > > > > > > > > > > > > > is based on Xen's virtio implementation (i.e. > > IOREQ) > > > > and > > > > > > > > > > > particularly > > > > > > > > > > > > > > EPAM's virtio-disk application (backend server). > > > > > > > > > > > > > > It has been, I believe, well generalized but is > > still > > > > a > > > > > > bit > > > > > > > > > biased > > > > > > > > > > > > > > toward this original design. > > > > > > > > > > > > > > > > > > > > > > > > > > > > So I hope you like my approach :) > > > > > > > > > > > > > > > > > > > > > > > > > > > > [1] https://op- > lists.linaro.org/pipermail/stratos- > > > > > > dev/2021- > > > > > > > > > > > August/000546.html > > > > > > > > > > > > > > > > > > > > > > > > > > > > Let me take this opportunity to explain a bit > more > > > > about > > > > > > my > > > > > > > > > approach > > > > > > > > > > > below. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Also, please see this relevant email thread: > > > > > > > > > > > > > > > https://marc.info/?l=xen- > > devel&m=162373754705233&w=2 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, 4 Aug 2021, Alex Bennée wrote: > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > One of the goals of Project Stratos is to > > enable > > > > > > hypervisor > > > > > > > > > > > agnostic > > > > > > > > > > > > > > > > backends so we can enable as much re-use of > > code > > > > as > > > > > > possible > > > > > > > > > and > > > > > > > > > > > avoid > > > > > > > > > > > > > > > > repeating ourselves. This is the flip side > of > > the > > > > > > front end > > > > > > > > > > > where > > > > > > > > > > > > > > > > multiple front-end implementations are > > required - > > > > one > > > > > > per OS, > > > > > > > > > > > assuming > > > > > > > > > > > > > > > > you don't just want Linux guests. The > > resultant > > > > guests > > > > > > are > > > > > > > > > > > trivially > > > > > > > > > > > > > > > > movable between hypervisors modulo any > > abstracted > > > > > > paravirt > > > > > > > > > type > > > > > > > > > > > > > > > > interfaces. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > In my original thumb nail sketch of a > solution > > I > > > > > > envisioned > > > > > > > > > > > vhost-user > > > > > > > > > > > > > > > > daemons running in a broadly POSIX like > > > > environment. > > > > > > The > > > > > > > > > > > interface to > > > > > > > > > > > > > > > > the daemon is fairly simple requiring only > > some > > > > mapped > > > > > > > > > memory > > > > > > > > > > > and some > > > > > > > > > > > > > > > > sort of signalling for events (on Linux this > > is > > > > > > eventfd). > > > > > > > > > The > > > > > > > > > > > idea was a > > > > > > > > > > > > > > > > stub binary would be responsible for any > > > > hypervisor > > > > > > specific > > > > > > > > > > > setup and > > > > > > > > > > > > > > > > then launch a common binary to deal with the > > > > actual > > > > > > > > > virtqueue > > > > > > > > > > > requests > > > > > > > > > > > > > > > > themselves. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Since that original sketch we've seen an > > expansion > > > > in > > > > > > the > > > > > > > > > sort > > > > > > > > > > > of ways > > > > > > > > > > > > > > > > backends could be created. There is interest > > in > > > > > > > > > encapsulating > > > > > > > > > > > backends > > > > > > > > > > > > > > > > in RTOSes or unikernels for solutions like > > SCMI. > > > > There > > > > > > > > > interest > > > > > > > > > > > in Rust > > > > > > > > > > > > > > > > has prompted ideas of using the trait > > interface to > > > > > > abstract > > > > > > > > > > > differences > > > > > > > > > > > > > > > > away as well as the idea of bare-metal Rust > > > > backends. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > We have a card (STR-12) called "Hypercall > > > > > > Standardisation" > > > > > > > > > which > > > > > > > > > > > > > > > > calls for a description of the APIs needed > > from > > > > the > > > > > > > > > hypervisor > > > > > > > > > > > side to > > > > > > > > > > > > > > > > support VirtIO guests and their backends. > > However > > > > we > > > > > > are > > > > > > > > > some > > > > > > > > > > > way off > > > > > > > > > > > > > > > > from that at the moment as I think we need > to > > at > > > > least > > > > > > > > > > > demonstrate one > > > > > > > > > > > > > > > > portable backend before we start codifying > > > > > > requirements. To > > > > > > > > > that > > > > > > > > > > > end I > > > > > > > > > > > > > > > > want to think about what we need for a > backend > > to > > > > > > function. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Configuration > > > > > > > > > > > > > > > > ============= > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > In the type-2 setup this is typically fairly > > > > simple > > > > > > because > > > > > > > > > the > > > > > > > > > > > host > > > > > > > > > > > > > > > > system can orchestrate the various modules > > that > > > > make > > > > > > up the > > > > > > > > > > > complete > > > > > > > > > > > > > > > > system. In the type-1 case (or even type-2 > > with > > > > > > delegated > > > > > > > > > > > service VMs) > > > > > > > > > > > > > > > > we need some sort of mechanism to inform the > > > > backend > > > > > > VM > > > > > > > > > about > > > > > > > > > > > key > > > > > > > > > > > > > > > > details about the system: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - where virt queue memory is in it's > address > > > > space > > > > > > > > > > > > > > > > - how it's going to receive (interrupt) > and > > > > trigger > > > > > > (kick) > > > > > > > > > > > events > > > > > > > > > > > > > > > > - what (if any) resources the backend > needs > > to > > > > > > connect to > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Obviously you can elide over configuration > > issues > > > > by > > > > > > having > > > > > > > > > > > static > > > > > > > > > > > > > > > > configurations and baking the assumptions > into > > > > your > > > > > > guest > > > > > > > > > images > > > > > > > > > > > however > > > > > > > > > > > > > > > > this isn't scalable in the long term. The > > obvious > > > > > > solution > > > > > > > > > seems > > > > > > > > > > > to be > > > > > > > > > > > > > > > > extending a subset of Device Tree data to > user > > > > space > > > > > > but > > > > > > > > > perhaps > > > > > > > > > > > there > > > > > > > > > > > > > > > > are other approaches? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Before any virtio transactions can take > place > > the > > > > > > > > > appropriate > > > > > > > > > > > memory > > > > > > > > > > > > > > > > mappings need to be made between the FE > guest > > and > > > > the > > > > > > BE > > > > > > > > > guest. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Currently the whole of the FE guests address > > space > > > > > > needs to > > > > > > > > > be > > > > > > > > > > > visible > > > > > > > > > > > > > > > > to whatever is serving the virtio requests. > I > > can > > > > > > envision 3 > > > > > > > > > > > approaches: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > * BE guest boots with memory already mapped > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This would entail the guest OS knowing > where > > in > > > > it's > > > > > > Guest > > > > > > > > > > > Physical > > > > > > > > > > > > > > > > Address space is already taken up and > > avoiding > > > > > > clashing. I > > > > > > > > > > > would assume > > > > > > > > > > > > > > > > in this case you would want a standard > > interface > > > > to > > > > > > > > > userspace > > > > > > > > > > > to then > > > > > > > > > > > > > > > > make that address space visible to the > > backend > > > > daemon. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yet another way here is that we would have well > > known > > > > > > "shared > > > > > > > > > > > memory" between > > > > > > > > > > > > > > VMs. I think that Jailhouse's ivshmem gives us > > good > > > > > > insights on > > > > > > > > > this > > > > > > > > > > > matter > > > > > > > > > > > > > > and that it can even be an alternative for > > hypervisor- > > > > > > agnostic > > > > > > > > > > > solution. > > > > > > > > > > > > > > > > > > > > > > > > > > > > (Please note memory regions in ivshmem appear as > a > > PCI > > > > > > device > > > > > > > > > and > > > > > > > > > > > can be > > > > > > > > > > > > > > mapped locally.) > > > > > > > > > > > > > > > > > > > > > > > > > > > > I want to add this shared memory aspect to my > > virtio- > > > > proxy, > > > > > > but > > > > > > > > > > > > > > the resultant solution would eventually look > > similar > > > > to > > > > > > ivshmem. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > * BE guests boots with a hypervisor handle > to > > > > memory > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The BE guest is then free to map the FE's > > memory > > > > to > > > > > > where > > > > > > > > > it > > > > > > > > > > > wants in > > > > > > > > > > > > > > > > the BE's guest physical address space. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I cannot see how this could work for Xen. > There > > is > > > > no > > > > > > "handle" > > > > > > > > > to > > > > > > > > > > > give > > > > > > > > > > > > > > > to the backend if the backend is not running > in > > dom0. > > > > So > > > > > > for > > > > > > > > > Xen I > > > > > > > > > > > think > > > > > > > > > > > > > > > the memory has to be already mapped > > > > > > > > > > > > > > > > > > > > > > > > > > > > In Xen's IOREQ solution (virtio-blk), the > > following > > > > > > information > > > > > > > > > is > > > > > > > > > > > expected > > > > > > > > > > > > > > to be exposed to BE via Xenstore: > > > > > > > > > > > > > > (I know that this is a tentative approach > though.) > > > > > > > > > > > > > > - the start address of configuration space > > > > > > > > > > > > > > - interrupt number > > > > > > > > > > > > > > - file path for backing storage > > > > > > > > > > > > > > - read-only flag > > > > > > > > > > > > > > And the BE server have to call a particular > > hypervisor > > > > > > interface > > > > > > > > > to > > > > > > > > > > > > > > map the configuration space. > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, Xenstore was chosen as a simple way to pass > > > > > > configuration > > > > > > > > > info to > > > > > > > > > > > the backend running in a non-toolstack domain. > > > > > > > > > > > > > I remember, there was a wish to avoid using > Xenstore > > in > > > > > > Virtio > > > > > > > > > backend > > > > > > > > > > > itself if possible, so for non-toolstack domain, this > > could > > > > done > > > > > > with > > > > > > > > > > > adjusting devd (daemon that listens for devices and > > launches > > > > > > backends) > > > > > > > > > > > > > to read backend configuration from the Xenstore > > anyway > > > > and > > > > > > pass it > > > > > > > > > to > > > > > > > > > > > the backend via command line arguments. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, in current PoC code we're using xenstore to > pass > > > > device > > > > > > > > > > > configuration. > > > > > > > > > > > > We also designed a static device configuration parse > > > > method > > > > > > for > > > > > > > > > Dom0less > > > > > > > > > > > or > > > > > > > > > > > > other scenarios don't have xentool. yes, it's from > > device > > > > > > model > > > > > > > > > command > > > > > > > > > > > line > > > > > > > > > > > > or a config file. > > > > > > > > > > > > > > > > > > > > > > > > > But, if ... > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > In my approach (virtio-proxy), all those Xen (or > > > > > > hypervisor)- > > > > > > > > > > > specific > > > > > > > > > > > > > > stuffs are contained in virtio-proxy, yet > another > > VM, > > > > to > > > > > > hide > > > > > > > > > all > > > > > > > > > > > details. > > > > > > > > > > > > > > > > > > > > > > > > > > ... the solution how to overcome that is already > > found > > > > and > > > > > > proven > > > > > > > > > to > > > > > > > > > > > work then even better. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > # My point is that a "handle" is not mandatory > for > > > > > > executing > > > > > > > > > mapping. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > and the mapping probably done by the > > > > > > > > > > > > > > > toolstack (also see below.) Or we would have > to > > > > invent a > > > > > > new > > > > > > > > > Xen > > > > > > > > > > > > > > > hypervisor interface and Xen virtual machine > > > > privileges > > > > > > to > > > > > > > > > allow > > > > > > > > > > > this > > > > > > > > > > > > > > > kind of mapping. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If we run the backend in Dom0 that we have no > > > > problems > > > > > > of > > > > > > > > > course. > > > > > > > > > > > > > > > > > > > > > > > > > > > > One of difficulties on Xen that I found in my > > approach > > > > is > > > > > > that > > > > > > > > > > > calling > > > > > > > > > > > > > > such hypervisor intefaces (registering IOREQ, > > mapping > > > > > > memory) is > > > > > > > > > > > only > > > > > > > > > > > > > > allowed on BE servers themselvies and so we will > > have > > > > to > > > > > > extend > > > > > > > > > > > those > > > > > > > > > > > > > > interfaces. > > > > > > > > > > > > > > This, however, will raise some concern on > security > > and > > > > > > privilege > > > > > > > > > > > distribution > > > > > > > > > > > > > > as Stefan suggested. > > > > > > > > > > > > > > > > > > > > > > > > > > We also faced policy related issues with Virtio > > backend > > > > > > running in > > > > > > > > > > > other than Dom0 domain in a "dummy" xsm mode. In our > > target > > > > > > system we > > > > > > > > > run > > > > > > > > > > > the backend in a driver > > > > > > > > > > > > > domain (we call it DomD) where the underlying H/W > > > > resides. > > > > > > We > > > > > > > > > trust it, > > > > > > > > > > > so we wrote policy rules (to be used in "flask" xsm > mode) > > to > > > > > > provide > > > > > > > > > it > > > > > > > > > > > with a little bit more privileges than a simple DomU > had. > > > > > > > > > > > > > Now it is permitted to issue device-model, > resource > > and > > > > > > memory > > > > > > > > > > > mappings, etc calls. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > To activate the mapping will > > > > > > > > > > > > > > > > require some sort of hypercall to the > > hypervisor. > > > > I > > > > > > can see > > > > > > > > > two > > > > > > > > > > > options > > > > > > > > > > > > > > > > at this point: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - expose the handle to userspace for > > > > daemon/helper > > > > > > to > > > > > > > > > trigger > > > > > > > > > > > the > > > > > > > > > > > > > > > > mapping via existing hypercall > interfaces. > > If > > > > > > using a > > > > > > > > > helper > > > > > > > > > > > you > > > > > > > > > > > > > > > > would have a hypervisor specific one to > > avoid > > > > the > > > > > > daemon > > > > > > > > > > > having to > > > > > > > > > > > > > > > > care too much about the details or push > > that > > > > > > complexity > > > > > > > > > into > > > > > > > > > > > a > > > > > > > > > > > > > > > > compile time option for the daemon which > > would > > > > > > result in > > > > > > > > > > > different > > > > > > > > > > > > > > > > binaries although a common source base. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - expose a new kernel ABI to abstract the > > > > hypercall > > > > > > > > > > > differences away > > > > > > > > > > > > > > > > in the guest kernel. In this case the > > > > userspace > > > > > > would > > > > > > > > > > > essentially > > > > > > > > > > > > > > > > ask for an abstract "map guest N memory > to > > > > > > userspace > > > > > > > > > ptr" > > > > > > > > > > > and let > > > > > > > > > > > > > > > > the kernel deal with the different > > hypercall > > > > > > interfaces. > > > > > > > > > > > This of > > > > > > > > > > > > > > > > course assumes the majority of BE guests > > would > > > > be > > > > > > Linux > > > > > > > > > > > kernels and > > > > > > > > > > > > > > > > leaves the bare-metal/unikernel > approaches > > to > > > > > > their own > > > > > > > > > > > devices. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Operation > > > > > > > > > > > > > > > > ========= > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The core of the operation of VirtIO is > fairly > > > > simple. > > > > > > Once > > > > > > > > > the > > > > > > > > > > > > > > > > vhost-user feature negotiation is done it's > a > > case > > > > of > > > > > > > > > receiving > > > > > > > > > > > update > > > > > > > > > > > > > > > > events and parsing the resultant virt queue > > for > > > > data. > > > > > > The > > > > > > > > > vhost- > > > > > > > > > > > user > > > > > > > > > > > > > > > > specification handles a bunch of setup > before > > that > > > > > > point, > > > > > > > > > mostly > > > > > > > > > > > to > > > > > > > > > > > > > > > > detail where the virt queues are set up FD's > > for > > > > > > memory and > > > > > > > > > > > event > > > > > > > > > > > > > > > > communication. This is where the envisioned > > stub > > > > > > process > > > > > > > > > would > > > > > > > > > > > be > > > > > > > > > > > > > > > > responsible for getting the daemon up and > > ready to > > > > run. > > > > > > This > > > > > > > > > is > > > > > > > > > > > > > > > > currently done inside a big VMM like QEMU > but > > I > > > > > > suspect a > > > > > > > > > modern > > > > > > > > > > > > > > > > approach would be to use the rust-vmm vhost > > crate. > > > > It > > > > > > would > > > > > > > > > then > > > > > > > > > > > either > > > > > > > > > > > > > > > > communicate with the kernel's abstracted ABI > > or be > > > > re- > > > > > > > > > targeted > > > > > > > > > > > as a > > > > > > > > > > > > > > > > build option for the various hypervisors. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > One thing I mentioned before to Alex is that > Xen > > > > doesn't > > > > > > have > > > > > > > > > VMMs > > > > > > > > > > > the > > > > > > > > > > > > > > > way they are typically envisioned and > described > > in > > > > other > > > > > > > > > > > environments. > > > > > > > > > > > > > > > Instead, Xen has IOREQ servers. Each of them > > > > connects > > > > > > > > > > > independently to > > > > > > > > > > > > > > > Xen via the IOREQ interface. E.g. today > multiple > > > > QEMUs > > > > > > could > > > > > > > > > be > > > > > > > > > > > used as > > > > > > > > > > > > > > > emulators for a single Xen VM, each of them > > > > connecting > > > > > > to Xen > > > > > > > > > > > > > > > independently via the IOREQ interface. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The component responsible for starting a > daemon > > > > and/or > > > > > > setting > > > > > > > > > up > > > > > > > > > > > shared > > > > > > > > > > > > > > > interfaces is the toolstack: the xl command > and > > the > > > > > > > > > libxl/libxc > > > > > > > > > > > > > > > libraries. > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think that VM configuration management (or > > > > orchestration > > > > > > in > > > > > > > > > > > Startos > > > > > > > > > > > > > > jargon?) is a subject to debate in parallel. > > > > > > > > > > > > > > Otherwise, is there any good assumption to avoid > > it > > > > right > > > > > > now? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Oleksandr and others I CCed have been working > on > > > > ways > > > > > > for the > > > > > > > > > > > toolstack > > > > > > > > > > > > > > > to create virtio backends and setup memory > > mappings. > > > > > > They > > > > > > > > > might be > > > > > > > > > > > able > > > > > > > > > > > > > > > to provide more info on the subject. I do > think > > we > > > > miss > > > > > > a way > > > > > > > > > to > > > > > > > > > > > provide > > > > > > > > > > > > > > > the configuration to the backend and anything > > else > > > > that > > > > > > the > > > > > > > > > > > backend > > > > > > > > > > > > > > > might require to start doing its job. > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, some work has been done for the toolstack to > > handle > > > > > > Virtio > > > > > > > > > MMIO > > > > > > > > > > > devices in > > > > > > > > > > > > > general and Virtio block devices in particular. > > However, > > > > it > > > > > > has > > > > > > > > > not > > > > > > > > > > > been upstreaned yet. > > > > > > > > > > > > > Updated patches on review now: > > > > > > > > > > > > > https://lore.kernel.org/xen-devel/1621626361- > 29076- > > 1- > > > > git- > > > > > > send- > > > > > > > > > email- > > > > > > > > > > > olekstysh@xxxxxxxxx/ > > > > > > > > > > > > > > > > > > > > > > > > > > There is an additional (also important) activity > to > > > > > > improve/fix > > > > > > > > > > > foreign memory mapping on Arm which I am also involved > > in. > > > > > > > > > > > > > The foreign memory mapping is proposed to be used > > for > > > > Virtio > > > > > > > > > backends > > > > > > > > > > > (device emulators) if there is a need to run guest OS > > > > completely > > > > > > > > > > > unmodified. > > > > > > > > > > > > > Of course, the more secure way would be to use > grant > > > > memory > > > > > > > > > mapping. > > > > > > > > > > > Brietly, the main difference between them is that with > > > > foreign > > > > > > mapping > > > > > > > > > the > > > > > > > > > > > backend > > > > > > > > > > > > > can map any guest memory it wants to map, but with > > grant > > > > > > mapping > > > > > > > > > it is > > > > > > > > > > > allowed to map only what was previously granted by the > > > > frontend. > > > > > > > > > > > > > > > > > > > > > > > > > > So, there might be a problem if we want to pre-map > > some > > > > > > guest > > > > > > > > > memory > > > > > > > > > > > in advance or to cache mappings in the backend in > order > > to > > > > > > improve > > > > > > > > > > > performance (because the mapping/unmapping guest pages > > every > > > > > > request > > > > > > > > > > > requires a lot of back and forth to Xen + P2M updates). > > In a > > > > > > nutshell, > > > > > > > > > > > currently, in order to map a guest page into the > backend > > > > address > > > > > > space > > > > > > > > > we > > > > > > > > > > > need to steal a real physical page from the backend > > domain. > > > > So, > > > > > > with > > > > > > > > > the > > > > > > > > > > > said optimizations we might end up with no free memory > > in > > > > the > > > > > > backend > > > > > > > > > > > domain (see XSA-300). And what we try to achieve is to > > not > > > > waste > > > > > > a > > > > > > > > > real > > > > > > > > > > > domain memory at all by providing safe non-allocated- > yet > > (so > > > > > > unused) > > > > > > > > > > > address space for the foreign (and grant) pages to be > > mapped > > > > > > into, > > > > > > > > > this > > > > > > > > > > > enabling work implies Xen and Linux (and likely DTB > > bindings) > > > > > > changes. > > > > > > > > > > > However, as it turned out, for this to work in a > proper > > and > > > > safe > > > > > > way > > > > > > > > > some > > > > > > > > > > > prereq work needs to be done. > > > > > > > > > > > > > You can find the related Xen discussion at: > > > > > > > > > > > > > https://lore.kernel.org/xen-devel/1627489110- > 25633- > > 1- > > > > git- > > > > > > send- > > > > > > > > > email- > > > > > > > > > > > olekstysh@xxxxxxxxx/ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > One question is how to best handle > > notification > > > > and > > > > > > kicks. > > > > > > > > > The > > > > > > > > > > > existing > > > > > > > > > > > > > > > > vhost-user framework uses eventfd to signal > > the > > > > daemon > > > > > > > > > (although > > > > > > > > > > > QEMU > > > > > > > > > > > > > > > > is quite capable of simulating them when you > > use > > > > TCG). > > > > > > Xen > > > > > > > > > has > > > > > > > > > > > it's own > > > > > > > > > > > > > > > > IOREQ mechanism. However latency is an > > important > > > > > > factor and > > > > > > > > > > > having > > > > > > > > > > > > > > > > events go through the stub would add quite a > > lot. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yeah I think, regardless of anything else, we > > want > > > > the > > > > > > > > > backends to > > > > > > > > > > > > > > > connect directly to the Xen hypervisor. > > > > > > > > > > > > > > > > > > > > > > > > > > > > In my approach, > > > > > > > > > > > > > > a) BE -> FE: interrupts triggered by BE calling > a > > > > > > hypervisor > > > > > > > > > > > interface > > > > > > > > > > > > > > via virtio-proxy > > > > > > > > > > > > > > b) FE -> BE: MMIO to config raises events (in > > event > > > > > > channels), > > > > > > > > > > > which is > > > > > > > > > > > > > > converted to a callback to BE via > > > > virtio- > > > > > > proxy > > > > > > > > > > > > > > (Xen's event channel is > internnally > > > > > > implemented by > > > > > > > > > > > interrupts.) > > > > > > > > > > > > > > > > > > > > > > > > > > > > I don't know what "connect directly" means here, > > but > > > > > > sending > > > > > > > > > > > interrupts > > > > > > > > > > > > > > to the opposite side would be best efficient. > > > > > > > > > > > > > > Ivshmem, I suppose, takes this approach by > > utilizing > > > > PCI's > > > > > > msi-x > > > > > > > > > > > mechanism. > > > > > > > > > > > > > > > > > > > > > > > > > > Agree that MSI would be more efficient than SPI... > > > > > > > > > > > > > At the moment, in order to notify the frontend, > the > > > > backend > > > > > > issues > > > > > > > > > a > > > > > > > > > > > specific device-model call to query Xen to inject a > > > > > > corresponding SPI > > > > > > > > > to > > > > > > > > > > > the guest. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Could we consider the kernel internally > > converting > > > > > > IOREQ > > > > > > > > > > > messages from > > > > > > > > > > > > > > > > the Xen hypervisor to eventfd events? Would > > this > > > > scale > > > > > > with > > > > > > > > > > > other kernel > > > > > > > > > > > > > > > > hypercall interfaces? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So any thoughts on what directions are worth > > > > > > experimenting > > > > > > > > > with? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > One option we should consider is for each > > backend to > > > > > > connect > > > > > > > > > to > > > > > > > > > > > Xen via > > > > > > > > > > > > > > > the IOREQ interface. We could generalize the > > IOREQ > > > > > > interface > > > > > > > > > and > > > > > > > > > > > make it > > > > > > > > > > > > > > > hypervisor agnostic. The interface is really > > trivial > > > > and > > > > > > easy > > > > > > > > > to > > > > > > > > > > > add. > > > > > > > > > > > > > > > > > > > > > > > > > > > > As I said above, my proposal does the same thing > > that > > > > you > > > > > > > > > mentioned > > > > > > > > > > > here :) > > > > > > > > > > > > > > The difference is that I do call hypervisor > > interfaces > > > > via > > > > > > > > > virtio- > > > > > > > > > > > proxy. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The only Xen-specific part is the notification > > > > mechanism, > > > > > > > > > which is > > > > > > > > > > > an > > > > > > > > > > > > > > > event channel. If we replaced the event > channel > > with > > > > > > something > > > > > > > > > > > else the > > > > > > > > > > > > > > > interface would be generic. See: > > > > > > > > > > > > > > > https://gitlab.com/xen-project/xen/- > > > > > > > > > > > /blob/staging/xen/include/public/hvm/ioreq.h#L52 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I don't think that translating IOREQs to > eventfd > > in > > > > the > > > > > > kernel > > > > > > > > > is > > > > > > > > > > > a > > > > > > > > > > > > > > > good idea: if feels like it would be extra > > > > complexity > > > > > > and that > > > > > > > > > the > > > > > > > > > > > > > > > kernel shouldn't be involved as this is a > > backend- > > > > > > hypervisor > > > > > > > > > > > interface. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Given that we may want to implement BE as a > bare- > > metal > > > > > > > > > application > > > > > > > > > > > > > > as I did on Zephyr, I don't think that the > > translation > > > > > > would not > > > > > > > > > be > > > > > > > > > > > > > > a big issue, especially on RTOS's. > > > > > > > > > > > > > > It will be some kind of abstraction layer of > > interrupt > > > > > > handling > > > > > > > > > > > > > > (or nothing but a callback mechanism). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Also, eventfd is very Linux-centric and we are > > > > trying to > > > > > > > > > design an > > > > > > > > > > > > > > > interface that could work well for RTOSes too. > > If we > > > > > > want to > > > > > > > > > do > > > > > > > > > > > > > > > something different, both OS-agnostic and > > > > hypervisor- > > > > > > agnostic, > > > > > > > > > > > perhaps > > > > > > > > > > > > > > > we could design a new interface. One that > could > > be > > > > > > > > > implementable > > > > > > > > > > > in the > > > > > > > > > > > > > > > Xen hypervisor itself (like IOREQ) and of > course > > any > > > > > > other > > > > > > > > > > > hypervisor > > > > > > > > > > > > > > > too. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > There is also another problem. IOREQ is > probably > > not > > > > be > > > > > > the > > > > > > > > > only > > > > > > > > > > > > > > > interface needed. Have a look at > > > > > > > > > > > > > > > https://marc.info/?l=xen- > > devel&m=162373754705233&w=2. > > > > > > Don't we > > > > > > > > > > > also need > > > > > > > > > > > > > > > an interface for the backend to inject > > interrupts > > > > into > > > > > > the > > > > > > > > > > > frontend? And > > > > > > > > > > > > > > > if the backend requires dynamic memory > mappings > > of > > > > > > frontend > > > > > > > > > pages, > > > > > > > > > > > then > > > > > > > > > > > > > > > we would also need an interface to map/unmap > > domU > > > > pages. > > > > > > > > > > > > > > > > > > > > > > > > > > > > My proposal document might help here; All the > > > > interfaces > > > > > > > > > required > > > > > > > > > > > for > > > > > > > > > > > > > > virtio-proxy (or hypervisor-related interfaces) > > are > > > > listed > > > > > > as > > > > > > > > > > > > > > RPC protocols :) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > These interfaces are a lot more problematic > than > > > > IOREQ: > > > > > > IOREQ > > > > > > > > > is > > > > > > > > > > > tiny > > > > > > > > > > > > > > > and self-contained. It is easy to add anywhere. > > A > > > > new > > > > > > > > > interface to > > > > > > > > > > > > > > > inject interrupts or map pages is more > difficult > > to > > > > > > manage > > > > > > > > > because > > > > > > > > > > > it > > > > > > > > > > > > > > > would require changes scattered across the > > various > > > > > > emulators. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Exactly. I have no confident yet that my > approach > > will > > > > > > also > > > > > > > > > apply > > > > > > > > > > > > > > to other hypervisors than Xen. > > > > > > > > > > > > > > Technically, yes, but whether people can accept > it > > or > > > > not > > > > > > is a > > > > > > > > > > > different > > > > > > > > > > > > > > matter. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > -Takahiro Akashi > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > > > > > > > > > > > > > > Oleksandr Tyshchenko
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |