
Re: [Stratos-dev] Xen Rust VirtIO demos work breakdown for Project Stratos



On Sat, 2 Oct 2021, Oleksandr Tyshchenko wrote:
> On Sat, Oct 2, 2021 at 2:58 AM Stefano Stabellini <sstabellini@xxxxxxxxxx> 
> wrote:
> 
> Hi Stefano, all
> 
> [Sorry for the possible format issues]
> [I have CCed Julien]
> 
> 
>       On Tue, 28 Sep 2021, Oleksandr Tyshchenko wrote:
>       > On Tue, Sep 28, 2021 at 9:26 AM Stefano Stabellini 
> <sstabellini@xxxxxxxxxx> wrote:
>       >
>       > Hi Stefano, all
>       >
>       > [Sorry for the possible format issues]
>       >
>       >
>       >       On Mon, 27 Sep 2021, Christopher Clark wrote:
>       >       > On Mon, Sep 27, 2021 at 3:06 AM Alex Bennée via Stratos-dev 
> <stratos-dev@xxxxxxxxxxxxxxxxxxx> wrote:
>       >       >
>       >       >       Marek Marczykowski-Górecki 
> <marmarek@xxxxxxxxxxxxxxxxxxxxxx> writes:
>       >       >
>       >       >       > [[PGP Signed Part:Undecided]]
>       >       >       > On Fri, Sep 24, 2021 at 05:02:46PM +0100, Alex Bennée 
> wrote:
>       >       >       >> Hi,
>       >       >       >
>       >       >       > Hi,
>       >       >       >
>       >       >       >> 2.1 Stable ABI for foreignmemory mapping to non-dom0 
> ([STR-57])
>       >       >       >> 
> ───────────────────────────────────────────────────────────────
>       >       >       >>
>       >       >       >>   Currently the foreign memory mapping support only 
> works for dom0 due
>       >       >       >>   to reference counting issues. If we are to support 
> backends running in
>       >       >       >>   their own domains this will need to get fixed.
>       >       >       >>
>       >       >       >>   Estimate: 8w
>       >       >       >>
>       >       >       >>
>       >       >       >> [STR-57] <https://linaro.atlassian.net/browse/STR-57>
>       >       >       >
>       >       >       > I'm pretty sure it was discussed before, but I can't
>       >       >       > find the relevant (part of the) thread right now: does
>       >       >       > your model assume the backend (running outside of dom0)
>       >       >       > will gain the ability to map (or access in some other
>       >       >       > way) an _arbitrary_ memory page of a frontend domain? Or
>       >       >       > worse: any domain?
>       >       >
>       >       >       The aim is for some DomU's to host backends for other
>       >       >       DomU's instead of all backends being in Dom0. Those
>       >       >       backend DomU's would have to be considered trusted
>       >       >       because, as you say, the default memory model of VirtIO
>       >       >       is to have full access to the frontend domain's memory
>       >       >       map.
>       >       >
>       >       >
>       >       > I share Marek's concern. I believe that there are Xen-based
>       >       > systems that will want to run guests using VirtIO devices
>       >       > without extending this level of trust to the backend domains.
>       >
>       >       From a safety perspective, it would be challenging to deploy a
>       >       system with privileged backends. It would be a lot easier if
>       >       the backend were unprivileged.
>       >
>       >       This is one of those times where safety and security
>       >       requirements are actually aligned.
>       >
>       >
>       > Well, the foreign memory mapping has one advantage in the context of
>       > the Virtio use-case, which is that the Virtio infrastructure in the
>       > Guest doesn't require any modifications to run on top of Xen.
>       > The only issue with foreign memory here is that Guest memory is
>       > actually mapped without its agreement, which doesn't perfectly fit
>       > into the security model. (There is one more issue with XSA-300, but I
>       > think it will go away sooner or later; at least there are some
>       > attempts to eliminate it.)
>       > While the ability to map any part of Guest memory is not an issue for
>       > the backend running in Dom0 (which we usually trust), it would
>       > certainly violate the Xen security model if we want to run it in
>       > another domain, so I completely agree with the existing concern.
> 
>       Yep, that's what I was referring to.
> 
> 
>       > It was discussed before [1], but I couldn't find any decisions
>       > regarding that. As I understand, one of the possible ideas is to have
>       > some entity in Xen (PV IOMMU/virtio-iommu/whatever) that works in a
>       > protection mode, so by default it denies all foreign mapping requests
>       > from the backend running in a DomU and only allows requests for
>       > mappings which were *implicitly* granted by the Guest beforehand.
>       > For example, Xen could be informed which MMIOs hold the queue PFN and
>       > notify registers (as it traps the accesses to these registers anyway)
>       > and could theoretically parse the frontend request and retrieve the
>       > descriptors to decide which GFNs are actually *allowed*.
>       >
>       > I can't say for sure (sorry, not familiar enough with the topic), but
>       > by implementing the virtio-iommu device in Xen we could probably
>       > avoid Guest modifications altogether. Of course, for this to work the
>       > Virtio infrastructure in the Guest should use the DMA API as
>       > mentioned in [1].
>       >
>       > Would the “restricted foreign mapping” solution retain the Xen
>       > security model and be accepted by the Xen community? I wonder whether
>       > someone has already looked in this direction, whether there are any
>       > pitfalls here, and whether this is even feasible?
>       >
>       > [1] 
> https://lore.kernel.org/xen-devel/464e91ec-2b53-2338-43c7-a018087fc7f6@xxxxxxx/
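
Just to make the "parse the frontend request" idea above a bit more
concrete: the checking entity in Xen would essentially have to walk the
split-ring descriptor chains and collect the GFNs they reference before
allowing them. A very rough, untested sketch (standalone C following the
vring layout from the virtio spec, not actual Xen code; the function and
its limits are made up for illustration):

#include <stdint.h>
#include <stddef.h>

/* Split-ring descriptor layout as defined by the virtio spec. */
struct vring_desc {
    uint64_t addr;   /* guest-physical address of the buffer */
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

#define VRING_DESC_F_NEXT      1
#define VRING_DESC_F_INDIRECT  4
#define PAGE_SHIFT             12

/*
 * Walk one descriptor chain starting at "head" and record every GFN it
 * touches; these are the frames the guest has implicitly agreed to
 * expose for this request.  Returns the number of GFNs or -1 on a
 * malformed chain.
 */
static int collect_gfns(const struct vring_desc *table, unsigned int qsize,
                        uint16_t head, uint64_t *gfns, size_t max)
{
    size_t n = 0;
    uint16_t i = head;
    unsigned int steps = 0;

    for (;;) {
        const struct vring_desc *d;
        uint64_t gfn, last;

        if (i >= qsize || ++steps > qsize)
            return -1;                  /* out of range or looping chain */

        d = &table[i];
        if ((d->flags & VRING_DESC_F_INDIRECT) || d->len == 0)
            return -1;                  /* indirect tables need another walk */

        gfn  = d->addr >> PAGE_SHIFT;
        last = (d->addr + d->len - 1) >> PAGE_SHIFT;
        for (; gfn <= last; gfn++) {
            if (n == max)
                return -1;
            gfns[n++] = gfn;            /* candidate for the "allowed" list */
        }

        if (!(d->flags & VRING_DESC_F_NEXT))
            return (int)n;
        i = d->next;
    }
}

It also shows why this is not free: Xen would have to read and parse guest
memory on every request, and indirect descriptors add yet another level of
walking.
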
> 
>       The discussion that went furthest is actually based on the idea that
>       there is a pre-shared memory area and the frontend always passes
>       addresses from it. For ease of implementation, the pre-shared area is
>       the virtqueue itself, so this approach has been called "fat virtqueue".
>       But it requires guest modifications and it probably results in
>       additional memory copies.
> 
>  
> I got it. Although we would need to map that pre-shared area anyway (I
> presume it could be done once during initialization), I think it is much
> better than mapping arbitrary pages at runtime.

Yeah, that's the idea.
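
For reference, on the frontend side "always passing addresses from the
pre-shared area" is little more than a bounce-buffer allocator over that
window. Very rough sketch (names and layout made up; the real fat
virtqueue design is still being discussed):

#include <stdint.h>
#include <string.h>

/*
 * Hypothetical pre-shared window known to both frontend and backend.
 * Descriptors would carry offsets into it instead of arbitrary
 * guest-physical addresses.
 */
struct shared_window {
    uint8_t *base;   /* frontend mapping of the pre-shared area */
    size_t   size;
    size_t   next;   /* trivial bump allocator; freeing omitted for brevity */
};

/* Copy a payload into the window and return its offset, or -1. */
static long bounce_in(struct shared_window *w, const void *buf, size_t len)
{
    if (len > w->size - w->next)
        return -1;
    memcpy(w->base + w->next, buf, len);
    w->next += len;
    return (long)(w->next - len);
}

The memcpy here is exactly the additional copy mentioned above.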


> If there is a way for Xen to know the pre-shared area location in advance,
> it will be able to allow mapping only this region and deny other attempts.
 
No, but there are patches (not yet upstream) to introduce a way to
pre-share memory regions between VMs using xl:
https://github.com/Xilinx/xen/commits/xilinx/release-2021.1?after=4bd2da58b5b008f77429007a307b658db9c0f636+104&branch=xilinx%2Frelease-2021.1

So I think it would probably be the other way around: xen/libxl
advertises in the device tree (or ACPI) the presence of the pre-shared
regions to both domains. Then the frontend and the backend would start
using them.
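
For a userspace backend in the service domain the advertised region would
then just be part of its own guest-physical memory, so it could be mapped
once at startup without any privileged foreign-mapping call. A minimal
sketch, assuming the base/size have already been read from the generated
DT/ACPI node (the constants below are placeholders, and a UIO or dedicated
driver would be cleaner than /dev/mem in practice):

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHM_BASE  0x70000000UL   /* placeholder: from the DT/ACPI node */
#define SHM_SIZE  0x00200000UL   /* placeholder: from the DT/ACPI node */

static void *map_preshared_region(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    void *p;

    if (fd < 0)
        return NULL;
    /* The region is ordinary RAM of this domain, so a plain mmap of the
     * guest-physical address is enough (if the kernel allows it). */
    p = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, SHM_BASE);
    close(fd);
    return p == MAP_FAILED ? NULL : p;
}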

 
>       I am not sure if the approach you mentioned could be implemented
>       completely without frontend changes. It looks like Xen would have to
>       learn how to inspect virtqueues in order to verify implicit grants
>       without frontend changes.
> 
>  
> I looked through the virtio-iommu specification and the corresponding Linux
> driver, but I am sure I don't see all the challenges and pitfalls.
> As I have only limited knowledge of the IOMMU infrastructure in Linux, below
> is just my guess, which might be wrong.
> 
> 1. I think that if we want to avoid frontend changes, the backend in Xen
> would need to fully conform to the specification. I am afraid that, besides
> just inspecting virtqueues, the backend needs to properly and completely
> emulate the virtio device, handle shadow page tables, etc. Otherwise we
> might break the guest. I expect a huge amount of work to implement this
> properly.

Yeah, I think we would want to stay away from shadow pagetables unless
we are really forced to go there.
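
FWIW, without shadow pagetables, what the backend would mostly have to
track is the set of IOVA -> GPA ranges established by the guest's MAP and
UNMAP requests, and refuse anything that falls outside them. Rough sketch
(the request fields follow the virtio-iommu spec but the head/tail and
endianness handling are omitted; the table itself is made up):

#include <stdint.h>

/* Payload of a virtio-iommu MAP request (simplified from the spec). */
struct virtio_iommu_req_map {
    uint32_t domain;
    uint64_t virt_start;
    uint64_t virt_end;
    uint64_t phys_start;
    uint32_t flags;
};

/* Hypothetical per-endpoint translation table kept by the backend. */
struct iova_range { uint64_t iova_start, iova_end, gpa_start; };

struct iova_table {
    struct iova_range r[256];
    unsigned int n;
};

static int handle_map(struct iova_table *t,
                      const struct virtio_iommu_req_map *m)
{
    if (t->n == 256)
        return -1;
    t->r[t->n] = (struct iova_range){ m->virt_start, m->virt_end,
                                      m->phys_start };
    t->n++;
    return 0;
}

/* Translate one IOVA to a GPA before doing the foreign-map call. */
static int iova_to_gpa(const struct iova_table *t, uint64_t iova,
                       uint64_t *gpa)
{
    for (unsigned int i = 0; i < t->n; i++) {
        if (iova >= t->r[i].iova_start && iova <= t->r[i].iova_end) {
            *gpa = t->r[i].gpa_start + (iova - t->r[i].iova_start);
            return 0;
        }
    }
    return -1;   /* not mapped: reject the request */
}

The hard part is everything around it: emulating the device properly and
getting this information to a userspace IOREQ server, as you note in point
2 below.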


> 2. Also, if I got things correctly, it looks like when enabling
> virtio-iommu, all addresses passed in requests to the virtio devices behind
> the virtio-iommu will be in the I/O virtual address space (IOVA). So we
> would need to find a way for userspace (if the backend is an IOREQ server)
> to translate them to guest physical addresses (IPA) via these shadow page
> tables in the backend before mapping them via foreign memory map calls. So
> I expect Xen, toolstack and Linux privcmd driver changes and additional
> complexity, taking into account how the data structures could be accessed
> (data structures that are contiguous in IOVA could be discontiguous in IPA,
> indirect table descriptors, etc).
> I am wondering whether it would be possible to have an identity IOMMU
> mapping (IOVA == GPA) on the guest side but without bypassing the IOMMU, as
> we need the virtio-iommu frontend to send map/unmap requests. Can we
> control this behaviour somehow?
> I think this would simplify things.

None of the above looks easy. I think you are right that we would need
IOVA == GPA to make the implementation feasible and with decent
performance. But if we need a spec change, then I think Juergen's
proposal of introducing a new transport that uses grant table references
instead of GPAs is worth considering.
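
Just to illustrate what such a transport could look like from the
backend's point of view: the frontend would publish grant references
instead of guest-physical addresses, so the backend never needs
foreign-mapping privileges at all. Completely made-up descriptor layout
below; only xengnttab_map_grant_ref() is an existing libxengnttab call,
and Juergen's actual proposal may of course differ:

#include <stdint.h>
#include <sys/mman.h>
#include <xengnttab.h>

/* Hypothetical descriptor for a grant-based virtio transport. */
struct grant_desc {
    uint32_t gref;     /* grant reference explicitly issued by the guest */
    uint32_t offset;   /* offset of the buffer within the granted page */
    uint32_t len;
};

static void *map_buffer(xengnttab_handle *xgt, uint32_t frontend_domid,
                        const struct grant_desc *d)
{
    /* Maps exactly one page that the guest chose to grant; tear down
     * later with xengnttab_unmap(). */
    uint8_t *page = xengnttab_map_grant_ref(xgt, frontend_domid, d->gref,
                                            PROT_READ | PROT_WRITE);
    return page ? page + d->offset : NULL;
}

That keeps the guest in control of exactly which pages are exposed, which
is the property we are after.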


> 3. Also, we would probably want to have a single virtio-iommu device
> instance per guest, so that all virtio devices which belong to this guest
> share the IOMMU mapping for optimization purposes. For this to work, all
> virtio devices inside a guest should be attached to the same IOMMU domain.
> Probably we could control that, but I am not 100% sure.

 

