
Re: Understanding osdep_xenforeignmemory_map mmap behaviour

To: Alex Bennée <alex.bennee@xxxxxxxxxx>, Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
From: Juergen Gross <jgross@xxxxxxxx>
Date: Wed, 24 Aug 2022 15:07:26 +0200
Cc: Viresh Kumar <viresh.kumar@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>, "Stratos-dev@xxxxxxxxxxxxxxxxxxx" <Stratos-dev@xxxxxxxxxxxxxxxxxxx>, "mathieu.poirier@xxxxxxxxxx" <mathieu.poirier@xxxxxxxxxx>, "christopher.w.clark@xxxxxxxxx" <christopher.w.clark@xxxxxxxxx>, "boris.ostrovsky@xxxxxxxxxx" <boris.ostrovsky@xxxxxxxxxx>, "gregkh@xxxxxxxxxxxxxxxxxxx" <gregkh@xxxxxxxxxxxxxxxxxxx>, "vincent.guittot@xxxxxxxxxx" <vincent.guittot@xxxxxxxxxx>, "olekstysh@xxxxxxxxx" <olekstysh@xxxxxxxxx>
Delivery-date: Wed, 24 Aug 2022 13:07:32 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 24.08.22 13:22, Alex Bennée wrote:


Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx> writes:

On 24/08/2022 10:19, Viresh Kumar wrote:

On 24-03-22, 06:12, Juergen Gross wrote:

For a rather long time we were using "normal" user pages for this purpose,
which were just locked into memory for doing the hypercall.

Unfortunately there have been very rare problems with that approach, as
the Linux kernel can set a user page related PTE to invalid for short
periods of time, which led to EFAULT in the hypervisor when trying to
access the hypercall data.

In Linux this can be avoided only by using kernel memory, which is the
reason why the hypercall buffers are allocated and mmap()-ed through the
privcmd driver.

Hi Juergen,

I understand why we moved from user pages to kernel pages, but I don't
fully understand why we need to make two separate calls to map the
guest memory, i.e. mmap() followed by ioctl(IOCTL_PRIVCMD_MMAPBATCH).

Why aren't we doing all of it from mmap() itself? I hacked it up to
check, and it works fine if we do it all from mmap() itself.
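To make the two-call sequence concrete, here is a minimal sketch of what a caller does today: mmap() on the privcmd fd to reserve the range, then the batch ioctl to populate it. It uses the v2 variant of the ioctl discussed further down. The struct and ioctl number are local copies that are meant to mirror include/uapi/xen/privcmd.h; treat them as assumptions and use the real header in real code. The helper name map_foreign_sketch() is made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/ioctl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Local stand-ins for the definitions in include/uapi/xen/privcmd.h.
 * Assumption: 64-bit build where xen_pfn_t is 64 bits wide. */
typedef uint16_t domid_t;
typedef uint64_t xen_pfn_t;

struct privcmd_mmapbatch_v2 {
    unsigned int num;      /* number of pages to populate */
    domid_t dom;           /* target domain */
    uint64_t addr;         /* virtual address returned by mmap() */
    const xen_pfn_t *arr;  /* guest frame numbers to map */
    int *err;              /* per-page status, written by the kernel */
};

#define IOCTL_PRIVCMD_MMAPBATCH_V2 \
    _IOC(_IOC_NONE, 'P', 4, sizeof(struct privcmd_mmapbatch_v2))

/* Hypothetical helper: map num frames of domain dom through an open
 * privcmd fd. Returns the mapping, or NULL on failure with errno set.
 * Assumption: the host page size matches Xen's page granularity. */
void *map_foreign_sketch(int privcmd_fd, domid_t dom,
                         const xen_pfn_t *gfns, unsigned int num, int *err)
{
    assert(num > 0);
    size_t len = (size_t)num * (size_t)sysconf(_SC_PAGESIZE);

    /* Call 1: reserve a virtual address range backed by the device. */
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                      privcmd_fd, 0);
    if (addr == MAP_FAILED)
        return NULL;

    /* Call 2: ask the kernel to populate the range with guest frames. */
    struct privcmd_mmapbatch_v2 req = {
        .num = num, .dom = dom, .addr = (uintptr_t)addr,
        .arr = gfns, .err = err,
    };
    if (ioctl(privcmd_fd, IOCTL_PRIVCMD_MMAPBATCH_V2, &req) < 0) {
        int saved = errno;   /* keep the ioctl's errno across munmap() */
        munmap(addr, len);
        errno = saved;
        return NULL;
    }
    return addr;
}
```

Note that the mmap() call on its own only carries length, protection and offset, so the domain ID and frame list have to reach the kernel some other way; in this sketch that is the job of the second call.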


As I understand it, the MMAPBATCH ioctl is being treated like every other
hypercall proxied through the ioctl interface. That makes sense from the
point of view of having a consistent interface to the hypervisor, but not
from the point of view of providing a consistent userspace interface for
mapping memory which doesn't care about the hypervisor details.

The privcmd_mmapbatch_v2 interface is slightly richer than what you
could expose via mmap() because it allows the handling of partial
mappings with what I presume is a per-page *err array. If you issued the
hypercall directly from the mmap() and one of the pages wasn't mapped by
the hypervisor you would have to unwind everything before returning
EFAULT to the user.
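As a rough illustration of what the per-page array buys the caller, the sketch below scans such an array after the mapping call and reports how many pages failed. It assumes that 0 means "mapped" and any non-zero entry is an error code for that page; the helper name count_unmapped_pages() is hypothetical and not part of any Xen library.

```c
#include <stddef.h>

/* Hypothetical helper: scan the per-page status array filled in by the
 * mapping ioctl. Assumption: 0 means the page was mapped, any non-zero
 * value is an error code for that page. Returns the number of failed
 * pages and, if there is at least one, stores the index of the first
 * failure in *first_bad (when first_bad is not NULL). */
size_t count_unmapped_pages(const int *err, size_t num, size_t *first_bad)
{
    size_t failed = 0;

    for (size_t i = 0; i < num; i++) {
        if (err[i] == 0)
            continue;            /* this page was mapped */
        if (failed == 0 && first_bad != NULL)
            *first_bad = i;      /* remember only the first failure */
        failed++;
    }
    return failed;
}
```

With this kind of reporting the caller can decide per page whether to retry, skip, or tear the mapping down, which a plain mmap() return value cannot express.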

Aren't we abusing the Linux userspace ABI here? Standard userspace
code would expect just mmap() to be enough to map the memory. Yes, the
current user, Xen itself, is adapted to make two calls, but it breaks
as soon as we want to use something that relies on the Linux userspace
ABI.

For instance, in our case, where we are looking to create
hypervisor-agnostic virtio backends, the rust-vmm library [1] issues
mmap() only and expects it to work. It doesn't know it is running on a
Xen system, and it shouldn't need to know that either.


Use /dev/xen/hypercall which has a sane ABI for getting "safe" memory.
privcmd is very much not sane.

In practice you'll need to use both.  /dev/xen/hypercall for getting
"safe" memory, and /dev/xen/privcmd for issuing hypercalls for now.


I'm unsure what is meant by safe memory here. privcmd_buf_mmap() looks
like it just allocates a bunch of GFP_KERNEL pages rather than
interacting with the hypervisor directly. Are these the same pages that
get used when you eventually call privcmd_ioctl_mmap_batch()?


privcmd_buf_mmap() is allocating kernel pages which are used for data being
accessed by the hypervisor when doing the hypercall later. This is a generic
interface being used for all hypercalls, not only for
privcmd_ioctl_mmap_batch().
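For comparison with the foreign-mapping path, this is roughly how userspace obtains such a hypercall buffer: a single mmap() on an open /dev/xen/hypercall descriptor, with no follow-up ioctl. This is a minimal sketch; the helper name alloc_hypercall_buf_sketch() is made up, and the flags shown are an assumption about what a caller would typically pass.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: obtain npages pages of hypercall-safe memory from
 * an open /dev/xen/hypercall descriptor. On the kernel side the mapping
 * is backed by kernel pages (privcmd_buf_mmap()). Returns NULL on
 * failure with errno set by mmap(). If len_out is not NULL it receives
 * the mapping length, which the caller needs later for munmap(). */
void *alloc_hypercall_buf_sketch(int hypercall_fd, size_t npages,
                                 size_t *len_out)
{
    size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);

    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                     hypercall_fd, 0);
    if (buf == MAP_FAILED)
        return NULL;

    if (len_out != NULL)
        *len_out = len;
    return buf;
}
```

The buffer obtained this way only holds hypercall arguments and results; it is not a mapping of guest memory.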

The fact that /dev/xen/hypercall is specified by xen_privcmdbuf_dev is a
little confusing TBH.

Anyway, the goal here is to provide a non-Xen-aware userspace with a
standard userspace API to access the guest's memory. Perhaps messing


This is what the Xen-related libraries are meant for. Your decision to
ignore those is backfiring now.

around with the semantics of the /dev/xen/[hypercall|privcmd] device
nodes is too confusing.

Maybe we could instead:

  1. Have the Xen-aware VMM ask to make the guest's memory visible to the
     host kernel's address space.


Urgh. This would be a major breach of the Xen security concept.

  2. When this is done, explicitly create a device node to represent it
     (/dev/xen/dom-%d-mem?)
  3. Pass this new device to the non-Xen aware userspace which uses the
     standard mmap() call to make the kernel pages visible to userspace

Does that make sense?


Maybe from your point of view, but not from the Xen architectural point
of view IMHO. You are basically removing the main security advantages of
Xen by creating a kernel interface for easily mapping arbitrary guest
memory.


Juergen

