
Re: Proposal for physical address based hypercalls



On 29.09.22 13:32, Jan Beulich wrote:
On 28.09.2022 15:03, Juergen Gross wrote:
On 28.09.22 14:06, Jan Beulich wrote:
On 28.09.2022 12:58, Andrew Cooper wrote:
On 28/09/2022 11:38, Jan Beulich wrote:
As an alternative I'd like to propose the introduction of a bit (or multiple
ones, see below) augmenting the hypercall number, to control the flavor of the
buffers used for every individual hypercall.  This would likely involve the
introduction of a new hypercall page (or multiple ones if more than one bit is
to be used), to retain the present abstraction where it is the hypervisor which
actually fills these pages.
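
To make the "flavor" bit idea concrete, it could be expressed as something along these lines (purely illustrative - neither the name nor the bit position exists anywhere today):

    /* Hypothetical flag OR-ed into the hypercall number to request
     * physical-address-based buffer handling; name and bit are made up. */
    #define HYPERCALL_FLAG_PADDR_BUFS  (1UL << 62)

    /* A guest built for the new flavor would then issue e.g.
     *   hypercall(__HYPERVISOR_xen_version | HYPERCALL_FLAG_PADDR_BUFS, ...);
     * while unmodified guests keep using the plain numbers. */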

There are other concerns which need to be accounted for.

Encrypted VMs cannot use a hypercall page; they don't trust the
hypervisor in the first place, and the hypercall page is (specifically)
code injection.  So the sensible new ABI cannot depend on a hypercall table.

I don't think there's a dependency, and I don't think there ever really has
been. We've been advocating for its use, but as far as I'm aware we've never
enforced it anywhere.

Also, rewriting the hypercall page on migrate turns out not to have been
the most clever idea, and only works right now because the instructions
are the same length in the variations for each mode.

Also continuations need to change to avoid userspace liveness problems,
and existing hypercalls that we do have need splitting between things
which are actually privileged operations (within the guest context) and
things which are logical control operations, so the kernel can expose
the latter to userspace without retaining the gaping root hole which is
/dev/xen/privcmd, and a blocker to doing UEFI Secureboot.

So yes, starting some new clean(er) interface from hypercall 64 is the
plan, but it very much does not want to be a simple mirror of the
existing 0-63 with a differing calling convention.

All of these look like orthogonal problems to me. That's likely all
relevant for, as I think you've been calling it, ABI v2, but shouldn't
hinder our switching to a physical address based hypercall model.
Otherwise I'm afraid we'll never make any progress in that direction.

What about an alternative model allowing most of the current hypercalls to
be used unmodified?

We could add a new hypercall for registering hypercall buffers via their
virtual address, physical address, and size (kind of a software TLB).

Why not?

The buffer table would want to be physically addressed
by the hypercall, of course.

I'm not convinced of this, as it would break uniformity of the hypercall
interfaces. IOW in the hypervisor we then wouldn't be able to use
copy_from_guest() to retrieve the contents. Perhaps this simply shouldn't
be a table, but a hypercall not involving any buffers (i.e. every
discontiguous piece would need registering separately). I expect such a
software TLB wouldn't have many entries, so needing to use a couple of
hypercalls shouldn't be a major issue.

Fine with me.
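
To make that concrete, registering each contiguous piece on its own could
look roughly like this (just a sketch - the op names, numbers and calling
details are made up):

    /* Hypothetical registration interface - nothing like this exists
     * today.  Each contiguous buffer region is registered separately,
     * with all arguments passed in registers, so the registration
     * hypercall itself doesn't need any guest buffer to be copied. */
    #define HCALL_BUF_OP_REGISTER    1   /* op numbers made up */
    #define HCALL_BUF_OP_UNREGISTER  2

    /*
     * op:    HCALL_BUF_OP_{REGISTER,UNREGISTER}
     * vaddr: guest virtual start of the contiguous region
     * gfn:   guest frame number backing vaddr
     * size:  length of the region in bytes
     * Returns 0 on success, -errno otherwise.
     */
    long hypercall_buf_op(unsigned int op, unsigned long vaddr,
                          unsigned long gfn, unsigned long size);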


It might be interesting to have this table per vcpu (using the same table
for multiple vcpus should be allowed) in order to speed up finding
translation entries for percpu buffers.

Yes. Perhaps insertion and purging could simply be two new VCPUOP_*.

Again fine with me.
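
The hypervisor-side data could then be as simple as this (sketch only, the
names and the number of entries are invented):

    /* Illustrative per-vcpu software TLB as the hypervisor might keep
     * it; nothing like this exists in the tree.  A small, fixed number
     * of entries is assumed to be sufficient. */
    #define HCALL_BUF_TLB_ENTRIES 8

    struct hcall_buf_entry {
        unsigned long vaddr;   /* guest virtual start  */
        unsigned long paddr;   /* guest physical start */
        unsigned long size;    /* length in bytes      */
    };

    struct hcall_buf_tlb {
        struct hcall_buf_entry entries[HCALL_BUF_TLB_ENTRIES];
        unsigned int nr;
    };

    /* Each struct vcpu would hold a pointer to one of these; several
     * vcpus may share the same instance.  The VCPUOP_* insertion and
     * purging ops would simply update the entries[] array. */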

As a prereq I think we'd need to sort the cross-vCPU accessing of guest
data, coincidentally pointed out in a post-commit-message remark in
https://lists.xen.org/archives/html/xen-devel/2022-09/msg01761.html. The
subject vCPU isn't available in copy_to_user_hvm(), which is where I'd
expect the TLB lookup to occur (while assuming handles point at globally
mapped space _might_ be okay, using the wrong vCPU's TLB surely isn't).

Any per-vcpu buffer should only be used by the respective vcpu.

Any hypercall buffer being addressed virtually could first be looked up via
the SW-TLB. This wouldn't require any changes for most of the hypercall
interfaces. Only special cases with very large buffers might need indirect
variants (as Jan said: via GFN lists, which could be passed in registered
buffers).
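
The lookup itself would be cheap, something along these lines (sketch only,
building on the layout above and ignoring locking and regions crossing
entry boundaries):

    /* Sketch of the lookup which copy_to/from_guest could try first.
     * Returns false if no registered buffer covers [vaddr, vaddr+len),
     * in which case the existing page-table based translation has to
     * be used (which would fail for an encrypted guest). */
    static bool hcall_buf_lookup(const struct hcall_buf_tlb *tlb,
                                 unsigned long vaddr, unsigned long len,
                                 unsigned long *paddr)
    {
        unsigned int i;

        for ( i = 0; i < tlb->nr; i++ )
        {
            const struct hcall_buf_entry *e = &tlb->entries[i];

            if ( vaddr >= e->vaddr && vaddr + len <= e->vaddr + e->size )
            {
                *paddr = e->paddr + (vaddr - e->vaddr);
                return true;
            }
        }

        return false;   /* fall back to today's handling */
    }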

Encrypted guests would probably want to use static percpu buffers in
order to avoid switching the encryption state of the buffers all the
time.

An unencrypted PVH/HVM domain (e.g. PVH dom0) could just define one giant
buffer covering the whole of the domain's memory via the kernel's physical
memory mapping (the directmap). All kmalloc() addresses would be in that
region.

That's Linux-centric. I'm not convinced all OSes maintain a directmap.
Without such, switching to this model might end up quite intrusive on
the OS side.

This model is especially interesting for dom0. The majority of installations
are running a Linux dom0 AFAIK, so having an easy way to speed this case up
is a big plus.

Thinking of Linux, we'd need a 2nd range covering the data part of the
kernel image.

Probably, yes.
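
For Linux that would boil down to two registrations at boot, roughly like
below (not actual kernel code; register_hcall_buf(vaddr, paddr, size)
stands for a wrapper around the hypothetical registration op sketched
further up):

    /* Sketch only - a Linux guest registering its directmap and the
     * data part of the kernel image (static percpu buffers etc.). */
    register_hcall_buf(PAGE_OFFSET, 0, max_pfn << PAGE_SHIFT);
    register_hcall_buf((unsigned long)_sdata, __pa_symbol(_sdata),
                       _edata - _sdata);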

Further this still wouldn't (afaics) pave a reasonable route towards
dealing with privcmd-invoked hypercalls.

Today the hypercall buffers are all allocated via the privcmd driver. It
should be fairly easy to add an ioctl to get the buffer's kernel address
instead of using the user address.
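
E.g. something like the following (sketch; the ioctl and the structure are
invented, only the general shape matters):

    /* Hypothetical privcmd extension - nothing of this exists, the
     * names and the ioctl number are made up.  User space mmap()s a
     * hypercall buffer as today and then asks the driver for the
     * corresponding kernel address, to be passed to the hypervisor
     * instead of the user virtual address. */
    struct privcmd_buf_kaddr {
        __u64 user_addr;    /* IN:  address returned by mmap()     */
        __u64 size;         /* IN:  length of the mapped buffer    */
        __u64 kernel_addr;  /* OUT: kernel virtual address to use  */
    };

    #define IOCTL_PRIVCMD_BUF_KADDR \
        _IOWR('P', 0xf0, struct privcmd_buf_kaddr)   /* number made up */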

Multi-page buffers might be problematic, though, so either we need special
variants for hypercalls with such buffers, or we just fall back to using
virtual addresses in the cases where no guest-physically-contiguous buffer
could be allocated (this doesn't apply to encrypted guests, of course, as
those need to have large enough buffers anyway).

Finally - in how far are we concerned of PV guests using linear
addresses for hypercall buffers? I ask because I don't think the model
lends itself to use also for the PV guest interfaces.

Good question.

As long as we support PV guests we can't drop support for linear addresses
IMO. So the question is whether we are fine with PV guests not using the
pre-registered buffers, or if we want to introduce an interface for PV
guests using GFNs instead of MFNs.

Juergen

A buffer address not found in the SW-TLB would need to be translated like
today (and would fail for an encrypted guest).

Thoughts?


Juergen

