[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 10/12] xen/x86: call hypercall handlers via switch statement



On 21.10.21 16:41, Jan Beulich wrote:
On 15.10.2021 14:51, Juergen Gross wrote:
Instead of using a function table use the generated switch statement
macros for calling the appropriate hypercall handlers.

This is beneficial to performance and avoids speculation issues.

With calling the handlers using the correct number of parameters now
it is possible to do the parameter register clobbering in the NDEBUG
case after returning from the handler. This in turn removes the only
users of hypercall_args_table[] which can be removed now.

"removed" reads misleading to me: You really replace it by new tables,
using script-generated initializers. Also it looks like you're doubling
the data, as the same sets were previously used by pv64/hvm64 and
pv32/hvm32 respectively.

Yes, I'll change that paragraph.

Regarding having 4 tables on x86 now: merging the pv/hvm tables would be
possible, but this would add some complexity to the script generating
the tables (it should test whether the number of parameters of pv and
hvm match). As the tables are present in debug build only I don't think
this is a real issue.


--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -108,56 +108,10 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
          return compat_physdev_op(cmd, arg);
  }
-#define HYPERCALL(x) \
-    [ __HYPERVISOR_ ## x ] = { (hypercall_fn_t *) do_ ## x,  \
-                               (hypercall_fn_t *) do_ ## x }
-
-#define HVM_CALL(x)                                          \
-    [ __HYPERVISOR_ ## x ] = { (hypercall_fn_t *) hvm_ ## x, \
-                               (hypercall_fn_t *) hvm_ ## x }
-
-#define COMPAT_CALL(x)                                       \
-    [ __HYPERVISOR_ ## x ] = { (hypercall_fn_t *) do_ ## x,  \
-                               (hypercall_fn_t *) compat_ ## x }
-
-static const struct {
-    hypercall_fn_t *native, *compat;
-} hvm_hypercall_table[] = {
-    HVM_CALL(memory_op),
-    COMPAT_CALL(multicall),
-#ifdef CONFIG_GRANT_TABLE
-    HVM_CALL(grant_table_op),
-#endif
-    HYPERCALL(vm_assist),
-    COMPAT_CALL(vcpu_op),
-    HVM_CALL(physdev_op),
-    COMPAT_CALL(xen_version),
-    HYPERCALL(console_io),
-    HYPERCALL(event_channel_op),
-    COMPAT_CALL(sched_op),
-    COMPAT_CALL(set_timer_op),
-    COMPAT_CALL(xsm_op),
-    HYPERCALL(hvm_op),
-    HYPERCALL(sysctl),
-    HYPERCALL(domctl),
-#ifdef CONFIG_ARGO
-    COMPAT_CALL(argo_op),
-#endif
-    COMPAT_CALL(platform_op),
-#ifdef CONFIG_PV
-    COMPAT_CALL(mmuext_op),
-#endif
-    HYPERCALL(xenpmu_op),
-    COMPAT_CALL(dm_op),
-#ifdef CONFIG_HYPFS
-    HYPERCALL(hypfs_op),
+#ifndef NDEBUG
+static unsigned char hypercall_args_64[] = hypercall_args_hvm64;
+static unsigned char hypercall_args_32[] = hypercall_args_hvm32;

Irrespective of this being debugging-only: Const?

Yes.


@@ -239,33 +176,11 @@ int hvm_hypercall(struct cpu_user_regs *regs)
          HVM_DBG_LOG(DBG_LEVEL_HCALL, "hcall%lu(%lx, %lx, %lx, %lx, %lx)",
                      eax, rdi, rsi, rdx, r10, r8);
-#ifndef NDEBUG
-        /* Deliberately corrupt parameter regs not used by this hypercall. */
-        switch ( hypercall_args_table[eax].native )
-        {
-        case 0: rdi = 0xdeadbeefdeadf00dUL; fallthrough;
-        case 1: rsi = 0xdeadbeefdeadf00dUL; fallthrough;
-        case 2: rdx = 0xdeadbeefdeadf00dUL; fallthrough;
-        case 3: r10 = 0xdeadbeefdeadf00dUL; fallthrough;
-        case 4: r8 = 0xdeadbeefdeadf00dUL;
-        }
-#endif
-
-        regs->rax = hvm_hypercall_table[eax].native(rdi, rsi, rdx, r10, r8);
+        call_handlers_hvm64(eax, regs->rax, rdi, rsi, rdx, r10, r8);
#ifndef NDEBUG
-        if ( !curr->hcall_preempted )
-        {
-            /* Deliberately corrupt parameter regs used by this hypercall. */
-            switch ( hypercall_args_table[eax].native )
-            {
-            case 5: regs->r8  = 0xdeadbeefdeadf00dUL; fallthrough;
-            case 4: regs->r10 = 0xdeadbeefdeadf00dUL; fallthrough;
-            case 3: regs->rdx = 0xdeadbeefdeadf00dUL; fallthrough;
-            case 2: regs->rsi = 0xdeadbeefdeadf00dUL; fallthrough;
-            case 1: regs->rdi = 0xdeadbeefdeadf00dUL;
-            }
-        }
+        if ( !curr->hcall_preempted && regs->rax != -ENOSYS )
+            clobber_regs(regs, hypercall_args_64[eax]);

I'm not fundamentally opposed, but sadly -ENOSYS comes back also in undue
situations, e.g. various hypercalls still produce this for "unknown
sub-function". Hence the weakened clobbering wants at least mentioning,
perhaps also justifying, in the description.

Okay.


@@ -55,4 +42,34 @@ compat_common_vcpu_op(
#endif /* CONFIG_COMPAT */ +#ifndef NDEBUG

Hmm, I was actuall hoping for the conditional to actually live ...

+static inline void clobber_regs(struct cpu_user_regs *regs,
+                                unsigned int nargs)
+{

... here and ...

+    /* Deliberately corrupt used parameter regs. */
+    switch ( nargs )
+    {
+    case 5: regs->r8  = 0xdeadbeefdeadf00dUL; fallthrough;
+    case 4: regs->r10 = 0xdeadbeefdeadf00dUL; fallthrough;
+    case 3: regs->rdx = 0xdeadbeefdeadf00dUL; fallthrough;
+    case 2: regs->rsi = 0xdeadbeefdeadf00dUL; fallthrough;
+    case 1: regs->rdi = 0xdeadbeefdeadf00dUL;
+    }
+}
+
+static inline void clobber_regs32(struct cpu_user_regs *regs,
+                                  unsigned int nargs)
+{

... here, such that the conditionals in the .c files could go away
altogether.

I didn't do that in order to be able to have the tables with the
number of parameters inside #ifndef NDEBUG sections.

I think I can change that by using a macro for reading the table
values.


+    /* Deliberately corrupt used parameter regs. */
+    switch ( nargs )
+    {
+    case 5: regs->edi = 0xdeadf00dUL; fallthrough;
+    case 4: regs->esi = 0xdeadf00dUL; fallthrough;
+    case 3: regs->edx = 0xdeadf00dUL; fallthrough;
+    case 2: regs->ecx = 0xdeadf00dUL; fallthrough;
+    case 1: regs->ebx = 0xdeadf00dUL;

No need for the UL suffixes here afaics; U ones may want to be there.

Okay.

Overall, besides these mainly cosmetic aspects the main thing missing
is an approach to prioritize the handful most frequently used functions,
for them to be pulled out of the switch() so we don't depend on the
compiler's choice for the order of comparisons done.

I have already prepared that step by generating the complete call
sequence, so any change for prioritizing some hypercalls can be local to
the generator script and the used input data.

The main question is how to do that. I've collected some hypercall
statistics data for PV and PVH guests running some simple tests (once a
build of the Xen hypervisor, and once a scp of a large file). The data
is split between guest and dom0 (PV) counts. There is no clear "winner"
which hypercall should be fastest, but several hypercalls are clearly
not important.

Here is the data:

PV-hypercall    PV-guest build   PV-guest scp    dom0 build     dom0 scp
mmu_update           186175729           2865         20936        33725
stack_switch           1273311          62381        108589       270764
multicall              2182803             50           302          524
update_va_mapping       571868             10            60           80
xen_version              73061            850           859         5432
grant_table_op               0              0         35557       139110
iret                  75673006         484132        268157       757958
vcpu_op                 453037          71199        138224       334988
set_segment_base       1650249          62387        108645       270823
mmuext_op             11225681            188          7239         3426
sched_op                280153         134645         70729       137943
event_channel_op        192327          66204         71409       214191
physdev_op                   0              0          7721         4315
(the dom0 values are for the guest running the build or scp test, so
dom0 acting as backend)

HVM-hypercall   PVH-guest build    PVH-guest scp
vcpu_op                  277684             2324
event_channel_op         350233            57383
(the related dom0 counter values are in the same range as with the test
running in the PV guest)

It should be noted that during boot of the guests the numbers for the PV
guest are more like the ones for the build test with the exception of
iret and sched_op being higher, while for PVH sched_op is by far the
most often used hypercall.

I'm not sure how to translate those numbers into a good algorithm for
generating the call sequence.

I could add priorities to each hypercall in hypercall-defs.c and have a
cascade of if (likely(foo)) call_foo; else if (likely(bla)) ... else
switch(rest).

Or I could have groups of hypercalls with a priority for each group and:

mask = 1ULL << num;
if (likely(mask & prio_1_mask)) switch(num) ...
else if (likely(mask & prio_2_mask)) switch (num) ...
...
else switch (num) ...

Or I could combine those approaches using the mask variant for cases of
multiple entries having the same priority and the direct call variant
for the cases of only a single entry having a specific priority.

And then there is the problem to set the priorities (fairly simple for
HVM, PV is more diffcult).


Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature
Description: OpenPGP digital signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.