Re: [PATCH 10/12] xen/x86: call hypercall handlers via switch statement
On 02.11.21 10:54, Jan Beulich wrote:
> On 28.10.2021 16:32, Juergen Gross wrote:
>> On 21.10.21 16:41, Jan Beulich wrote:
>>> On 15.10.2021 14:51, Juergen Gross wrote:
>>>> Instead of using a function table use the generated switch statement
>>>> macros for calling the appropriate hypercall handlers. This is
>>>> beneficial to performance and avoids speculation issues.
>>>>
>>>> With calling the handlers using the correct number of parameters now
>>>> it is possible to do the parameter register clobbering in the NDEBUG
>>>> case after returning from the handler. This in turn removes the only
>>>> users of hypercall_args_table[] which can be removed now.
>>>
>>> "removed" reads misleading to me: You really replace it by new tables,
>>> using script-generated initializers. Also it looks like you're
>>> doubling the data, as the same sets were previously used by pv64/hvm64
>>> and pv32/hvm32 respectively.
>>
>> Yes, I'll change that paragraph.
>>
>> Regarding having 4 tables on x86 now: merging the pv/hvm tables would
>> be possible, but this would add some complexity to the script
>> generating the tables (it should test whether the number of parameters
>> of pv and hvm match). As the tables are present in debug build only I
>> don't think this is a real issue.
>
> Sure, but that imo wants saying in the description.
>
>>> Overall, besides these mainly cosmetic aspects the main thing missing
>>> is an approach to prioritize the handful most frequently used
>>> functions, for them to be pulled out of the switch() so we don't
>>> depend on the compiler's choice for the order of comparisons done.
>>
>> I have already prepared that step by generating the complete call
>> sequence, so any change for prioritizing some hypercalls can be local
>> to the generator script and the used input data.
>>
>> The main question is how to do that. I've collected some hypercall
>> statistics data for PV and PVH guests running some simple tests (once
>> a build of the Xen hypervisor, and once a scp of a large file). The
>> data is split between guest and dom0 (PV) counts. There is no clear
>> "winner" which hypercall should be fastest, but several hypercalls are
>> clearly not important. Here is the data:
>>
>> PV-hypercall       PV-guest build  PV-guest scp  dom0 build  dom0 scp
>> mmu_update              186175729          2865       20936     33725
>
> Builds should be local to the guest and I/O should involve gnttab ops
> but no mmu-update. Hence I have a hard time seeing where the huge
> difference here would be coming from. Did you have any thoughts here?

I think you misunderstood the columns. The first column of data is the
build job running in domU and the number of hypercalls done by that
domU. The 3rd data column is the same test (build running in domU), but
the number of hypercalls done by dom0 (so pure backend hypercall
activity).

The missing gnttab ops on domU side are fine, as granting a page
doesn't require a hypercall.

>> stack_switch              1273311         62381      108589    270764
>> multicall                 2182803            50         302       524
>
> A fair amount of the mmu-updates is going to be coming through
> multicalls, I would guess. Priorities therefore may even differ for
> the two separate dispatch points.

I can look into collecting some data here.

>> update_va_mapping          571868            10          60        80
>> xen_version                 73061           850         859      5432
>> grant_table_op                  0             0       35557    139110
>> iret                     75673006        484132      268157    757958
>
> The huge differences for builds is puzzling me here ...
>
>> vcpu_op                    453037         71199      138224    334988
>> set_segment_base          1650249         62387      108645    270823
>> mmuext_op                11225681           188        7239      3426
>
> ... and here as well.
> Did Dom0 and DomU use identical numbers of vCPU-s and identical -j
> make option values?
>
>> sched_op                   280153        134645       70729    137943
>> event_channel_op           192327         66204       71409    214191
>> physdev_op                      0             0        7721      4315
>>
>> (the dom0 values are for the guest running the build or scp test, so
>> dom0 acting as backend)
>>
>> HVM-hypercall      PVH-guest build  PVH-guest scp
>> vcpu_op                     277684           2324
>> event_channel_op            350233          57383
>>
>> (the related dom0 counter values are in the same range as with the
>> test running in the PV guest)
>>
>> It should be noted that during boot of the guests the numbers for the
>> PV guest are more like the ones for the build test with the exception
>> of iret and sched_op being higher, while for PVH sched_op is by far
>> the most often used hypercall.
>>
>> I'm not sure how to translate those numbers into a good algorithm for
>> generating the call sequence.
>
> Well, there's never going to be a clear cut fitting everything, I
> suppose.
>
>> I could add priorities to each hypercall in hypercall-defs.c and have
>> a cascade of
>>
>>   if (likely(foo)) call_foo; else if (likely(bla)) ... else switch(rest).
>
> Personally I'd lean to an approach like this one; perhaps there's not
> even a need to specify priorities for every hypercall, but just the
> ones we deem most frequently used?

See my new series.


Juergen
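For readers coming to this thread cold: the scheme under discussion replaces
the former indirect call through a hypercall function table with a
generator-produced switch() over the hypercall number, so each handler is
invoked directly and with exactly the arguments it declares. The following
minimal sketch only illustrates the shape of such a dispatcher; the handler
prototypes are simplified and the dispatch function is hand-written here, not
taken from the output of Xen's hypercall-defs generator.

/*
 * Illustrative sketch only -- not the code emitted by Xen's generator.
 * Hypercall numbers match Xen's public ABI; the handler prototypes are
 * simplified (the real ones take typed guest handles).
 */
#define __HYPERVISOR_mmu_update   1
#define __HYPERVISOR_xen_version 17
#define __HYPERVISOR_iret        23

/* Assumed, simplified handler declarations. */
long do_mmu_update(unsigned long ureqs, unsigned long count,
                   unsigned long pdone, unsigned long foreigndom);
long do_xen_version(unsigned long cmd, unsigned long arg);
long do_iret(void);

static long pv64_dispatch(unsigned int nr, unsigned long a1,
                          unsigned long a2, unsigned long a3,
                          unsigned long a4)
{
    long ret;

    switch ( nr )
    {
    /* Each handler is called with exactly the arguments it uses. */
    case __HYPERVISOR_mmu_update:
        ret = do_mmu_update(a1, a2, a3, a4);
        break;
    case __HYPERVISOR_xen_version:
        ret = do_xen_version(a1, a2);
        break;
    case __HYPERVISOR_iret:
        ret = do_iret();
        break;
    default:
        ret = -38;    /* -ENOSYS */
        break;
    }

    return ret;
}

Compared with an indirect call through a table of function pointers, every
case is a direct call, which needs no speculation mitigation for the dispatch
and lets the compiler check argument counts.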
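The prioritisation idea converged on at the end of the thread could then be
emitted as a short likely() cascade in front of that switch(). Again a sketch
only, building on the code above: which hypercalls get a fast path, and in
which order, would come from (here hypothetical) priority annotations in
hypercall-defs.c; the two picked below merely follow the PV build statistics
quoted in the thread.

#define likely(x) __builtin_expect(!!(x), 1)

/*
 * Illustrative fast paths for hypercalls deemed most frequent, falling
 * back to the generated switch() for everything else.
 */
static long pv64_dispatch_prio(unsigned int nr, unsigned long a1,
                               unsigned long a2, unsigned long a3,
                               unsigned long a4)
{
    if ( likely(nr == __HYPERVISOR_iret) )
        return do_iret();
    if ( likely(nr == __HYPERVISOR_mmu_update) )
        return do_mmu_update(a1, a2, a3, a4);

    return pv64_dispatch(nr, a1, a2, a3, a4);
}

This keeps the order of comparisons under the generator's control rather than
leaving it to the compiler's choice for the switch(), which was the concern
raised at the top of the thread.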