Re: [PATCH 4/6] x86/trace: Reduce stack usage from HVMTRACE_ND()
On 20/09/2021 10:05, Jan Beulich wrote:
> On 17.09.2021 10:45, Andrew Cooper wrote:
>> It is pointless to write all 6 entries and only consume the useful subset.
>> bloat-o-meter shows quite how obscene the overhead is in
>> vmx_vmexit_handler(), weighing in at 11% of the function arranging unread
>> zeroes on the stack, and 8% for svm_vmexit_handler().
>>
>>   add/remove: 0/0 grow/shrink: 0/20 up/down: 0/-1867 (-1867)
>>   Function                     old     new   delta
>>   hvm_msr_write_intercept     1049    1033     -16
>>   vmx_enable_intr_window       238     214     -24
>>   svm_enable_intr_window       337     313     -24
>>   hvmemul_write_xcr            115      91     -24
>>   hvmemul_write_cr             350     326     -24
>>   hvmemul_read_xcr             115      91     -24
>>   hvmemul_read_cr              146     122     -24
>>   hvm_mov_to_cr                438     414     -24
>>   hvm_mov_from_cr              253     229     -24
>>   vmx_intr_assist             1150    1118     -32
>>   svm_intr_assist              459     427     -32
>>   hvm_rdtsc_intercept          138     106     -32
>>   hvm_msr_read_intercept       898     866     -32
>>   vmx_vmenter_helper          1142    1094     -48
>>   vmx_inject_event             813     765     -48
>>   svm_vmenter_helper           238     190     -48
>>   hvm_hlt                      197     146     -51
>>   svm_inject_event            1678    1614     -64
>>   svm_vmexit_handler          5880    5416    -464
>>   vmx_vmexit_handler          7281    6473    -808
>>   Total: Before=3644184, After=3642317, chg -0.05%
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>
> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>

Thanks, but this is buggy.  There are direct callers of HVMTRACE_ND() which
need adjustments too.  There is also a further optimisation for the 0 case
which drops even more.

>> Normally I wouldn't recommend patches like this for backport, but
>> {vmx,svm}_vmexit_handler() are fastpaths and this is a *lot* of I-cache
>> lines dropped...
>
> The change in size is indeed unexpectedly large for these two functions.
> However, what I find puzzling is that TRACEBUFFER is enabled by default
> (i.e. also in release builds) in the first place, and that it can only
> be disabled when EXPERT.

It's not surprising in the slightest.  TRACEBUFFER long predates Kconfig.

> More gains could be had towards dropped code if
> the option wasn't on by default in at least release builds.  "Debugging
> or performance analysis" (as its help text says) after all isn't a
> primary target of release builds.

All performance analysis needs doing on release builds.

> IOW what I'd prefer to consider a backport candidate would be a patch
> changing the option's default.  Thoughts?

I very much doubt that XenServer are the only people who use xentrace in
customer environments.  I'm -1 to changing the default in staging, and
firmly against doing so in older releases.

>> --- a/xen/include/asm-x86/hvm/trace.h
>> +++ b/xen/include/asm-x86/hvm/trace.h
>> @@ -67,38 +67,30 @@
>>  #define TRACE_2_LONG_4D(_e, d1, d2, d3, d4, ...) \
>>      TRACE_6D(_e, d1, d2, d3, d4)
>>
>> -#define HVMTRACE_ND(evt, modifier, cycles, count, d1, d2, d3, d4, d5, d6) \
>> +#define HVMTRACE_ND(evt, modifier, cycles, ...)                   \
>>      do {                                                          \
>>          if ( unlikely(tb_init_done) && DO_TRC_HVM_ ## evt )       \
>>          {                                                         \
>> -            struct {                                              \
>> -                u32 d[6];                                         \
>> -            } _d;                                                 \
>> -            _d.d[0]=(d1);                                         \
>> -            _d.d[1]=(d2);                                         \
>> -            _d.d[2]=(d3);                                         \
>> -            _d.d[3]=(d4);                                         \
>> -            _d.d[4]=(d5);                                         \
>> -            _d.d[5]=(d6);                                         \
>> +            uint32_t _d[] = { __VA_ARGS__ };                      \
>>              __trace_var(TRC_HVM_ ## evt | (modifier), cycles,     \
>> -                        sizeof(*_d.d) * count, &_d);              \
>> +                        sizeof(_d), _d);                          \
>>          }                                                         \
>>      } while(0)
>>
>>  #define HVMTRACE_6D(evt, d1, d2, d3, d4, d5, d6) \
>> -    HVMTRACE_ND(evt, 0, 0, 6, d1, d2, d3, d4, d5, d6)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4, d5, d6)
>>  #define HVMTRACE_5D(evt, d1, d2, d3, d4, d5) \
>> -    HVMTRACE_ND(evt, 0, 0, 5, d1, d2, d3, d4, d5, 0)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4, d5)
>>  #define HVMTRACE_4D(evt, d1, d2, d3, d4) \
>> -    HVMTRACE_ND(evt, 0, 0, 4, d1, d2, d3, d4, 0, 0)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3, d4)
>>  #define HVMTRACE_3D(evt, d1, d2, d3) \
>> -    HVMTRACE_ND(evt, 0, 0, 3, d1, d2, d3, 0, 0, 0)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2, d3)
>>  #define HVMTRACE_2D(evt, d1, d2) \
>> -    HVMTRACE_ND(evt, 0, 0, 2, d1, d2, 0, 0, 0, 0)
>> +    HVMTRACE_ND(evt, 0, 0, d1, d2)
>>  #define HVMTRACE_1D(evt, d1) \
>> -    HVMTRACE_ND(evt, 0, 0, 1, d1, 0, 0, 0, 0, 0)
>> +    HVMTRACE_ND(evt, 0, 0, d1)
>>  #define HVMTRACE_0D(evt) \
>> -    HVMTRACE_ND(evt, 0, 0, 0, 0, 0, 0, 0, 0, 0)
>> +    HVMTRACE_ND(evt, 0, 0)

> These HVMTRACE_<n>D() aren't this much of a gain anymore; perhaps down
> the road we will want to have just a single wrapper macro adding the
> modifier and cycles arguments, otherwise making use of variable
> arguments as well?  Same on the plain TRACE() side.

There is an awful lot of cleanup to do here.  Other findings include HVM
records using the non-HVM helpers (to have cycles included), and examples
such as vpic_ack_pending_irq() which make duplicate calls to
vlapic_accept_pic_intr(), causing 3 trace records to be written out.

~Andrew