[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] xen/livepatch: Make check_for_livepatch_work() faster in the common case
On 22.12.2023 23:00, Andrew Cooper wrote: > When livepatching is enabled, this function is used all the time. Really do > check the fastpath first, and annotate it likely() as this is the right answer > 100% of the time (to many significant figures). > > This cuts out 3 pointer dereferences in the "nothing to do path", and it seems > the optimiser has an easier time too. Bloat-o-meter reports: > > add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-57 (-57) > Function old new delta > check_for_livepatch_work.cold 1201 1183 -18 > check_for_livepatch_work 1021 982 -39 > > which isn't too shabby for no logical change. > > Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx> > I'm still a little disappointed with the code generation. GCC still chooses > to set up the full stack frame (6 regs, +3 more slots) intermixed with the > per-cpu calculations. > > In isolation, GCC can check the boolean without creating a stack frame: > > <work_to_to>: > 48 89 e2 mov %rsp,%rdx > 48 8d 05 de e1 37 00 lea 0x37e1de(%rip),%rax # > ffff82d0405b6068 <per_cpu__work_to_do> > 48 81 ca ff 7f 00 00 or $0x7fff,%rdx > 8b 4a c1 mov -0x3f(%rdx),%ecx > 48 8d 15 45 aa 39 00 lea 0x39aa45(%rip),%rdx # > ffff82d0405d28e0 <__per_cpu_offset> > 48 8b 14 ca mov (%rdx,%rcx,8),%rdx > 0f b6 04 02 movzbl (%rdx,%rax,1),%eax > c3 retq > > but I can't find a way to convince GCC that it would be worth not setting up a > stack frame in in the common case, and having a few extra mov reg/reg's later > in the uncommon case. > > I haven't tried manually splitting the function into a check() and a do() > function. Views on whether that might be acceptable? At a guess, do() would > need to be a static noinline to avoid it turning back into what it currently > is. Or maybe move the fast-path check into an inline function, which calls the (renamed) out-of-line one only when the fast-path check passes? Downside would be that the per-CPU work_to_do variable then couldn't be static anymore. Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |