[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2] x86/PV: avoid indirect call for I/O emulation quirk hook
On 18.01.2024 12:04, Andrew Cooper wrote: > On 17/01/2024 9:37 am, Jan Beulich wrote: >> --- a/xen/arch/x86/ioport_emulate.c >> +++ b/xen/arch/x86/ioport_emulate.c >> @@ -8,11 +8,10 @@ >> #include <xen/sched.h> >> #include <xen/dmi.h> >> >> -unsigned int (*__read_mostly ioemul_handle_quirk)( >> - uint8_t opcode, char *io_emul_stub, struct cpu_user_regs *regs); >> +bool __ro_after_init ioemul_handle_quirk; >> >> -static unsigned int cf_check ioemul_handle_proliant_quirk( >> - u8 opcode, char *io_emul_stub, struct cpu_user_regs *regs) >> +unsigned int ioemul_handle_proliant_quirk( >> + uint8_t opcode, char *io_emul_stub, const struct cpu_user_regs *regs) >> { >> static const char stub[] = { >> 0x9c, /* pushf */ > > Something occurred to me. > > diff --git a/xen/arch/x86/ioport_emulate.c b/xen/arch/x86/ioport_emulate.c > index 23cba842b22e..70f94febe255 100644 > --- a/xen/arch/x86/ioport_emulate.c > +++ b/xen/arch/x86/ioport_emulate.c > @@ -13,7 +13,7 @@ bool __ro_after_init ioemul_handle_quirk; > unsigned int ioemul_handle_proliant_quirk( > uint8_t opcode, char *io_emul_stub, const struct cpu_user_regs *regs) > { > - static const char stub[] = { > + const char stub[] = { > 0x9c, /* pushf */ > 0xfa, /* cli */ > 0xee, /* out %al, %dx */ > > is an improvement, confirmed by bloat-o-meter: > > add/remove: 0/1 grow/shrink: 1/0 up/down: 1/-9 (-8) > Function old new delta > ioemul_handle_proliant_quirk 58 59 +1 > stub 9 - -9 > > The reason is that we've got a 9 byte object that's decomposed into two > rip-relative accesses. i.e. we've got more pointer than data in this case. I wouldn't mind this as a separate change, but I don't see how it would fit right here. > But this adjustment seems to tickle a GCC bug. With that change in > place, GCC emits: > > <ioemul_handle_proliant_quirk>: > 48 83 ec 10 sub $0x10,%rsp > ... > 48 83 c4 10 add $0x10,%rsp > c3 retq > > i.e. we get a stack frame (space at least, no initialisation) despite > the object having been converted entirely to instruction immediates. > > Or in other words, there's a further 12 byte saving available when GCC > can be persuaded to not even emit the stack frame. > > What is even more weird is that I see this GCC-10, and a build of gcc > master from last week, but not when I try to reproduce in > https://godbolt.org/z/PnachbznW so there's probably some other setting > used by Xen which tickles this bug. Yeah, I've observed such pointless frame allocation elsewhere as well, so far without being able what exactly triggers it. Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |