Re: [PATCH] x86/hvm: Simplify stdvga_mem_accept() further
On 12.09.2024 14:06, Andrew Cooper wrote:
> stdvga_mem_accept() is called on almost all IO emulations, and the
> overwhelming likely answer is to reject the ioreq.  Simply rearranging the
> expression yields an improvement:
>
>   add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-57 (-57)
>   Function                                     old     new   delta
>   stdvga_mem_accept                            109      52     -57
>
> which is best explained looking at the disassembly:
>
>   Before:                                                  After:
>   f3 0f 1e fa           endbr64                            f3 0f 1e fa           endbr64
>   0f b6 4e 1e           movzbl 0x1e(%rsi),%ecx          |  0f b6 46 1e           movzbl 0x1e(%rsi),%eax
>   48 8b 16              mov    (%rsi),%rdx              |  31 d2                 xor    %edx,%edx
>   f6 c1 40              test   $0x40,%cl                |  a8 30                 test   $0x30,%al
>   75 38                 jne    <stdvga_mem_accept+0x48> |  75 23                 jne    <stdvga_mem_accept+0x31>
>   31 c0                 xor    %eax,%eax                <
>   48 81 fa ff ff 09 00  cmp    $0x9ffff,%rdx            <
>   76 26                 jbe    <stdvga_mem_accept+0x41> <
>   8b 46 14              mov    0x14(%rsi),%eax          <
>   8b 7e 10              mov    0x10(%rsi),%edi          <
>   48 0f af c7           imul   %rdi,%rax                <
>   48 8d 54 02 ff        lea    -0x1(%rdx,%rax,1),%rdx   <
>   31 c0                 xor    %eax,%eax                <
>   48 81 fa ff ff 0b 00  cmp    $0xbffff,%rdx            <
>   77 0c                 ja     <stdvga_mem_accept+0x41> <
>   83 e1 30              and    $0x30,%ecx               <
>   75 07                 jne    <stdvga_mem_accept+0x41> <
>   83 7e 10 01           cmpl   $0x1,0x10(%rsi)             83 7e 10 01           cmpl   $0x1,0x10(%rsi)
>   0f 94 c0              sete   %al                      |  75 1d                 jne    <stdvga_mem_accept+0x31>
>   c3                    ret                             |  48 8b 0e              mov    (%rsi),%rcx
>   66 0f 1f 44 00 00     nopw   0x0(%rax,%rax,1)         |  48 81 f9 ff ff 09 00  cmp    $0x9ffff,%rcx
>   8b 46 10              mov    0x10(%rsi),%eax          |  76 11                 jbe    <stdvga_mem_accept+0x31>
>   8b 7e 14              mov    0x14(%rsi),%edi          |  8b 46 14              mov    0x14(%rsi),%eax
>   49 89 d0              mov    %rdx,%r8                 |  48 8d 44 01 ff        lea    -0x1(%rcx,%rax,1),%rax
>   48 83 e8 01           sub    $0x1,%rax                |  48 3d ff ff 0b 00     cmp    $0xbffff,%rax
>   48 8d 54 3a ff        lea    -0x1(%rdx,%rdi,1),%rdx   |  0f 96 c2              setbe  %dl
>   48 0f af c7           imul   %rdi,%rax                |  89 d0                 mov    %edx,%eax
>   49 29 c0              sub    %rax,%r8                 <
>   31 c0                 xor    %eax,%eax                <
>   49 81 f8 ff ff 09 00  cmp    $0x9ffff,%r8             <
>   77 be                 ja     <stdvga_mem_accept+0x2a> <
>   c3                    ret                                c3                    ret
>
> By moving the "p->count != 1" check ahead of the
> ioreq_mmio_{first,last}_byte() calls, both multiplies disappear along with a
> lot of surrounding logic.
>
> No functional change.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>

> --- a/xen/arch/x86/hvm/stdvga.c
> +++ b/xen/arch/x86/hvm/stdvga.c
> @@ -69,18 +69,14 @@ static int cf_check stdvga_mem_write(
>  static bool cf_check stdvga_mem_accept(
>      const struct hvm_io_handler *handler, const ioreq_t *p)
>  {
> -    if ( (ioreq_mmio_first_byte(p) < VGA_MEM_BASE) ||
> +    /*
> +     * Only accept single direct writes, as that's the only thing we can
> +     * accelerate using buffered ioreq handling.
> +     */
> +    if ( p->dir != IOREQ_WRITE || p->data_is_ptr || p->count != 1 ||
> +         (ioreq_mmio_first_byte(p) < VGA_MEM_BASE) ||
>           (ioreq_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )

Arguably the function calls are then pointless (as generated code proves),
but maybe keeping them for doc purposes is indeed worthwhile.

Jan
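
For readers following the thread, the remark that the helper calls become pointless once "p->count != 1" has been rejected can be sketched with a standalone model. Everything here is an assumption made for illustration and is not the Xen source: struct model_ioreq stands in for ioreq_t, first_byte()/last_byte() are a simplified guess at what ioreq_mmio_{first,last}_byte() compute, accept_before() is a guess at the pre-patch ordering, and accept_folded() is a hand-written reading of the "After" disassembly. Only the constants 0xa0000 and 0x20000 are taken from the compare immediates in the listing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VGA_MEM_BASE 0xa0000u   /* from "cmp $0x9ffff" in the listing */
#define VGA_MEM_SIZE 0x20000u   /* from "cmp $0xbffff" in the listing */
#define IOREQ_WRITE  0
#define IOREQ_READ   1

/* Hypothetical stand-in for ioreq_t; only the fields used here. */
struct model_ioreq {
    uint64_t addr;
    uint32_t size, count;
    uint8_t dir;
    bool data_is_ptr, df;
};

/* Assumed helper semantics: a repeated access walks up, or down if df. */
static uint64_t first_byte(const struct model_ioreq *p)
{
    return p->df ? p->addr - ((uint64_t)p->count - 1) * p->size : p->addr;
}

static uint64_t last_byte(const struct model_ioreq *p)
{
    return p->df ? p->addr + p->size - 1
                 : p->addr + (uint64_t)p->count * p->size - 1;
}

/* Assumed old ordering: range check (with multiplies) first. */
static bool accept_before(const struct model_ioreq *p)
{
    if ( first_byte(p) < VGA_MEM_BASE ||
         last_byte(p) >= VGA_MEM_BASE + VGA_MEM_SIZE )
        return false;
    return p->dir == IOREQ_WRITE && !p->data_is_ptr && p->count == 1;
}

/* Ordering from the patch: cheap field checks short-circuit first. */
static bool accept_after(const struct model_ioreq *p)
{
    return !(p->dir != IOREQ_WRITE || p->data_is_ptr || p->count != 1 ||
             first_byte(p) < VGA_MEM_BASE ||
             last_byte(p) >= VGA_MEM_BASE + VGA_MEM_SIZE);
}

/* With count == 1 known, df and the multiplies drop out entirely. */
static bool accept_folded(const struct model_ioreq *p)
{
    return p->dir == IOREQ_WRITE && !p->data_is_ptr && p->count == 1 &&
           p->addr >= VGA_MEM_BASE &&
           p->addr + p->size - 1 < VGA_MEM_BASE + VGA_MEM_SIZE;
}

/* Sweep boundary values and check that all three models agree. */
bool models_agree(void)
{
    static const uint64_t addrs[] = { 0, 0x9ffff, 0xa0000, 0xa0001,
                                      0xbfffc, 0xbfffd, 0xbffff, 0xc0000 };
    static const uint32_t sizes[] = { 1, 2, 4, 8 };
    static const uint32_t counts[] = { 0, 1, 2, 16 };

    for ( size_t a = 0; a < sizeof(addrs) / sizeof(addrs[0]); a++ )
        for ( size_t s = 0; s < sizeof(sizes) / sizeof(sizes[0]); s++ )
            for ( size_t c = 0; c < sizeof(counts) / sizeof(counts[0]); c++ )
                for ( unsigned int f = 0; f < 8; f++ )
                {
                    struct model_ioreq r = {
                        .addr = addrs[a], .size = sizes[s],
                        .count = counts[c], .dir = f & 1,
                        .data_is_ptr = f & 2, .df = f & 4,
                    };
                    bool want = accept_before(&r);

                    if ( accept_after(&r) != want ||
                         accept_folded(&r) != want )
                        return false;
                }

    return true;
}
```

In this model the three functions are the same conjunction of side-effect-free conditions, so reordering cannot change the result; what changes is only how much work is done before the common reject path is taken. Whether to spell the range check via the helpers or via plain address arithmetic is then a readability choice, which is the trade-off raised in the reply.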