Re: [PATCH] xen/arm: p2m_set_entry duplicate calculation.
Hi,

On 26/04/2022 16:37, Paran Lee wrote:
> Thank you, I agree! It made me think once more about what my patch
> could improve. The patches I sent have been reviewed in various ways.
> It was a good opportunity to analyze my patch from various
> perspectives. :)
>
> I checked the objdump output at -O2 (the default optimization level in
> the Xen Makefile) to make sure CSE (common subexpression elimination)
> works well with the latest arm64 cross compiler for x86_64 hosts from
> the Arm GNU Toolchain.
>
> $ ~/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc -v
> ...
> A-profile Architecture 10.3-2021.07 (arm-10.29)'
> Thread model: posix
> Supported LTO compression algorithms: zlib
> gcc version 10.3.1 20210621 (GNU Toolchain for the A-profile Architecture 10.3-2021.07 (arm-10.29)
>
> I compared the code before and after my patches. This time, without
> adding a "pages" variable, I used the local variable mask with the
> order operation. I was able to confirm that it does one less operation.

Well... I don't think the one less operation is because of the introduction of the local variable (see more below).

> (1) before clean up
>
> 0000000000001bb4 <p2m_set_entry>:
>     while ( nr )
>     1bb4:       b40005e2        cbz     x2, 1c70 <p2m_set_entry+0xbc>
>     {
> ...
>         if ( rc )
>     1c1c:       350002e0        cbnz    w0, 1c78 <p2m_set_entry+0xc4>
>         sgfn = gfn_add(sgfn, (1 << order));

1 << order is a 32-bit value but the second parameter is a 64-bit value (assuming arm64). So...

>     1c20:       1ad32373        lsl     w19, w27, w19   // <<< CSE works
>     1c24:       93407e73        sxtw    x19, w19        // <<< well!

... this instruction is extending the 32-bit value to a 64-bit value.

>     return _gfn(gfn_x(gfn) + i);
>     1c28:       8b1302d6        add     x22, x22, x19
>     return _mfn(mfn_x(mfn) + i);
>     1c2c:       8b130281        add     x1, x20, x19
>     1c30:       b100069f        cmn     x20, #0x1
>     1c34:       9a941034        csel    x20, x1, x20, ne  // ne = any
>     while ( nr )
>     1c38:       eb1302b5        subs    x21, x21, x19
>     1c3c:       540001e0        b.eq    1c78 <p2m_set_entry+0xc4>  // b.none
>
> (2) Using the mask variable again: mask = 1UL << order
>
> The code shows me that the sxtw x19, w19 operation disappeared.

This code is not only using a local variable but also using "1UL". So, I suspect that if you were using 1 << order, the instruction would re-appear.

Cheers,

-- 
Julien Grall
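The type difference behind the reply above can be reproduced outside Xen with a small C sketch. Everything in it is an assumption made for illustration: add_pages(), step_int() and step_ulong() are hypothetical names, add_pages() merely stands in for a function taking a 64-bit count such as gfn_add(), and unsigned long is assumed to be 64 bits wide (as on arm64 Linux targets).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a helper whose second parameter is 64-bit
 * (assumption: unsigned long is 64 bits, as on arm64). */
static uint64_t add_pages(uint64_t base, unsigned long i)
{
    return base + i;
}

/* "1 << order" has type int, so the result is only 32 bits wide. */
static_assert(sizeof(1 << 4) == sizeof(int),
              "shifting an int constant yields an int");

/* "1UL << order" already has type unsigned long. */
static_assert(sizeof(1UL << 4) == sizeof(unsigned long),
              "shifting an unsigned long constant yields an unsigned long");

/* The int result has to be converted to unsigned long at the call site.
 * For a signed int that conversion is a sign extension. */
static uint64_t step_int(uint64_t base, unsigned int order)
{
    return add_pages(base, 1 << order);
}

/* The shift is already done at the parameter's width, so no widening
 * conversion is needed at the call site. */
static uint64_t step_ulong(uint64_t base, unsigned int order)
{
    return add_pages(base, 1UL << order);
}
```

For small values of order both helpers return the same result; only the width at which the shift is computed differs. Compiling the two helpers with an AArch64 GCC at -O2 and comparing the disassembly is one way to check whether the sxtw shows up only in the int variant, which is what the reply suspects rather than something established here.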