Re: [PATCH] xen/arm: p2m_set_entry duplicate calculation.
Hello Julien Grall. Thank you, I agree! It made me think once more about
what my patch could improve. The patches I sent have been reviewed in
various ways, and it was a good opportunity to analyze my patch from
various perspectives. :)

I checked the objdump output with -O2 optimization (the Xen Makefile
default) to make sure CSE (common subexpression elimination) works well
on the latest arm64 cross compiler on x86_64 from the Arm GNU Toolchain.

$ ~/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc -v
...
A-profile Architecture 10.3-2021.07 (arm-10.29)'
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.3.1 20210621 (GNU Toolchain for the A-profile Architecture 10.3-2021.07 (arm-10.29))

I compared the code before and after my patch. This time, instead of
adding a "pages" variable, I reused the local variable mask for the
order shift, and I was able to confirm that it results in one less
instruction.

(1) Before the clean-up:

0000000000001bb4 <p2m_set_entry>:
    while ( nr )
    1bb4:       b40005e2        cbz     x2, 1c70 <p2m_set_entry+0xbc>
    {
...
        if ( rc )
    1c1c:       350002e0        cbnz    w0, 1c78 <p2m_set_entry+0xc4>
        sgfn = gfn_add(sgfn, (1 << order));
    1c20:       1ad32373        lsl     w19, w27, w19   // <<< CSE works
    1c24:       93407e73        sxtw    x19, w19        // <<< well!
    return _gfn(gfn_x(gfn) + i);
    1c28:       8b1302d6        add     x22, x22, x19
    return _mfn(mfn_x(mfn) + i);
    1c2c:       8b130281        add     x1, x20, x19
    1c30:       b100069f        cmn     x20, #0x1
    1c34:       9a941034        csel    x20, x1, x20, ne        // ne = any
    while ( nr )
    1c38:       eb1302b5        subs    x21, x21, x19
    1c3c:       540001e0        b.eq    1c78 <p2m_set_entry+0xc4>       // b.none

(2) Reusing the mask variable (mask = 1UL << order): the objdump shows
that the sxtw x19, w19 operation disappeared.

0000000000001bb4 <p2m_set_entry>:
    while ( nr )
    1bb4:       b40005c2        cbz     x2, 1c6c <p2m_set_entry+0xb8>
    {
...
        if ( rc )
    1c1c:       350002c0        cbnz    w0, 1c74 <p2m_set_entry+0xc0>
        mask = 1UL << order;
    1c20:       9ad32373        lsl     x19, x27, x19   // <<< only lsl
    return _gfn(gfn_x(gfn) + i);
    1c24:       8b1302d6        add     x22, x22, x19
    return _mfn(mfn_x(mfn) + i);
    1c28:       8b130281        add     x1, x20, x19
    1c2c:       b100069f        cmn     x20, #0x1
    1c30:       9a941034        csel    x20, x1, x20, ne        // ne = any
    while ( nr )
    1c34:       eb1302b5        subs    x21, x21, x19
    1c38:       540001e0        b.eq    1c74 <p2m_set_entry+0xc0>       // b.none

I'll send you a follow-up patch I've been working on. A rough sketch of
the direction I'm considering is appended below the quoted mail.

BR
Paran Lee

On 2022-04-25 1:17 AM, Julien Grall wrote:
> Hi,
>
> On 21/04/2022 16:17, Paran Lee wrote:
>> It doesn't seem necessary to do that calculation of order shift again.
>
> I think we need to weight that against increasing the number of local
> variables that do pretty much the same.
>
> This is pretty much done to a matter of taste here. IMHO, the original
> version is better but I see Stefano reviewed it so I will not argue
> against it.
>
> That said, given you already sent a few patches, can you explain why you
> are doing this? Is this optimization purpose? Is it clean-up?
>
>>
>> Signed-off-by: Paran Lee <p4ranlee@xxxxxxxxx>
>> ---
>>   xen/arch/arm/p2m.c | 9 +++++----
>>   1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>> index 1d1059f7d2..533afc830a 100644
>> --- a/xen/arch/arm/p2m.c
>> +++ b/xen/arch/arm/p2m.c
>> @@ -1092,7 +1092,7 @@ int p2m_set_entry(struct p2m_domain *p2m,
>>       while ( nr )
>>       {
>>           unsigned long mask;
>> -        unsigned long order;
>> +        unsigned long order, pages;
>>           /*
>>            * Don't take into account the MFN when removing mapping (i.e
>> @@ -1118,11 +1118,12 @@ int p2m_set_entry(struct p2m_domain *p2m,
>>           if ( rc )
>>               break;
>> -        sgfn = gfn_add(sgfn, (1 << order));
>> +        pages = 1 << order;
>
> Please take the opportunity to switch to 1UL.
>
>> +        sgfn = gfn_add(sgfn, pages);
>>           if ( !mfn_eq(smfn, INVALID_MFN) )
>> -            smfn = mfn_add(smfn, (1 << order));
>> +            smfn = mfn_add(smfn, pages);
>> -        nr -= (1 << order);
>> +        nr -= pages;
>>       }
>>       return rc;
>
> Cheers,
>
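For clarity, here is a rough sketch of the direction I'm considering for
the follow-up. It is only an illustration of reusing mask for the shift,
not the actual patch; the order-selection logic, the mapping call that
sets rc, and the surrounding declarations are elided as comments:

    /* Rough sketch only: context outside the loop body is omitted. */
    while ( nr )
    {
        unsigned long mask;
        unsigned long order;

        /* ... compute order from the alignment of sgfn/smfn/nr ... */

        /* ... map/unmap one (1UL << order)-sized block, setting rc ... */
        if ( rc )
            break;

        /* Do the shift once and reuse it for all three updates. */
        mask = 1UL << order;

        sgfn = gfn_add(sgfn, mask);
        if ( !mfn_eq(smfn, INVALID_MFN) )
            smfn = mfn_add(smfn, mask);

        nr -= mask;
    }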