Re: [Xen-devel] [PATCH for-4.12 v2 17/17] xen/arm: Track page accessed between batch of Set/Way operations
Hi,

On 12/4/18 8:26 PM, Julien Grall wrote:
> At the moment, the implementation of Set/Way operations will go through
> all the entries of the guest P2M and flush them. However, this is very
> expensive and may render unusable a guest OS using them.
>
> For instance, Linux 32-bit will use Set/Way operations during secondary
> CPU bring-up. As the implementation is really expensive, it may be
> possible to hit the CPU bring-up timeout.
>
> To limit the Set/Way impact, we track which pages of the guest have been
> accessed between batches of Set/Way operations. This is done using
> bit[0] (aka valid bit) of the P2M entry.
>
> A new per-arch helper is introduced to perform actions just before the
> guest is first unpaused. This will be used to invalidate the P2M to
> track access from the start of the guest.
>
> Signed-off-by: Julien Grall <julien.grall@xxxxxxx>
>
> ---
>
> While we can spread d->creation_finished all over the code, the
> per-arch helper to perform actions just before the guest is first
> unpaused can bring a lot of benefit for both architectures. For
> instance, on Arm, the flush to the instruction cache could be delayed
> until the domain is first run. This would greatly improve the
> performance of creating guests.
>
> I am still benchmarking whether having a command line option is worth
> it. I will provide numbers as soon as I have them.

I remembered Stefano suggested looking at the impact on boot. This is a
bit tricky to do as there are many kernel configurations and some of the
mappings may never be touched during boot. Instead, I wrote a tiny guest
[1] that zeroes roughly 1GB of memory. Because the toolstack will always
try to allocate with the biggest mapping, I had to hack the toolstack a
bit to be able to test with different mapping sizes (but not a mix). The
guest has only one vCPU with a dedicated pCPU.
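As a reading aid, the tracking scheme from the quoted commit message can
be pictured with the toy model below. This is only a sketch, not Xen
code: the struct toy_p2m type, the toy_* helpers and the 8-entry table
are hypothetical names and sizes chosen for illustration, and the
"clean only the entries whose valid bit is set, then clear all valid
bits again" step is my reading of the description rather than a copy of
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_ENTRIES 8            /* hypothetical, tiny P2M for illustration */

struct toy_p2m {
    bool valid[TOY_NR_ENTRIES];     /* stands in for bit[0] of each entry */
    unsigned long traps;            /* faults taken on a first access */
    unsigned long cleaned;          /* entries cleaned by Set/Way emulation */
};

/* Clear every valid bit so that the next access to each entry traps. */
static void toy_invalidate_all(struct toy_p2m *p2m)
{
    for (size_t i = 0; i < TOY_NR_ENTRIES; i++)
        p2m->valid[i] = false;
}

/* The guest touches the memory behind entry idx. */
static void toy_guest_access(struct toy_p2m *p2m, size_t idx)
{
    assert(idx < TOY_NR_ENTRIES);

    if (!p2m->valid[idx]) {
        /* First access since the last invalidation: trap, then revalidate. */
        p2m->traps++;
        p2m->valid[idx] = true;
    }
}

/* Emulate one batch of Set/Way operations. */
static void toy_set_way_batch(struct toy_p2m *p2m)
{
    /* Only entries accessed since the previous batch need cleaning. */
    for (size_t i = 0; i < TOY_NR_ENTRIES; i++)
        if (p2m->valid[i])
            p2m->cleaned++;

    /* Start tracking again for the next batch. */
    toy_invalidate_all(p2m);
}
```

In this model, touching entries 1, 1 and 5 after an initial
toy_invalidate_all() costs two traps, and the following
toy_set_way_batch() cleans two entries instead of all eight. The price
is one trap per entry on first access, which is what the measurements
that follow quantify.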
 - 1GB: 0.03% slower when starting with valid bit unset
 - 2MB: 0.04% faster when starting with valid bit unset
 - 4KB: ~3% slower when starting with valid bit unset

The performance difference with 1GB and 2MB mappings is insignificant
because the number of traps is very limited (1 and 513, respectively).
With 4KB mappings, there is a much more significant drop because there
are more traps (~262700), as the P2M contains more entries.

However, having many 4KB mappings in the P2M is pretty unlikely as the
toolstack will always try to get bigger mappings. In the real world, you
should only have 4KB mappings when your guest's memory is not aligned
with a bigger mapping. If you end up with many 4KB mappings, then you
are already going to have a performance impact in the long run because
of the TLB pressure.

Overall, I would not recommend introducing a command line option until
we have figured out a use case where the trap causes a slowdown.

Cheers,

[1]

        .text
        b       _start                  /* branch to kernel start, magic */
        .long   0                       /* reserved */
        .quad   0x0                     /* Image load offset from start of RAM */
        .quad   0x0                     /* XXX: Effective Image size */
        .quad   2                       /* kernel flags: LE, 4K page size */
        .quad   0                       /* reserved */
        .quad   0                       /* reserved */
        .quad   0                       /* reserved */
        .byte   0x41                    /* Magic number, "ARM\x64" */
        .byte   0x52
        .byte   0x4d
        .byte   0x64
        .long   0                       /* reserved */

_start:
        isb
        mrs     x0, CNTPCT_EL0          /* x0 = counter before the loop */
        isb

        adrp    x2, _end                /* x2 = page containing _end */
        ldr     x3, =(0x40000000 + (1 << 30))   /* x3 = upper bound of the region */
1:      str     xzr, [x2], #8           /* zero 8 bytes, then x2 += 8 */
        cmp     x2, x3
        b.lo    1b

        isb
        mrs     x1, CNTPCT_EL0          /* x1 = counter after the loop */
        isb

        hvc     #0xffff

1:      b       1b

--
Julien Grall
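A back-of-the-envelope check of the trap counts quoted above (1, 513 and
~262700) can be sketched as follows. It rests on two assumptions that
are not spelled out in this message: each page-table level holds 512
entries (4KB granule), and zeroing the 1GB region takes one trap per
entry at every level from the 1GB level down to the leaf mappings. The
estimate_traps() helper is a hypothetical name used only here.

```c
#include <stdint.h>

#define TOY_ENTRIES_PER_TABLE 512u  /* assumption: 4KB granule, 9 bits per level */

/*
 * Estimate the number of first-touch traps when a 1GB region is mapped
 * with leaf mappings that sit levels_below_1gb levels under the 1GB
 * level: 0 for 1GB mappings, 1 for 2MB mappings, 2 for 4KB mappings.
 *
 * Assumption: one trap per entry at each level that covers the region.
 */
static uint64_t estimate_traps(unsigned int levels_below_1gb)
{
    uint64_t entries = 1;           /* a single 1GB-level entry covers the region */
    uint64_t traps = entries;

    for (unsigned int i = 0; i < levels_below_1gb; i++) {
        entries *= TOY_ENTRIES_PER_TABLE;   /* entries at the next level down */
        traps += entries;
    }

    return traps;
}
```

Under these assumptions, estimate_traps(0) gives 1, estimate_traps(1)
gives 1 + 512 = 513, and estimate_traps(2) gives 1 + 512 + 262144 =
262657, which is in line with the figures reported for the 1GB, 2MB and
4KB runs.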