Re: [Xen-devel] [PATCH for-4.12 v2 17/17] xen/arm: Track page accessed between batch of Set/Way operations
On Thu, 6 Dec 2018, Julien Grall wrote:
> Hi,
>
> On 12/4/18 8:26 PM, Julien Grall wrote:
> > At the moment, the implementation of Set/Way operations will go through
> > all the entries of the guest P2M and flush them. However, this is very
> > expensive and may render a guest OS using them unusable.
> >
> > For instance, 32-bit Linux will use Set/Way operations during secondary
> > CPU bring-up. As the implementation is really expensive, it may be
> > possible to hit the CPU bring-up timeout.
> >
> > To limit the Set/Way impact, we track which pages of the guest have
> > been accessed between batches of Set/Way operations. This is done
> > using bit[0] (aka valid bit) of the P2M entry.
> >
> > This patch introduces a new per-arch helper to perform actions just
> > before the guest is first unpaused. This will be used to invalidate the
> > P2M to track access from the start of the guest.
> >
> > Signed-off-by: Julien Grall <julien.grall@xxxxxxx>
> >
> > ---
> >
> > While we can spread d->creation_finished all over the code, the per-arch
> > helper to perform actions just before the guest is first unpaused can
> > bring a lot of benefit for both architectures. For instance, on Arm, the
> > flush to the instruction cache could be delayed until the domain is
> > first run. This would greatly improve the performance of creating guests.
> >
> > I am still benchmarking whether having a command line option is
> > worth it. I will provide numbers as soon as I have them.
>
> I remembered Stefano suggested looking at the impact on boot. This is a
> bit tricky to do as there are many kernel configurations and not all the
> mappings may have been touched during boot.
>
> Instead I wrote a tiny guest [1] that will zero roughly 1GB of memory.
> Because the toolstack will always try to allocate with the biggest mapping,
> I had to hack the toolstack a bit to be able to test with different mapping
> sizes (but not a mix). The guest has only one vCPU with a dedicated pCPU.
>
> - 1GB: 0.03% slower when starting with valid bit unset
> - 2MB: 0.04% faster when starting with valid bit unset
> - 4KB: ~3% slower when starting with valid bit unset
>
> The performance difference with 1GB and 2MB mappings is pretty much
> insignificant because the number of traps is very limited (resp. 1 and 513).
> With 4KB mappings, there is a much more significant drop because you have
> more traps (~262700) as the P2M contains more entries.
>
> However, having many 4KB mappings in the P2M is pretty unlikely as the
> toolstack will always try to get bigger mappings. In the real world, you
> should only have 4KB mappings when your guest does not have memory aligned
> with a bigger mapping. If you end up having many 4KB mappings, then you are
> already going to have a performance impact in the long run because of the
> TLB pressure.
>
> Overall, I would not recommend introducing a command line option until we
> have figured out a use case where the trap will be a slowdown.

Looking at the numbers, I agree with you. This is OK for now. But we
should still be open to revisiting this issue in the future in case it
becomes a problem (I know of customers wanting to boot the system in
less than a second overall).

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
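
For reference, the trap counts quoted in the benchmark (1, 513 and ~262700) can be roughly reproduced with a small back-of-the-envelope model. This is only a sketch and not Xen code: it assumes 512 entries per translation table (4KB granule), a 1GB region covered by a single level-1 entry, and one trap for each P2M entry whose valid bit has to be set again, at every level from the 1GB level down to the mapping level. The function name `traps_for_1gb` and the constant are hypothetical and exist only for illustration.

```python
# Hypothetical model, not Xen code: count the traps needed to touch every
# page of a 1GB region when all valid bits start cleared.

ENTRIES_PER_TABLE = 512  # assumption: 4KB granule, 512 entries per table


def traps_for_1gb(mapping_level: int) -> int:
    """Return the modelled trap count for 1GB mapped at the given level.

    mapping_level is 1 for a 1GB block, 2 for 2MB blocks, 3 for 4KB pages.
    Assumption: one trap per entry visited at each level from 1 down to
    mapping_level, because each trap revalidates a single entry.
    """
    if mapping_level not in (1, 2, 3):
        raise ValueError("mapping_level must be 1, 2 or 3")
    # Level 1 contributes 1 entry, level 2 adds 512, level 3 adds 512 * 512.
    return sum(ENTRIES_PER_TABLE ** depth for depth in range(mapping_level))


for label, level in (("1GB", 1), ("2MB", 2), ("4KB", 3)):
    print(f"{label} mappings: {traps_for_1gb(level)} traps")
```

Under these assumptions the model gives 1 and 513 traps for 1GB and 2MB mappings, matching the figures in the mail, and 262657 for 4KB mappings, which is close to the reported ~262700. The mail only says the guest zeroes "roughly" 1GB, so the small difference is not accounted for here.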