Re: [Xen-devel] [PATCH for-4.12 v2 17/17] xen/arm: Track page accessed between batch of Set/Way operations

On Thu, 6 Dec 2018, Julien Grall wrote:
> Hi,
> On 12/4/18 8:26 PM, Julien Grall wrote:
> > At the moment, the implementation of Set/Way operations will go through
> > all the entries of the guest P2M and flush them. However, this is very
> > expensive and may render a guest OS that uses them unusable.
> > 
> > For instance, Linux 32-bit will use Set/Way operations during secondary
> > CPU bring-up. As the implementation is really expensive, it may be possible
> > to hit the CPU bring-up timeout.
> > 
> > To limit the Set/Way impact, we track which guest pages have been
> > accessed between batches of Set/Way operations. This is done
> > using bit[0] (aka valid bit) of the P2M entry.
> > 
> > This patch introduces a new per-arch helper to perform actions just
> > before the guest is first unpaused. This will be used to invalidate the
> > P2M to track access from the start of the guest.
> > 
> > Signed-off-by: Julien Grall <julien.grall@xxxxxxx>
> > 
> > ---
> > 
> > While we can spread d->creation_finished all over the code, the per-arch
> > helper to perform actions just before the guest is first unpaused can
> > bring a lot of benefit for both architectures. For instance, on Arm, the
> > flush to the instruction cache could be delayed until the domain is
> > first run. This would greatly improve the performance of guest creation.
> > 
> > I am still benchmarking to determine whether a command line option is
> > worth it. I will provide numbers as soon as I have them.
> I remembered that Stefano suggested looking at the impact on boot. This is a
> bit tricky to do as there are many possible kernel configurations and not all
> the mappings may be touched during boot.
> Instead I wrote a tiny guest [1] that will zero roughly 1GB of memory. Because
> the toolstack will always try to allocate with the biggest mapping, I had to
> hack the toolstack a bit to be able to test with different mapping sizes (but
> not a mix). The guest has only one vCPU with a dedicated pCPU.
>       - 1GB: 0.03% slower when starting with valid bit unset
>       - 2MB: 0.04% faster when starting with valid bit unset
>       - 4KB: ~3% slower when starting with valid bit unset
> The performance difference using 1GB and 2MB mappings is pretty much
> insignificant because the number of traps is very limited (resp. 1 and 513).
> With 4KB mappings, there is a much more significant drop because you have
> more traps (~262700) as the P2M contains more entries.
> However, having many 4KB mappings in the P2M is pretty unlikely as the
> toolstack will always try to get bigger mappings. In the real world, you
> should only have 4KB mappings when your guest memory is not aligned with a
> bigger mapping. If you end up having many 4KB mappings, then you are already
> going to have a performance impact in the long run because of the TLB pressure.
> Overall, I would not recommend introducing a command line option until we
> have figured out a use case where the trap will be a slowdown.

Looking at the numbers, I agree with you. This is OK for now. But we
should still be open to revisiting this issue in the future in case it
becomes a problem (I know of customers wanting to boot the system in
less than a second overall).
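
For reference, the trap counts quoted above can be approximated with a
small standalone model. This is only a sketch and not the code from the
patch: simulate_traps() and P2M_VALID are hypothetical names, the P2M is
modelled as a flat array of leaf entries, and the region is assumed to be
exactly 1GB and aligned to the mapping size. Every entry starts with
bit[0] (the valid bit) clear, the first access to an entry counts as one
trap, and the trap sets the bit so later accesses through the same entry
are free.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define P2M_VALID 0x1ULL   /* bit[0] of an entry, i.e. the valid bit */

/*
 * Hypothetical model: touch every 4KB page of a `bytes`-sized region that
 * is mapped with leaf entries of `mapping_size` bytes, and count how many
 * times an access hits an entry whose valid bit is still clear.
 */
static uint64_t simulate_traps(uint64_t bytes, uint64_t mapping_size)
{
    uint64_t nr_entries, addr, traps = 0;
    uint64_t *p2m;

    assert(mapping_size != 0);

    /* Number of leaf entries needed to cover the region (rounded up). */
    nr_entries = (bytes + mapping_size - 1) / mapping_size;

    /* calloc() zeroes the array, so every entry starts with bit[0] clear. */
    p2m = calloc(nr_entries, sizeof(*p2m));
    assert(p2m != NULL);

    for (addr = 0; addr < bytes; addr += 4096) {
        uint64_t idx = addr / mapping_size;

        if (!(p2m[idx] & P2M_VALID)) {
            traps++;                 /* first access: trap to the hypervisor */
            p2m[idx] |= P2M_VALID;   /* mark the entry as accessed */
        }
    }

    free(p2m);
    return traps;
}
```

With a 1GB region this model gives 1 trap for a 1GB mapping, 512 for 2MB
mappings and 262144 for 4KB mappings. The measured figures (513 and
~262700) are slightly higher, presumably because the guest touches a
little more than the 1GB it zeroes, but the order of magnitude matches.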
