
Re: [Xen-devel] [RFC] xen/arm: Handling cache maintenance instructions by set/way



On Tue, 5 Dec 2017, Julien Grall wrote:
> Hi all,
> 
> Even though this is an Arm failure, I have CCed the x86 folks to get feedback
> on the approach. I have a WIP branch I could share if that interests people.
> 
> A few months ago, we noticed a heisenbug on jobs run by osstest on the
> cubietrucks (see [1]). From the logs, we figured out that guest vCPU 0 is
> in data/prefetch abort state at early boot. I have been able to reproduce it
> reliably, and from the little information I have I think it is related to a
> cache issue, because we don't trap cache maintenance instructions by set/way.
> 
> This is a set of 3 instructions (clean, clean & invalidate, invalidate) that
> operate on a given cache level by S/W. Because the OS is not allowed to infer
> the S/W to PA mapping, it can only use S/W to nuke the whole cache. "The
> expected usage of the cache maintenance that operate by set/way is associated
> with powerdown and powerup of caches, if this is required by the
> implementation" (see D3-2020 ARM DDI 0487B.b).
> 
> Those instructions target the local processor and are usually issued in a
> batch to nuke the whole cache. This means that if the vCPU is migrated to
> another pCPU in the middle of the sequence, the cache may not be fully
> cleaned. This could result in data corruption and a potential crash of the OS.
> 
> Thankfully, the Arm architecture offers a way to trap all the cache
> maintenance instructions by S/W (HCR_EL2.TSW). Xen will need to set that
> bit and handle the S/W operations itself.
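> For reference, the trap controls involved are single bits in HCR_EL2. The
> bit positions below are from the Arm ARM; the macro names are just for the
> sketches in this mail, not necessarily what Xen would use:
> 
>     #define HCR_TSW    (1UL << 22)  /* trap DC ISW/CSW/CISW executed by the guest */
>     #define HCR_TVM    (1UL << 26)  /* trap guest writes to the VM control registers */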
> 
> The major question now is how to handle them. S/W instructions are difficult
> to virtualize (see ARMv7 ARM B1.14.4).
> 
> The suggested policy is based on the KVM one (a rough code sketch follows
> the list):
>       - If we trap an S/W instruction, we enable VM trapping (e.g.
> HCR_EL2.TVM) to detect the caches being turned on/off, and do a full clean.
>       - We flush the caches both when they are turned on and when they are
> turned off.
>       - Once the caches are enabled, we stop trapping the VM control
> registers.
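> Roughly, building on the HCR_* bits defined above (illustrative only: the
> structure, the field names and the flush_guest_ram() helper are made up, and
> the real flush has to walk the P2M as described below):
> 
>     #include <stdint.h>
>     #include <stdbool.h>
> 
>     #define SCTLR_C    (1UL << 2)   /* SCTLR_EL1.C: data/unified cache enable */
> 
>     struct sw_vcpu {                /* stand-in for the real per-vCPU state */
>         uint64_t hcr;               /* HCR_EL2 value programmed on guest entry */
>         uint64_t sctlr;             /* last known guest SCTLR_EL1 */
>     };
> 
>     /* Hypothetical helper: clean & invalidate every mapped P2M entry. */
>     void flush_guest_ram(struct sw_vcpu *v);
> 
>     /* Trap handler for DC ISW/CSW/CISW (taken because HCR_TSW is set). */
>     void handle_sw_op(struct sw_vcpu *v)
>     {
>         /* The guest can only be nuking the whole cache, so do that for it. */
>         flush_guest_ram(v);
> 
>         /* Start watching the cache enable bit being flipped. */
>         v->hcr |= HCR_TVM;
>     }
> 
>     /* Trap handler for guest writes to SCTLR_EL1 (taken because HCR_TVM is set). */
>     void handle_sctlr_write(struct sw_vcpu *v, uint64_t new_sctlr)
>     {
>         bool was_on = v->sctlr & SCTLR_C;
>         bool now_on = new_sctlr & SCTLR_C;
> 
>         /* Flush on both transitions: caches turned on and caches turned off. */
>         if (was_on != now_on)
>             flush_guest_ram(v);
> 
>         v->sctlr = new_sctlr;
> 
>         /* Once the caches are enabled, stop trapping the VM registers. */
>         if (now_on)
>             v->hcr &= ~HCR_TVM;
>     }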
> 
> Doing a full clean will require going through the P2M and flushing the
> entries one by one. At the moment, all the memory is mapped. As you can
> imagine, flushing a guest with hundreds of MB of RAM will take a very long
> time (Linux times out during CPU bring-up).
> 
> Therefore, we need a way to limit the number of entries we need to flush. The
> suggested solution here is to introduce Populate On Demand (PoD) on Arm.
> 
> The guest would boot with no RAM mapped in the stage-2 page-tables. At every
> prefetch/data abort, the faulting RAM would be mapped, preferably in 2MB
> chunks, falling back to 4KB. This means that when S/W is used, the number of
> entries mapped would be very limited. However, for safety, the flush should
> be preemptible.
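> The fault-handling side of that idea could look roughly like the sketch
> below. This is purely illustrative: the pod_alloc_and_map() helper and the
> granularity policy are made up for the sketch; it is neither the x86 PoD
> code nor a proposed Arm implementation:
> 
>     #define SZ_4K    (1UL << 12)
>     #define SZ_2M    (1UL << 21)
> 
>     struct domain;   /* stand-in; the fields do not matter for the sketch */
> 
>     /* Hypothetical helper: allocate guest RAM and map it at gpa in stage-2. */
>     int pod_alloc_and_map(struct domain *d, unsigned long gpa, unsigned long size);
> 
>     /* Called from the stage-2 prefetch/data abort handler when the faulting
>      * address is guest RAM that has not been populated yet. */
>     int pod_populate(struct domain *d, unsigned long fault_gpa)
>     {
>         /* Prefer a 2MB superpage: fewer P2M entries to visit when an
>          * S/W-triggered flush walks the populated mappings later on. */
>         if (pod_alloc_and_map(d, fault_gpa & ~(SZ_2M - 1), SZ_2M) == 0)
>             return 0;
> 
>         /* Fall back to a 4KB page when a superpage cannot be used. */
>         return pod_alloc_and_map(d, fault_gpa & ~(SZ_4K - 1), SZ_4K);
>     }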
> 
> For those worried about the performance impact, I have looked at the current
> use of S/W instructions:
>       - Linux Arm64: the last use in the kernel was at the beginning of 2015.
>       - Linux Arm32: still uses S/W for boot and secondary CPU bring-up. No
> plan to change.
>       - UEFI: a couple of uses in UEFI, but I have heard they plan to remove
> them (needs confirmation).
> 
> I haven't looked at all the OSes. However, given that the Arm ARM clearly
> states S/W instructions are not easily virtualizable, I would expect guest OS
> developers to try their best to limit the use of these instructions.
> 
> To limit the performance impact, we could introduce a guest option to tell
> whether the guest will use S/W. If it does plan to use S/W, PoD will be
> disabled.
> 
> Now regarding the hardware domain: at the moment, its RAM is direct mapped.
> Supporting direct mapping in PoD would be quite a pain for limited benefit
> (see why above). In that case, I would suggest imposing vCPU pinning for the
> hardware domain if S/W instructions are expected to be used. Again, a command
> line option could be introduced here.
> 
> Any feedback on the approach is welcome.
 
Could we pin the hwdom vCPUs only at boot time, until all S/W operations
have been issued, and then "release" them? That assumes we can detect the
last expected S/W operation with some sort of heuristic.

Given the information provided above, would it make sense to consider
avoiding PoD for arm64 kernel direct boots?

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
