
Re: [Xen-devel] [RFC] xen/arm: Handling cache maintenance instructions by set/way

On 05/12/2017 22:35, Stefano Stabellini wrote:
On Tue, 5 Dec 2017, Julien Grall wrote:
Hi all,

Even though it is an Arm failure, I have CCed the x86 folks to get feedback on
the approach. I have a WIP branch I could share if that interests people.

A few months ago, we noticed a heisenbug on jobs run by osstest on the
cubietrucks (see [1]). From the log, we figured out that the guest vCPU 0 is
in data/prefetch abort state at early boot. I have been able to reproduce it
reliably, although from the little information I have I think it is related to
a cache issue because we don't trap cache maintenance instructions by set/way.

This is a set of 3 instructions (clean, clean & invalidate, invalidate)
working on a given cache level by S/W. Because the OS is not allowed to infer
the S/W to PA mapping, it can only use S/W to nuke the whole cache. "The
expected usage of the cache maintenance that operate by set/way is associated
with powerdown and powerup of caches, if this is required by the
implementation" (see D3-2020 ARM DDI 0487B.b).

Those instructions target the local processor and are usually issued in a
batch to nuke the cache. This means that if the vCPU is migrated to another
pCPU in the middle of the process, the cache may not be cleaned. This would
result in data corruption and a potential crash of the OS.

Thankfully, the Arm architecture offers a way to trap all the cache
maintenance instructions by S/W (e.g. HCR_EL2.TSW). Xen will need to set that
bit and handle S/W.
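
As a minimal sketch of what "setting that bit" means, assuming the
architectural bit positions for HCR_EL2 (TSW is bit 22, TVM is bit 26). The
helper name guest_hcr_with_sw_trap() is hypothetical and is not an existing
Xen function:

```c
#include <stdint.h>

/* Bit positions defined by the Arm architecture for HCR_EL2. */
#define HCR_TSW (UINT64_C(1) << 22) /* trap cache maintenance by set/way */
#define HCR_TVM (UINT64_C(1) << 26) /* trap writes to virtual memory controls */

/*
 * Hypothetical helper (not a Xen API): return the HCR_EL2 value a guest
 * vCPU would run with once S/W trapping is enabled. All other bits are
 * left untouched.
 */
static inline uint64_t guest_hcr_with_sw_trap(uint64_t hcr)
{
    return hcr | HCR_TSW;
}
```

With the bit set, every S/W instruction executed by the guest traps to the
hypervisor instead of acting on the cache of whichever pCPU the vCPU happens
to be running on.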

The major question now is how to handle them. S/W instructions are difficult
to virtualize (see ARMv7 ARM B1.14.4).

The suggested policy is based on the KVM one:
        - If we trap an S/W instruction, we enable VM trapping (e.g.
HCR_EL2.TVM) to detect the cache being turned on/off, and do a full clean.
        - We flush the caches on both caches being turned on and off.
        - Once the caches are enabled, we stop trapping VM instructions.
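
The three rules above can be sketched as a small state machine. This is only
a model under stated assumptions: struct vcpu_model, on_sw_trap() and
on_sctlr_write() are hypothetical names, the full clean is represented by a
counter, "caches enabled" is taken to mean SCTLR.M and SCTLR.C both set, and
only the first S/W trap of a batch triggers a full clean (later ones find TVM
already set):

```c
#include <stdbool.h>
#include <stdint.h>

#define HCR_TSW (UINT64_C(1) << 22) /* trap S/W cache maintenance */
#define HCR_TVM (UINT64_C(1) << 26) /* trap writes to VM control registers */
#define SCTLR_M (UINT32_C(1) << 0)  /* stage-1 MMU enable */
#define SCTLR_C (UINT32_C(1) << 2)  /* data cache enable */

/* Hypothetical per-vCPU model; not a Xen structure. */
struct vcpu_model {
    uint64_t hcr;              /* modelled HCR_EL2 value */
    uint32_t sctlr;            /* last SCTLR value written by the guest */
    unsigned int full_flushes; /* stands in for the real full clean */
};

static bool caches_enabled(uint32_t sctlr)
{
    return (sctlr & (SCTLR_M | SCTLR_C)) == (SCTLR_M | SCTLR_C);
}

/* Rule 1: a trapped S/W instruction turns on VM trapping and flushes. */
static void on_sw_trap(struct vcpu_model *v)
{
    if (!(v->hcr & HCR_TVM)) {
        v->full_flushes++;
        v->hcr |= HCR_TVM;
    }
}

/* Rules 2 and 3: called when a trapped SCTLR write is emulated. */
static void on_sctlr_write(struct vcpu_model *v, uint32_t new_sctlr)
{
    bool was_enabled = caches_enabled(v->sctlr);
    bool now_enabled = caches_enabled(new_sctlr);

    v->sctlr = new_sctlr;

    /* Flush on both the off->on and the on->off transition. */
    if (was_enabled != now_enabled)
        v->full_flushes++;

    /* Once the caches are on, stop trapping VM instructions. */
    if (now_enabled)
        v->hcr &= ~HCR_TVM;
}
```

The cost of this policy is dominated by the full clean, which is why the
number of stage-2 entries to walk matters so much.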

Doing a full clean will require going through the P2M and flushing the entries
one by one. At the moment, all the memory is mapped. As you can imagine,
flushing a guest with hundreds of MB will take a very long time (Linux times
out during CPU bring-up).

Therefore, we need a way to limit the number of entries we need to flush. The
suggested solution here is to introduce Populate On Demand (PoD) on Arm.

The guest would boot with no RAM mapped in the stage-2 page-table. At every
prefetch/data abort, the RAM would be mapped, preferably in 2MB chunks and
otherwise in 4KB ones. This means that when S/W instructions are used, the
number of entries mapped would be very limited. However, for safety, the
flush should be preemptible.
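
To illustrate what "preemptible" could look like, here is a minimal sketch
assuming the P2M is modelled as a flat array of "mapped" flags. The names
flush_state and flush_mapped_entries() are hypothetical, and the budget
stands in for whatever preemption check the hypervisor would really use:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resume state kept across preemptions. */
struct flush_state {
    size_t next;    /* index of the next entry to look at */
    size_t flushed; /* number of mapped entries flushed so far */
};

/*
 * Walk the entries starting at st->next and "flush" the mapped ones.
 * At most `budget` entries are visited per call (budget must be > 0).
 * Returns true when the whole range has been processed, false when the
 * walk was preempted and must be called again to resume.
 */
static bool flush_mapped_entries(struct flush_state *st, const bool *mapped,
                                 size_t nr, size_t budget)
{
    size_t visited = 0;

    while (st->next < nr) {
        if (visited == budget)
            return false; /* preempted: st->next is the resume point */

        if (mapped[st->next])
            st->flushed++; /* stands in for the real cache clean */

        st->next++;
        visited++;
    }

    return true;
}
```

A caller would loop (or reschedule itself) until the function returns true.
With PoD, most entries are unmapped early in boot, so the number of real
flushes stays small even though the walk covers the whole range.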

For those worried about the performance impact, I have looked at the
current use of S/W instructions:
        - Linux Arm64: The last use in the kernel was at the beginning of 2015.
        - Linux Arm32: Still uses S/W for boot and secondary CPU bring-up. No
plan to change.
        - UEFI: A couple of uses in UEFI, but I have heard they plan to remove
them (needs confirmation).

I haven't looked at all the OSes. However, given that the Arm ARM clearly
states S/W instructions are not easily virtualizable, I would expect guest OS
developers to try their best to limit the use of these instructions.

To limit the performance impact, we could introduce a guest option to tell
whether the guest will use S/W. If it does plan to use S/W, PoD will be
disabled.

Now regarding the hardware domain. At the moment, it has its RAM direct
mapped. Supporting direct mapping in PoD will be quite a pain for limited
benefit (see why above). In that case I would suggest imposing vCPU pinning
for the hardware domain if S/W instructions are expected to be used. Again, a
command line option could be introduced here.

Any feedback on the approach is welcome.

Could we pin the hwdom vcpus only at boot time, until all S/W operations
are issued, then "release" them? If we can detect the last expected S/W
operation with some sort of heuristic.

Feel free to suggest a way. I haven't found it. But to be honest, you have seen how much people care about 32-bit hwdom today. So I would not spend too much time thinking about optimizing it.

Given the information provided above, would it make sense to consider
avoiding PoD for arm64 kernel direct boots?

Please suggest a way to detect an arm64 kernel direct boot that does not use S/W. I don't see any.

The only solution I can see is to provide a configuration option at boot time, as I suggested a bit above:

"To limit the performance impact, we could introduce a guest option to tell whether the guest will use S/W. If it does plan to use S/W, PoD will be disabled."

But at this stage, my concern is fixing a blatant bug in Xen; performance is a second step.


Julien Grall
