
Re: [Xen-devel] Shattering superpages impact on IOMMU in Xen



Hi Andrew,

On 03/04/17 18:16, Andrew Cooper wrote:
On 03/04/17 18:02, Julien Grall wrote:
Hi Andrew,

On 03/04/17 17:42, Andrew Cooper wrote:
On 03/04/17 17:24, Oleksandr Tyshchenko wrote:
Hi, all.

Playing with non-shared IOMMU in Xen on ARM I faced one interesting
thing. I found out that the superpages were shattered during domain
life cycle.
This is the result of mapping foreign pages, ballooning memory,
and even the domain mapping Xen shared pages, etc.
I am not concerned about the memory fragmentation at the moment. But
shattering bothers me from the IOMMU point of view.
As Xen owns the IOMMU, it might manipulate the IOMMU page tables while a
passed-through/protected device is doing DMA in Linux. It is hard to
detect when no DMA transaction is in progress
in order to prevent this race. So, if we have an in-flight transaction
from a device when changing the IOMMU mapping, we might get into trouble.
Unfortunately, the faulting transaction cannot be restarted in all
cases. The chance of hitting the problem
increases during shattering.

I ran the following test:
Dom0 on my setup contains an Ethernet IP block that is protected by the
IOMMU.
What is more, as the IOMMU I am playing with supports superpages (2M,
1G), the IOMMU driver
takes these capabilities into account when building page tables. As I
gave 256 MB to dom0, the IOMMU mapping was built from 2M memory blocks
only. As I am using NFS for both dom0 and domU, the Ethernet IP
performs DMA transactions almost all the time.
Sometimes I see IOMMU page faults while creating a guest domain. I
think this happens while Xen is shattering 2M mappings into 4K mappings
(it unmaps dom0 pages one 4K page at a time, then maps domU pages there
for copying domU images).
But I don't see any page faults when the IOMMU page table is built
from 4K pages only.
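
For a sense of scale, the arithmetic behind the 256 MB example can be sketched in C. This is only an illustration: it assumes a 4K translation granule with 512 eight-byte entries per table (an LPAE-style layout), which may not match the page table format of the IOMMU in question, and the helper names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define KIB(x) ((uint64_t)(x) << 10)
#define MIB(x) ((uint64_t)(x) << 20)

/* Assumption: one 4K table holds 512 entries of 8 bytes each. */
#define ENTRIES_PER_TABLE 512u

/* Number of leaf entries needed to map `bytes` using mappings of `map_size`. */
static uint64_t leaf_entries(uint64_t bytes, uint64_t map_size)
{
    assert(map_size != 0 && bytes % map_size == 0);
    return bytes / map_size;
}

/* Number of tables needed to hold `entries` leaf entries, rounded up. */
static uint64_t tables_for(uint64_t entries)
{
    return (entries + ENTRIES_PER_TABLE - 1) / ENTRIES_PER_TABLE;
}
```

Under these assumptions, `leaf_entries(MIB(256), MIB(2))` gives 128 block entries that fit in a single table, while `leaf_entries(MIB(256), KIB(4))` gives 65536 page entries spread over 128 leaf tables.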

I had a talk with Julien on IRC and we came to the conclusion that the
safest way would be to use 4K pages to prevent shattering, so the
IOMMU shouldn't report superpage capability.
On the other hand, if we build the IOMMU page tables from 4K pages we
will see a performance drop (while building and walking page tables),
TLB pressure, etc.
Another possible solution Julien suggested is to always
balloon in 2M or 1G units rather than 4K. That would help us to
prevent the shattering effect.
The discussion was moved to the ML since it seems to be a generic
issue and the right solution should be thought through.

What do you think is the right way to follow? Use 4K pages and not
bother with shattering, or try to optimize? And if the idea of making
the balloon mechanism smarter makes sense, how do we teach the balloon
driver to do so?
Thank you.

Ballooning and foreign mappings are terrible for trying to retain
superpage mappings.  No OS, not even Linux, can sensibly provide victim
pages in a useful way to avoid shattering.

If you care about performance, don't ever balloon.  Foreign mappings in
translated guests should start from the top of RAM, and work upwards.

I am not sure I understand this. Can you elaborate?

I am not sure what is unclear.  Handing random frames of RAM back to the
hypervisor is what exacerbates host superpage fragmentation, and all
balloon drivers currently do it.

If you want to avoid host superpage fragmentation, don't use a
scattergun approach of handing frames back to Xen.  However, because
even Linux doesn't provide enough hooks into the physical memory
management logic, the only solution is to not balloon at all, and to use
already-unoccupied frames for foreign mappings.

Do you have any pointer in the Linux code?





As for the IOMMU specifically, things are rather easier.  It is the
guest's responsibility to ensure that frames offered up for ballooning or
foreign mappings are unused.  Therefore, if anything cares about the
specific 4K region becoming non-present in the IOMMU mappings, it is the
guest kernel's fault for offering up a frame already in use.

For the shattering however, it is Xen's responsibility to ensure that
all other mappings stay valid at all points.  The correct way to do this
is to construct a new L1 table, mirroring the L2 superpage but lacking
the specific 4K mapping in question, then atomically replace the L2
superpage entry with the new L1 table, then issue an IOMMU TLB
invalidation to remove any cached mappings.

By following that procedure, all DMA within the 2M region, but not
hitting the 4K frame, won't observe any interim lack of mappings.  It
appears from your description that Xen isn't following the procedure.
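
The procedure described above can be sketched as a small user-space model in C. This is not Xen code: the entry encoding (bit 0 valid, bit 1 "points to a table"), the function names, and the counter standing in for the IOMMU TLB invalidation are all assumptions made for illustration. The point it shows is that the new L1 table is fully populated before a single atomic store publishes it, so a walker sees either the old 2M block or the complete table.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define ENTRIES   512u
#define PAGE_SZ   4096ull
#define BLOCK_SZ  (ENTRIES * PAGE_SZ)  /* 2M */
#define PTE_VALID 1ull                 /* hypothetical encoding */
#define PTE_TABLE 2ull                 /* hypothetical "next level" bit */

typedef _Atomic uint64_t pte_t;

/* Stand-in for the real IOMMU TLB invalidation; it only counts calls. */
static unsigned int tlb_flushes;
static void iommu_tlb_invalidate(void) { tlb_flushes++; }

/*
 * Replace the 2M block entry *l2e with a table of 4K entries covering the
 * same range, except slot `hole`, which is left non-present.
 * Returns the new table, or NULL if the allocation fails.
 */
static pte_t *shatter_superpage(pte_t *l2e, unsigned int hole)
{
    assert(hole < ENTRIES);
    uint64_t base = atomic_load(l2e) & ~(BLOCK_SZ - 1);

    /* Build the complete L1 table before anything can observe it. */
    pte_t *l1 = aligned_alloc(PAGE_SZ, ENTRIES * sizeof(pte_t));
    if (l1 == NULL)
        return NULL;
    for (unsigned int i = 0; i < ENTRIES; i++)
        atomic_init(&l1[i], i == hole ? 0 : (base + i * PAGE_SZ) | PTE_VALID);

    /* One release store swaps the block entry for the table pointer. */
    atomic_store_explicit(l2e, (uint64_t)(uintptr_t)l1 | PTE_TABLE | PTE_VALID,
                          memory_order_release);

    /* Drop any cached copy of the old 2M mapping. */
    iommu_tlb_invalidate();
    return l1;
}

/* Software walk of one L2 entry; returns false if `offset` is unmapped. */
static bool translate(pte_t *l2e, uint64_t offset, uint64_t *pa)
{
    assert(offset < BLOCK_SZ);
    uint64_t e = atomic_load_explicit(l2e, memory_order_acquire);
    if (!(e & PTE_VALID))
        return false;
    if (!(e & PTE_TABLE)) {            /* still a 2M block */
        *pa = (e & ~(BLOCK_SZ - 1)) + offset;
        return true;
    }
    pte_t *l1 = (pte_t *)(uintptr_t)(e & ~(PAGE_SZ - 1));
    uint64_t pte = atomic_load(&l1[offset / PAGE_SZ]);
    if (!(pte & PTE_VALID))
        return false;
    *pa = (pte & ~(PAGE_SZ - 1)) + offset % PAGE_SZ;
    return true;
}
```

In this model, after `shatter_superpage(&l2e, 5)` an access to offset 0x5123 no longer translates, while every other offset in the 2M range resolves to the same address as before. Whether a given architecture allows a live entry to be replaced this way is a separate question from the model.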

Xen is following what the ARM ARM mandates. To shatter a page table
mapping, we have to follow the break-before-make sequence, i.e.:
    - Invalidate the L2 entry
    - Flush the TLBs
    - Add the new L1 table
See D4-1816 in ARM DDI 0487A.k_iss10775 for details. So we end up with a
small window where there is no valid mapping. It is easy to trap a data
abort from the processor and restart the access, but not for device
memory transactions.
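
The three steps above, and the window they open, can be modelled with a short C sketch. It is a user-space illustration only: the entry encoding, the function names, and the probe hook that stands in for a device access arriving mid-sequence are assumptions, not Xen or hardware interfaces.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_VALID 1ull   /* hypothetical encoding */
#define PTE_TABLE 2ull   /* hypothetical "next level" bit */

/* Hypothetical hook run inside the window, e.g. to model a device access. */
typedef void (*window_probe_t)(const volatile uint64_t *l2e, void *ctx);

/* Stand-in for the real TLB flush; it only counts calls. */
static unsigned int tlb_flushes;
static void flush_tlbs(void) { tlb_flushes++; }

/* Break-before-make replacement of a block entry by a table entry. */
static void shatter_break_before_make(volatile uint64_t *l2e,
                                      uint64_t new_table_addr,
                                      window_probe_t probe, void *ctx)
{
    *l2e = 0;                     /* 1. break: invalidate the L2 entry */
    flush_tlbs();                 /* 2. flush the TLBs */

    /* Window: nothing is mapped here, so any walk of this entry faults. */
    if (probe)
        probe(l2e, ctx);

    *l2e = new_table_addr | PTE_TABLE | PTE_VALID;  /* 3. make: add L1 table */
}

/* Models a device access: counts a fault if the entry is not valid. */
static void device_access(const volatile uint64_t *l2e, void *ctx)
{
    unsigned int *faults = ctx;
    if (!(*l2e & PTE_VALID))
        (*faults)++;
}
```

Passing `device_access` as the probe records exactly one fault per shatter in this model, which corresponds to the case where a DMA transaction lands between the invalidate and the installation of the new table.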

By default, Xen shares the stage-2 page tables between the IOMMU
and the MMU. However, from the discussion I had with Oleksandr, they
are not sharing page tables and still see the problem. I am not sure
how they are updating the page table here. Oleksandr, can you provide
more details?

Are you saying that ARM has no way of making atomic updates to the IOMMU
mappings?  (How do I get access to that document?  Google gets me to
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.subset.architecture.reference/index.html,
but
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0487a.k/index.html
which looks like the document you specified results in 404.)

Here is a link; I am not sure why Google does not index it:

http://infocenter.arm.com/help/topic/com.arm.doc.ddi0487a.k_10775/index.html


If so, that is an architecture bug IMO.  By design, the IOMMU is out of
control of guest software, and the hypervisor should be able to make
atomic modifications without guest cooperation.

I think you misread what I meant: the IOMMU supports atomic operations. However, if the page tables are shared, we have to apply Break-Before-Make when shattering a superpage. This is mandatory if you want Xen to run on all micro-architectures.

Some IOMMUs may cope with BBM, some may not. I haven't seen any issues so far (which does not mean there are none).

The IOMMU used by Oleksandr (e.g. VMSA-IPMMU) is an IP from Renesas which I have never used myself. In his case he needs separate page tables because the layouts are not the same.

Oleksandr, looking at the code you provided, the superpages are split the way Andrew said, i.e.:
        1) allocate a level 3 table minus the 4K mapping
        2) replace the level 2 entry with the new table

Am I right?

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

