
Re: [Xen-devel] [RFC 18/22] xen/arm: p2m: Introduce p2m_set_entry and __p2m_set_entry



Hi Stefano,

On 06/09/2016 19:51, Stefano Stabellini wrote:
On Tue, 6 Sep 2016, Julien Grall wrote:
I was asking myself the same question

It is not trivial. On ARMv7, there is no way to invalidate by IPA, so we still
need to do a full flush.

In the case of ARMv8, it is possible to do a flush by IPA with the following
sequence (see D4-1739 in ARM DDI 0487A.i):
 tlbi ipas2e1, xt
 dsb
 tlbi vmalle1

So I was wondering if we could leave that for a future optimization.

We can leave it for now, but I have a hunch that it is going to have a
pretty significant impact.

In theory, the current approach will have an impact on platforms where the TLBs cache stage-1 and stage-2 translations separately.

This will not be the case if the TLBs cache the combined stage-1 and stage-2 translation in a single entry.

However, on ARMv7 it will not be possible to minimize the impact.
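
For what it is worth, the sequence quoted above could be wrapped in a helper along the lines of the sketch below. This is only a sketch on my side: the helper name, the barriers, and the use of the non-inner-shareable variants (rather than ipas2e1is/vmalle1is) are assumptions, not the final implementation.

#include <stdint.h>

/*
 * Rough sketch, not the actual Xen helper: invalidate the TLB entries
 * associated with a single IPA on ARMv8, following the sequence quoted
 * above (ARM DDI 0487A.i, D4-1739). TLBI IPAS2E1 expects IPA[47:12] in
 * the low bits of the register.
 */
static inline void flush_stage2_tlb_by_ipa(uint64_t ipa)
{
    uint64_t arg = ipa >> 12;

    asm volatile(
        "dsb sy;"
        "tlbi ipas2e1, %0;"    /* Invalidate stage-2 entries for this IPA */
        "dsb sy;"              /* Wait for completion before... */
        "tlbi vmalle1;"        /* ...invalidating stage-1 entries for the VMID */
        "dsb sy;"
        "isb;"
        : : "r" (arg) : "memory" );
}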

But to be honest, most of the time the p2m will be modified via guest_physmap_add_entry and guest_physmap_remove_page. Both are usually called with order = 0, or order = 4 for domains using 64KB page granularity.

Setting aside domains using 64KB page granularity, the number of TLB flushes will be either 1 or 2, depending on whether a superpage needs to be shattered. This is the same as today.

In the case of a 64KB domain, the number of TLB flushes will be higher if the domain is replacing an existing mapping (1 TLB flush per 4KB entry, i.e. 16 flushes to replace a single 64KB page). But this is already a programming issue, given that in this case the underlying memory (if RAM) will not be freed until the domain is destroyed. In general, a domain should remove a mapping before creating a new one at the same address.

I have done some testing and noticed that a domU p2m is not modified very often. In the case of dom0, this mostly happens when a domain is created (the new domain's memory has to be mapped). However, the number of flushes should be the same, given that dom0 will use ballooned pages (i.e. the stage-2 mapping does not exist beforehand).

I have some ideas on how to optimize the code a bit more, but they are heavyweight and will be hard to maintain. I would prefer to defer this until users come with a use case where the performance hit is too high.



+        if ( !(mask & ((1UL << FIRST_ORDER) - 1)) )
+            order = FIRST_ORDER;
+        else if ( !(mask & ((1UL << SECOND_ORDER) - 1)) )
+            order = SECOND_ORDER;
+        else
+            order = THIRD_ORDER;
+
+        rc = __p2m_set_entry(p2m, sgfn, order, smfn, t, a);
+        if ( rc )
+            break;

Shouldn't we be checking that "(1<<order)" doesn't exceed "todo" before
calling __p2m_set_entry? Otherwise we risk creating a mapping bigger
than requested.

The mask is defined as gfn_x(sgfn) | mfn_x(smfn) | todo

So we will never have the order exceeding "todo".

Ah, I see. But this way we are not able to do superpage mappings for
regions larger than 2MB but not a multiple of 2MB (3MB for example). But I
don't think we were able to do it before either, so it is not a
requirement for the patch.

From my understanding of is_mapping_aligned, the case you mentioned is handled.

The function p2m_set_entry takes a number of pages because some callers (such as the MMIO ones) pass a number of pages. However, the main callers are guest_physmap_remove_page and guest_physmap_add_entry, which take an order as a parameter.

So it would be a nice feature to have, but I don't think this is strictly necessary.
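
To make the discussion about the mask more concrete, here is a small standalone sketch of the order selection from the quoted hunk. The constants (0/9/18, i.e. 4KB/2MB/1GB entries with a 4KB granularity) and the driver loop are my simplification, not the exact code in the patch:

#include <stdint.h>
#include <stdio.h>

/* Assumed values: page-table entry sizes with a 4KB granularity. */
#define THIRD_ORDER    0   /* 4KB (third level)  */
#define SECOND_ORDER   9   /* 2MB (second level) */
#define FIRST_ORDER   18   /* 1GB (first level)  */

/*
 * Pick the largest mapping order compatible with the alignment of the
 * GFN, the MFN and the number of pages left to map. Because "todo" is
 * part of the mask, (1UL << order) can never exceed the remaining
 * number of pages.
 */
static unsigned int pick_order(uint64_t gfn, uint64_t mfn, uint64_t todo)
{
    uint64_t mask = gfn | mfn | todo;

    if ( !(mask & ((1UL << FIRST_ORDER) - 1)) )
        return FIRST_ORDER;
    else if ( !(mask & ((1UL << SECOND_ORDER) - 1)) )
        return SECOND_ORDER;
    else
        return THIRD_ORDER;
}

int main(void)
{
    /* 3MB (0x300 pages) at a 2MB-aligned GFN, the case mentioned above. */
    uint64_t gfn = 0x200, mfn = 0x200, todo = 0x300;
    unsigned long count[3] = { 0, 0, 0 };   /* 4KB / 2MB / 1GB mappings */

    while ( todo )
    {
        unsigned int order = pick_order(gfn, mfn, todo);

        count[order == FIRST_ORDER ? 2 : (order == SECOND_ORDER ? 1 : 0)]++;

        gfn  += 1UL << order;
        mfn  += 1UL << order;
        todo -= 1UL << order;
    }

    /* Prints "768 x 4KB, 0 x 2MB, 0 x 1GB": no superpage is used. */
    printf("%lu x 4KB, %lu x 2MB, %lu x 1GB\n", count[0], count[1], count[2]);

    return 0;
}

With a 2MB-multiple region at the same GFN (e.g. 4MB), the same loop picks SECOND_ORDER and does one mapping per 2MB, which is the behaviour described above.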

Regards,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

