[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 3/3] xen/arm: p2m: Perform local TLB invalidation on vCPU migration





On 08/03/2017 19:48, Stefano Stabellini wrote:
On Wed, 8 Mar 2017, Julien Grall wrote:
Hi Stefano,

On 08/03/17 18:58, Stefano Stabellini wrote:
On Wed, 8 Mar 2017, Julien Grall wrote:
The ARM architecture allows an OS to have per-CPU page tables, as it
guarantees that TLBs never migrate from one CPU to another.

This works fine until this is done in a guest. Consider the following
scenario:
    - vcpu-0 maps P to V
    - vpcu-1 maps P' to V

If run on the same physical CPU, vcpu-1 can hit in TLBs generated by
vcpu-0 accesses, and access the wrong physical page.

The solution to this is to keep a per-p2m map of which vCPU ran the last
on each given pCPU and invalidate local TLBs if two vPCUs from the same
VM run on the same CPU.

Unfortunately it is not possible to allocate per-cpu variable on the
fly. So for now the size of the array is NR_CPUS, this is fine because
we still have space in the structure domain. We may want to add an
helper to allocate per-cpu variable in the future.

I think I understand the problem, thanks for the description. I have a
question: when a vcpu of a different domain is scheduled on a pcpu,
there is no impact on tlb flushes in relation to this problem, is there?

For example:

1) vcpu0/domain1 -> pcpu0
2) vcpu0/domain2 -> pcpu0
3) vcpu1/domain1 -> pcpu0

In 3) we still need to do the flush_tlb_local(), no matter if 2)
happened or not, right?

That's correct. The last vCPU running is stored per p2m domain and belongs to
the associated domain.


Also, I have one suggestion below.

[...]

diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index 0899523084..18c57f936e 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -96,6 +96,9 @@ struct p2m_domain {

     /* back pointer to domain */
     struct domain *domain;
+
+    /* Keeping track on which CPU this p2m was used and for which vCPU */
+    uint8_t last_vcpu_ran[NR_CPUS];
 };

NR_CPUS is usually much larger than cpu_possible_map. I think it would
be best to allocate last_vcpu_ran dynamically (after cpu_possible_map is
set). It should reduce the memory impact significantly.

Well, I thought about it when I wrote the patch. p2m_domain is currently
embedded in the structure domain and Xen will always allocate a page even if
structure is smaller. So for now, static allocation is the best.

Maybe I misunderstood your answer or my suggestion wasn't clear. What
is the problem with something like the following (untested)?

The problem is you allocate more memory than my current solution. As your code below clearly show it, p2m_domain is directly integrated in arch_domain.

The structure domain can never be bigger than PAGE_SIZE (see alloc_domanin_struct) and we will always allocate a page no matter the size of the structure domain. This patch is just filling hole in the page.

So why would we want to switch to something the increase the memory impact?

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.