
Re: [Xen-devel] [PATCH 5/5] just realized that it's broken



>>> "Jan Beulich" <jbeulich@xxxxxxxxxx> 30.01.09 11:48 >>>
>This patch was moving the cpumask out of struct page_info into the page
>itself, but I just realized that this is not valid, since the purpose of the
>mask is to avoid a later TLB flush, hence the base assumption is that the
>page might still be in the TLB of some CPU, thus making it possible for a
>misbehaved guest to still write to that page.
>
>Sorry for the mis-numbering,
>Jan

Actually, after some more consideration I think the patch is correct after
all (minus an issue that has already been in place for a while anyway): any
access a domain may continue to have to a page it is freeing (through stale
TLB entries) is read/execute only. This is because when the type count of a
page (in particular a writeable one) drops to zero, a (domain-)global TLB
flush is performed anyway. Allowing the freeing domain (as well as the one
subsequently allocating) to read the cpumask does not present a problem
in my opinion.
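
As background for why read access is harmless: the mask stored in the freed
page is, together with the page's tlbflush_timestamp, only consulted by the
allocator to decide which CPUs still need a TLB flush before the page is
handed out again. Below is a minimal, self-contained model of that filtering
- an assumption-laden simplification of Xen's tlbflush_clock scheme (ignoring
clock wrap), not the real code.

/*
 * Simplified model of the tlbflush_timestamp filtering that the freed
 * page's cpumask feeds into.  Illustrative sketch only, not Xen code.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4

static unsigned int tlbflush_clock = 1;        /* global flush "clock"     */
static unsigned int cpu_flush_time[NR_CPUS];   /* last full flush, per CPU */

/* A CPU can be skipped if it has done a full flush since the page was freed. */
static bool cpu_needs_flush(unsigned int cpu, unsigned int page_stamp)
{
    return cpu_flush_time[cpu] <= page_stamp;
}

int main(void)
{
    unsigned int page_stamp = tlbflush_clock;  /* page freed "now"         */

    cpu_flush_time[0] = ++tlbflush_clock;      /* CPU0 flushes afterwards  */

    for ( unsigned int cpu = 0; cpu < NR_CPUS; cpu++ )
        printf("CPU%u still needs flushing before reuse: %s\n",
               cpu, cpu_needs_flush(cpu, page_stamp) ? "yes" : "no");
    return 0;
}

A guest that can merely read this data through a stale read-only or
executable mapping cannot cause a CPU to be skipped; only a write to the
mask could, which is exactly what the argument above rules out.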

The issue mentioned above, which I think the current code has even without
that patch (though I may easily have missed some aspect here), is that
there is no enforcement of an immediate TLB flush when a writeable
page's type count drops to zero - instead the flush is deferred until the
end of the current operation or batch. With the removal of the per-domain
lock from these code paths it is no longer valid to defer the flush this
much - it must be carried out before the page in question gets unlocked.
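
Expressed as code, the ordering being argued for would roughly be the
following - a hedged, non-compilable sketch that borrows the
flush_tlb_mask()/page_unlock() names from xen/arch/x86/mm.c, not a proposed
change:

/*
 * Sketch only: surrounding logic elided, names follow xen/arch/x86/mm.c.
 */
static void put_final_writable_type(struct page_info *page, struct domain *d)
{
    /* Last writable type reference is being dropped ... */

    /*
     * ... so the TLB flush must happen here, while the page is still
     * locked, rather than being queued until the end of the current
     * operation or batch: without the per-domain lock, another CPU may
     * otherwise re-type or re-use the page before the deferred flush runs.
     */
    flush_tlb_mask(d->domain_dirty_cpumask);

    page_unlock(page);
}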

And I would think that invalidate_shadow_ldt() has a similar issue, plus
it seems questionable that it does a local flush when acting on the current
vCPU, but a global one for all others.
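
For reference, the asymmetry in question has roughly this shape (a
paraphrase of the tail of invalidate_shadow_ldt() in xen/arch/x86/mm.c, not
a verbatim quote; the exact mask passed in the second case is assumed):

    /* Paraphrased shape of the flush at the end of invalidate_shadow_ldt(). */
    if ( v == current )
        flush_tlb_local();                               /* this CPU only  */
    else
        flush_tlb_mask(v->domain->domain_dirty_cpumask); /* all dirty CPUs */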

Jan

*******************************************************

Move the (potentially large) cpumask field of free pages into the page
itself, thus making sizeof(struct page_info) independent of the
configured number of CPUs, which avoids penalizing systems with few
CPUs just because the hypervisor is able to support many.

Signed-off-by: Jan Beulich <jbeulich@xxxxxxxxxx>
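
To put a rough number on the saving (an illustrative calculation with
assumed figures, not part of the patch): with a hypothetical NR_CPUS of
4096 and 4KiB pages, a cpumask_t is 512 bytes, so keeping one in every
struct page_info costs on the order of 128 MiB of frame-table space per
GiB of RAM, regardless of how many CPUs the machine actually has:

/* Back-of-the-envelope illustration only -- NR_CPUS and the page size are
 * assumptions, and this is not Xen code. */
#include <stdio.h>

#define NR_CPUS       4096                    /* hypothetical large build */
#define PAGE_SIZE     4096
#define BITS_PER_LONG (8 * sizeof(unsigned long))

int main(void)
{
    /* cpumask_t is one bit per possible CPU, rounded up to whole longs. */
    size_t mask_bytes = (NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG
                        * sizeof(unsigned long);
    size_t pages_per_gib = (1UL << 30) / PAGE_SIZE;

    printf("cpumask_t: %zu bytes\n", mask_bytes);
    printf("frame-table overhead: ~%zu MiB per GiB of RAM\n",
           mask_bytes * pages_per_gib >> 20);
    return 0;
}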

--- 2009-01-27.orig/xen/common/page_alloc.c     2009-01-29 15:27:38.000000000 +0100
+++ 2009-01-27/xen/common/page_alloc.c  2009-01-29 15:33:09.000000000 +0100
@@ -376,7 +376,7 @@ static struct page_info *alloc_heap_page
         BUG_ON(pg[i].count_info != 0);
 
         /* Add in any extra CPUs that need flushing because of this page. */
-        cpus_andnot(extra_cpus_mask, pg[i].u.free.cpumask, mask);
+        cpus_andnot(extra_cpus_mask, PFN_CPUMASK(&pg[i]), mask);
         tlbflush_filter(extra_cpus_mask, pg[i].tlbflush_timestamp);
         cpus_or(mask, mask, extra_cpus_mask);
 
@@ -425,11 +425,11 @@ static void free_heap_pages(
         if ( (d = page_get_owner(&pg[i])) != NULL )
         {
             pg[i].tlbflush_timestamp = tlbflush_current_time();
-            pg[i].u.free.cpumask     = d->domain_dirty_cpumask;
+            PFN_CPUMASK(&pg[i])      = d->domain_dirty_cpumask;
         }
         else
         {
-            cpus_clear(pg[i].u.free.cpumask);
+            cpus_clear(PFN_CPUMASK(&pg[i]));
         }
     }
 
--- 2009-01-27.orig/xen/include/asm-ia64/mm.h   2009-01-30 08:26:33.000000000 +0100
+++ 2009-01-27/xen/include/asm-ia64/mm.h        2009-01-29 14:24:42.000000000 +0100
@@ -35,8 +35,11 @@ typedef unsigned long page_flags_t;
  * Every architecture must ensure the following:
  *  1. 'struct page_info' contains a 'struct list_head list'.
  *  2. Provide a PFN_ORDER() macro for accessing the order of a free page.
+ *  3. Provide a PFN_CPUMASK() macro for accessing the mask of possibly-tainted
+ *     TLBs.
  */
-#define PFN_ORDER(_pfn)        ((_pfn)->u.free.order)
+#define PFN_ORDER(_pfn)   ((_pfn)->u.free.order)
+#define PFN_CPUMASK(_pfn) ((_pfn)->u.free.cpumask)
 
 #define PRtype_info "016lx"
 
--- 2009-01-27.orig/xen/include/asm-x86/mm.h    2009-01-29 14:03:37.000000000 +0100
+++ 2009-01-27/xen/include/asm-x86/mm.h 2009-01-30 08:28:27.000000000 +0100
@@ -14,8 +14,18 @@
  * Every architecture must ensure the following:
  *  1. 'struct page_info' contains a 'struct page_list_entry list'.
  *  2. Provide a PFN_ORDER() macro for accessing the order of a free page.
+ *  3. Provide a PFN_CPUMASK() macro for accessing the mask of possibly-tainted
+ *     TLBs.
  */
 #define PFN_ORDER(_pfn) ((_pfn)->v.free.order)
+#ifdef __i386__
+# define PFN_CPUMASK(_pfn) ((_pfn)->u.free.cpumask)
+#else
+# define PFN_CPUMASK(_pfn) (*(cpumask_t *)page_to_virt(_pfn))
+# if NR_CPUS > PAGE_SIZE * 8
+#  error Cannot handle this many CPUs.
+# endif
+#endif
 
 /*
  * This definition is solely for the use in struct page_info (and
@@ -72,8 +82,10 @@ struct page_info
 
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
         struct {
+#ifdef __i386__
             /* Mask of possibly-tainted TLBs. */
             cpumask_t cpumask;
+#endif
         } free;
 
     } u;


