xen-devel

Re: [Xen-devel] [PATCH 5/5] just realized that it's broken

To: <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH 5/5] just realized that it's broken
From: "Jan Beulich" <jbeulich@xxxxxxxxxx>
Date: Wed, 04 Feb 2009 08:27:24 +0000
In-reply-to: <4982E8F4.76E4.0078.0@xxxxxxxxxx>
References: <4982E8F4.76E4.0078.0@xxxxxxxxxx>
>>> "Jan Beulich" <jbeulich@xxxxxxxxxx> 30.01.09 11:48 >>>
>This patch was moving the cpumask out of struct page_info into the page
>itself, but I just realized that this is not valid, since the purpose of
>the mask is to avoid a later TLB flush; hence the base assumption is that
>the page might still be in the TLB of some CPU, thus making it possible
>for a misbehaved guest to still write to that page.
>
>Sorry for the mis-numbering,
>Jan

Actually, after some more consideration I think the patch is correct
(minus an issue that has already been in place for a while anyway): all
access a domain may continue to have to a page it is freeing (through
stale TLB entries) is read/execute. This is because when the type count
of a page (in particular a writeable one) drops to zero, a
(domain-)global TLB flush is performed anyway. Allowing the freeing
domain (as well as the one subsequently allocating) to read the cpumask
does not present a problem in my opinion.
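
To make that safety argument concrete, here is a minimal sketch, with a
hypothetical function name; PGT_count_mask, PGT_type_mask,
PGT_writable_page, domain_dirty_cpumask and flush_tlb_mask() are real Xen
identifiers of this period, and Xen-internal headers are assumed:

/*
 * Sketch of the invariant relied on above -- illustrative only, not the
 * actual Xen code path.
 */
static void on_last_type_ref_dropped(struct page_info *page, struct domain *d)
{
    /* The last type reference to this page has just been dropped. */
    ASSERT((page->u.inuse.type_info & PGT_count_mask) == 0);

    if ( (page->u.inuse.type_info & PGT_type_mask) == PGT_writable_page )
        /*
         * Flush every CPU that may still cache a writeable mapping.  Any
         * TLB entry surviving this flush can only grant read/execute
         * access, so a free page that merely holds a readable cpumask is
         * not at risk of being written through a stale entry.
         */
        flush_tlb_mask(d->domain_dirty_cpumask);
}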

The issue mentioned above, which I think the current code has even
without that patch (though I may easily have missed some aspect here),
is that there is no enforcement of an immediate TLB flush when a
writeable page's type count drops to zero; instead the flush is deferred
until the end of the current operation or batch. With the removal of the
use of the per-domain lock in these code paths it is no longer valid to
defer the flush this much: it must be carried out before the page in
question gets unlocked.
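
A hedged sketch of the ordering being argued for; the type-count helper
is a hypothetical stand-in, not Xen's actual API:

static void put_final_writable_ref(struct page_info *page, struct domain *d)
{
    page_lock(page);

    if ( drop_type_ref_to_zero(page) &&   /* hypothetical helper */
         (page->u.inuse.type_info & PGT_type_mask) == PGT_writable_page )
    {
        /*
         * Flush before releasing the page lock: without the per-domain
         * lock, deferring the flush to the end of the operation or batch
         * leaves a window in which another CPU can re-type or re-use the
         * page while stale writeable TLB entries still exist.
         */
        flush_tlb_mask(d->domain_dirty_cpumask);
    }

    page_unlock(page);   /* only now may others act on the dropped type */
}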

And I would think that invalidate_shadow_ldt() has a similar issue. It
also seems questionable that it does only a local flush when acting on
the current vCPU, but a global one for all others.
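
For reference, the tail of invalidate_shadow_ldt() at the time looked
roughly like this (quoted from memory, so treat it as an approximation
rather than a verified excerpt):

/* Approximate shape -- an assumption, not a verified excerpt: */
if ( v == current )
    flush_tlb_local();                               /* this CPU only */
else
    flush_tlb_mask(v->domain->domain_dirty_cpumask); /* all dirty CPUs */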

Jan

*******************************************************

Move the (potentially large) cpumask field of free pages into the page
itself, thus making sizeof(struct page_info) independent of the
configured number of CPUs, which avoids penalizing systems with few
CPUs just because the hypervisor is able to support many.

Signed-off-by: Jan Beulich <jbeulich@xxxxxxxxxx>

--- 2009-01-27.orig/xen/common/page_alloc.c     2009-01-29 15:27:38.000000000 +0100
+++ 2009-01-27/xen/common/page_alloc.c  2009-01-29 15:33:09.000000000 +0100
@@ -376,7 +376,7 @@ static struct page_info *alloc_heap_page
         BUG_ON(pg[i].count_info != 0);
 
         /* Add in any extra CPUs that need flushing because of this page. */
-        cpus_andnot(extra_cpus_mask, pg[i].u.free.cpumask, mask);
+        cpus_andnot(extra_cpus_mask, PFN_CPUMASK(&pg[i]), mask);
         tlbflush_filter(extra_cpus_mask, pg[i].tlbflush_timestamp);
         cpus_or(mask, mask, extra_cpus_mask);
 
@@ -425,11 +425,11 @@ static void free_heap_pages(
         if ( (d = page_get_owner(&pg[i])) != NULL )
         {
             pg[i].tlbflush_timestamp = tlbflush_current_time();
-            pg[i].u.free.cpumask     = d->domain_dirty_cpumask;
+            PFN_CPUMASK(&pg[i])      = d->domain_dirty_cpumask;
         }
         else
         {
-            cpus_clear(pg[i].u.free.cpumask);
+            cpus_clear(PFN_CPUMASK(&pg[i]));
         }
     }
 
--- 2009-01-27.orig/xen/include/asm-ia64/mm.h   2009-01-30 08:26:33.000000000 +0100
+++ 2009-01-27/xen/include/asm-ia64/mm.h        2009-01-29 14:24:42.000000000 +0100
@@ -35,8 +35,11 @@ typedef unsigned long page_flags_t;
  * Every architecture must ensure the following:
  *  1. 'struct page_info' contains a 'struct list_head list'.
  *  2. Provide a PFN_ORDER() macro for accessing the order of a free page.
+ *  3. Provide a PFN_CPUMASK() macro for accessing the mask of possibly-tainted
+ *     TLBs.
  */
-#define PFN_ORDER(_pfn)        ((_pfn)->u.free.order)
+#define PFN_ORDER(_pfn)   ((_pfn)->u.free.order)
+#define PFN_CPUMASK(_pfn) ((_pfn)->u.free.cpumask)
 
 #define PRtype_info "016lx"
 
--- 2009-01-27.orig/xen/include/asm-x86/mm.h    2009-01-29 14:03:37.000000000 +0100
+++ 2009-01-27/xen/include/asm-x86/mm.h 2009-01-30 08:28:27.000000000 +0100
@@ -14,8 +14,18 @@
  * Every architecture must ensure the following:
  *  1. 'struct page_info' contains a 'struct page_list_entry list'.
  *  2. Provide a PFN_ORDER() macro for accessing the order of a free page.
+ *  3. Provide a PFN_CPUMASK() macro for accessing the mask of possibly-tainted
+ *     TLBs.
  */
 #define PFN_ORDER(_pfn) ((_pfn)->v.free.order)
+#ifdef __i386__
+# define PFN_CPUMASK(_pfn) ((_pfn)->u.free.cpumask)
+#else
+# define PFN_CPUMASK(_pfn) (*(cpumask_t *)page_to_virt(_pfn))
+# if NR_CPUS > PAGE_SIZE * 8
+#  error Cannot handle this many CPUs.
+# endif
+#endif
 
 /*
  * This definition is solely for the use in struct page_info (and
@@ -72,8 +82,10 @@ struct page_info
 
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
         struct {
+#ifdef __i386__
             /* Mask of possibly-tainted TLBs. */
             cpumask_t cpumask;
+#endif
         } free;
 
     } u;
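
A side note on the 64-bit PFN_CPUMASK() variant above: a free page's own
memory is unused, so the mask can live in the page itself, provided a
cpumask_t (one bit per possible CPU) fits into the PAGE_SIZE * 8 bits a
page offers, which is exactly what the #error check enforces. A
standalone illustration, with assumed example values rather than Xen's
actual configuration:

/* Why the "NR_CPUS > PAGE_SIZE * 8" check matters; the constants below
 * are assumed example values. */
#include <stdio.h>

#define PAGE_SIZE 4096   /* bytes per page (typical x86 value) */
#define NR_CPUS   256    /* assumed build-time CPU limit */

int main(void)
{
    printf("bits available in one page: %d\n", PAGE_SIZE * 8);
    printf("bits needed for the cpumask: %d\n", NR_CPUS);
    /* A nonzero exit would mean the mask cannot fit in a free page. */
    return NR_CPUS > PAGE_SIZE * 8;
}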



