
Re: [Xen-devel] [RFC] Nested Paging Live Migration



Retry.

1. Most of the common code is moved from shadow to paging (the new common interface is summarized below):
* log-dirty related fields (dirty_count ...) are moved to paging_domain
* log_dirty_bitmap allocation, free, peek, and clean
* mark_dirty() becomes a common function (paging_mark_dirty()) too
* a new lock (log_dirty_lock) is created to guard this code
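
For reference, the new common log-dirty interface (as in the attached patch) is:

    /* common log-dirty support in xen/arch/x86/mm/paging.c */
    int  paging_alloc_log_dirty_bitmap(struct domain *d);
    void paging_free_log_dirty_bitmap(struct domain *d);
    void paging_mark_dirty(struct domain *d, unsigned long guest_mfn);
    int  paging_log_dirty_op(struct domain *d, struct xen_domctl_shadow_op *sc);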

2. shadow/hap_log_dirty_enable() and shadow/hap_log_dirty_disable()
These four functions were not changed. However, I would really like to create two common wrappers for them, paging_log_dirty_enable() and paging_log_dirty_disable(). Doing so requires two function pointers (*log_dirty_enable() and *log_dirty_disable()) that point to the shadow-specific or hap-specific code; for example, *log_dirty_enable() would point to shadow_log_dirty_enable(). A rough sketch of this idea is shown below.
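
A minimal sketch of what I mean (hypothetical only; these function pointers are not in the attached patch):

    /* Hypothetical: two new function pointers in struct paging_domain, set to
     * the shadow- or hap-specific implementation when paging is initialized. */
    int (*log_dirty_enable)(struct domain *d);
    int (*log_dirty_disable)(struct domain *d);

    /* The common entry points in paging.c would then simply dispatch: */
    int paging_log_dirty_enable(struct domain *d)
    {
        return d->arch.paging.log_dirty_enable(d);
    }

    int paging_log_dirty_disable(struct domain *d)
    {
        return d->arch.paging.log_dirty_disable(d);
    }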

Tim, let me know if you like this approach.

3. p2m_set_l1e_flags() is renamed to p2m_set_flags_global() as requested. It does NOT walk the P2M table itself; instead, it still relies on set_p2m_entry() to do the walk.

The reason: I am uncomfortable duplicating the code of set_p2m_entry() in this function; most of it would be the same as set_p2m_entry() and p2m_next_level(). What is your opinion? A simplified excerpt follows.
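
For reference, a simplified excerpt of p2m_set_flags_global() from the attached patch; the actual table walk stays inside set_p2m_entry():

    /* Simplified from the attached patch: iterate the domain's page list and
     * let set_p2m_entry() walk the P2M table to rewrite each l1e with the
     * requested flags. */
    for ( entry = d->page_list.next; entry != &d->page_list; entry = entry->next )
    {
        page = list_entry(entry, struct page_info, list);
        mfn  = page_to_mfn(page);
        gfn  = get_gpfn_from_mfn(mfn_x(mfn));
        if ( gfn != INVALID_M2P_ENTRY && !set_p2m_entry(d, gfn, mfn, l1e_flags) )
            goto error;
    }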


Any comments are welcome. I will create a new patch after collecting them.

Thanks,

-Wei


Tim Deegan wrote:
Hi,

Thanks for this patch.

At 10:05 -0500 on 01 Jun (1180692316), Huang2, Wei wrote:
 > The attached file supports AMD-V nested paging live migration. Please
 > comment. I will create an updated version after collecting feedbacks.

Can a lot more log-dirty code (bitmap allocation, clearing, reporting)
be made common?  E.g.: hap_mark_dirty() is virtually identical to
sh_mark_dirty() -- including some recursive locking and associated
comments that are not true in HAP modes.  Maybe give it its own lock to
cover bit-setting?  Probably only the code for clearing the bitmap
(i.e., resetting the trap that will cause us to mark pages dirty) needs
to be split out.

 > The following areas require special attention:
 > 1. paging_mark_dirty()
 > Currently, paging_mark_dirty() dispatches to sh_mark_dirty() or
 > hap_mark_dirty() based on paging support. I personally prefer a function
 > pointer. However, current paging interface only provides a function
 > pointer for vcpu-level functions, not for domain-level functions. This
 > is a bit annoying.

Make it a common function and that should go away.

 > 2. locking in p2m_set_l1e_flags()
 > p2m_set_l1e_flags(), which is invoked by hap.c, calls
 > hap_write_p2m_entry(). hap_lock() is called twice. I currently remove
 > hap_lock() in hap_write_p2m_entry(). A better solution is needed here.

Hmm.  Since you don't ever change the monitor table of a HAP domain, it
might be possible to make hap_write_p2m_entry (and
hap.c:p2m_install_entry_in_monitors()) safe without locking.

It is worth noting that this would be a different locking discipline
from the one used in shadow code -- code paths that take both the p2m
lock and the shadow lock always take the p2m lock first (there are some
convolutions in shadow init routines etc to make sure this is true).
If the hap lock is to be taken before the p2m lock that will need some
care and attention in the rest of the code.


 > +/* This function handles P2M page faults by fixing l1e flags with correct
 > + * values. It also calls paging_mark_dirty() function to record the dirty
 > + * pages.
 > + */
 > +int p2m_fix_table(struct domain *d, paddr_t gpa)

Can this have a better name?  It's not really fixing anything. Maybe
have this be p2m_set_flags() and the previous function be
p2m_set_flags_global()?

Also maybe the call to mark_dirty could be made from the SVM code, which
is where we're really handling the write?

Cheers,

Tim.

--
Tim Deegan <Tim.Deegan@xxxxxxxxxxxxx>, XenSource UK Limited
Registered office c/o EC2Y 5EB, UK; company number 05334508


diff -r 7ab0527484c8 xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c    Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/hvm/hvm.c    Tue Jun 05 04:35:27 2007 -0500
@@ -568,7 +568,7 @@ static int __hvm_copy(void *buf, paddr_t
         if ( dir )
         {
             memcpy(p, buf, count); /* dir == TRUE:  *to* guest */
-            mark_dirty(current->domain, mfn);
+            paging_mark_dirty(current->domain, mfn);
         }
         else
             memcpy(buf, p, count); /* dir == FALSE: *from guest */
diff -r 7ab0527484c8 xen/arch/x86/hvm/io.c
--- a/xen/arch/x86/hvm/io.c     Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/hvm/io.c     Tue Jun 05 04:35:45 2007 -0500
@@ -865,7 +865,7 @@ void hvm_io_assist(void)
     if ( (p->dir == IOREQ_READ) && p->data_is_ptr )
     {
         gmfn = get_mfn_from_gpfn(paging_gva_to_gfn(v, p->data));
-        mark_dirty(d, gmfn);
+        paging_mark_dirty(d, gmfn);
     }
 
  out:
diff -r 7ab0527484c8 xen/arch/x86/hvm/svm/svm.c
--- a/xen/arch/x86/hvm/svm/svm.c        Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/hvm/svm/svm.c        Tue Jun 05 11:50:28 2007 -0500
@@ -1028,13 +1028,16 @@ int start_svm(struct cpuinfo_x86 *c)
 
 static int svm_do_nested_pgfault(paddr_t gpa, struct cpu_user_regs *regs)
 {
+    struct domain *d;
+
     if (mmio_space(gpa)) {
         handle_mmio(gpa);
         return 1;
     }
 
-    /* We should not reach here. Otherwise, P2M table is not correct.*/
-    return 0;
+    d = current->domain;
+    paging_mark_dirty(d, get_mfn_from_gpfn(gpa >> PAGE_SHIFT));
+    return p2m_set_flags(d, gpa, __PAGE_HYPERVISOR|_PAGE_USER);
 }
 
 static void svm_do_no_device_fault(struct vmcb_struct *vmcb)
diff -r 7ab0527484c8 xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/mm.c Tue Jun 05 04:34:56 2007 -0500
@@ -1556,7 +1556,7 @@ int alloc_page_type(struct page_info *pa
 
     /* A page table is dirtied when its type count becomes non-zero. */
     if ( likely(owner != NULL) )
-        mark_dirty(owner, page_to_mfn(page));
+        paging_mark_dirty(owner, page_to_mfn(page));
 
     switch ( type & PGT_type_mask )
     {
@@ -1602,7 +1602,7 @@ void free_page_type(struct page_info *pa
         if ( unlikely(paging_mode_enabled(owner)) )
         {
             /* A page table is dirtied when its type count becomes zero. */
-            mark_dirty(owner, page_to_mfn(page));
+            paging_mark_dirty(owner, page_to_mfn(page));
 
             if ( shadow_mode_refcounts(owner) )
                 return;
@@ -2057,7 +2057,7 @@ int do_mmuext_op(
             }
 
             /* A page is dirtied when its pin status is set. */
-            mark_dirty(d, mfn);
+            paging_mark_dirty(d, mfn);
            
             /* We can race domain destruction (domain_relinquish_resources). */
             if ( unlikely(this_cpu(percpu_mm_info).foreign != NULL) )
@@ -2089,7 +2089,7 @@ int do_mmuext_op(
                 put_page_and_type(page);
                 put_page(page);
                 /* A page is dirtied when its pin status is cleared. */
-                mark_dirty(d, mfn);
+                paging_mark_dirty(d, mfn);
             }
             else
             {
@@ -2424,7 +2424,7 @@ int do_mmu_update(
             set_gpfn_from_mfn(mfn, gpfn);
             okay = 1;
 
-            mark_dirty(FOREIGNDOM, mfn);
+            paging_mark_dirty(FOREIGNDOM, mfn);
 
             put_page(mfn_to_page(mfn));
             break;
@@ -3005,7 +3005,7 @@ long do_update_descriptor(u64 pa, u64 de
         break;
     }
 
-    mark_dirty(dom, mfn);
+    paging_mark_dirty(dom, mfn);
 
     /* All is good so make the update. */
     gdt_pent = map_domain_page(mfn);
diff -r 7ab0527484c8 xen/arch/x86/mm/hap/hap.c
--- a/xen/arch/x86/mm/hap/hap.c Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/mm/hap/hap.c Tue Jun 05 16:37:53 2007 -0500
@@ -385,6 +385,56 @@ void hap_destroy_monitor_table(struct vc
 }
 
 /************************************************/
+/*             HAP LOG DIRTY SUPPORT            */
+/************************************************/
+int hap_log_dirty_enable(struct domain *d)
+{
+    int ret;
+
+    domain_pause(d);
+    hap_lock(d);
+
+    ret = paging_alloc_log_dirty_bitmap(d);
+    if ( ret != 0 )
+    {
+       paging_free_log_dirty_bitmap(d);
+       goto out;
+    }
+
+    /* turn on PG_log_dirty bit in paging mode */
+    d->arch.paging.mode |= PG_log_dirty;
+
+    /* mark physical memory as not writable */
+    p2m_set_flags_global(d, (_PAGE_PRESENT|_PAGE_USER));
+    flush_tlb_all_pge();
+
+ out:
+    hap_unlock(d);
+    domain_unpause(d);
+    
+    return ret;
+}
+
+int hap_log_dirty_disable(struct domain *d)
+{
+    domain_pause(d);
+    hap_lock(d);
+    if ( paging_mode_log_dirty(d) )
+       paging_free_log_dirty_bitmap(d);
+
+    /* turn off PG_log_dirty bit in paging mode */
+    d->arch.paging.mode &= ~PG_log_dirty;
+
+    /* recover P2M table to normal mode */
+    p2m_set_flags_global(d, __PAGE_HYPERVISOR|_PAGE_USER);
+
+    hap_unlock(d);
+    domain_unpause(d);
+
+    return 1;
+}
+
+/************************************************/
 /*          HAP DOMAIN LEVEL FUNCTIONS          */
 /************************************************/
 void hap_domain_init(struct domain *d)
@@ -498,12 +548,16 @@ int hap_domctl(struct domain *d, xen_dom
 
     HERE_I_AM;
 
-    if ( unlikely(d == current->domain) ) {
-        gdprintk(XENLOG_INFO, "Don't try to do a hap op on yourself!\n");
-        return -EINVAL;
-    }
-    
     switch ( sc->op ) {
+    case XEN_DOMCTL_SHADOW_OP_OFF:
+       if ( paging_mode_log_dirty(d) )
+            if ( (rc = hap_log_dirty_disable(d)) != 0 )
+                return rc;
+       return 0;
+
+    case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:
+       return hap_log_dirty_enable(d);
+
     case XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION:
         hap_lock(d);
         rc = hap_set_allocation(d, sc->mb << (20 - PAGE_SHIFT), &preempted);
diff -r 7ab0527484c8 xen/arch/x86/mm/p2m.c
--- a/xen/arch/x86/mm/p2m.c     Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/mm/p2m.c     Tue Jun 05 11:41:29 2007 -0500
@@ -169,7 +169,7 @@ p2m_next_level(struct domain *d, mfn_t *
 
 // Returns 0 on error (out of memory)
 static int
-set_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+set_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn, u32 l1e_flags)
 {
     // XXX -- this might be able to be faster iff current->domain == d
     mfn_t table_mfn = pagetable_get_mfn(d->arch.phys_table);
@@ -213,7 +213,7 @@ set_p2m_entry(struct domain *d, unsigned
         d->arch.p2m.max_mapped_pfn = gfn;
 
     if ( mfn_valid(mfn) )
-        entry_content = l1e_from_pfn(mfn_x(mfn), __PAGE_HYPERVISOR|_PAGE_USER);
+        entry_content = l1e_from_pfn(mfn_x(mfn), l1e_flags);
     else
         entry_content = l1e_empty();
 
@@ -278,7 +278,7 @@ int p2m_alloc_table(struct domain *d,
         p2m_unlock(d);
         return -ENOMEM;
     }
-list_add_tail(&p2m_top->list, &d->arch.p2m.pages);
+    list_add_tail(&p2m_top->list, &d->arch.p2m.pages);
 
     p2m_top->count_info = 1;
     p2m_top->u.inuse.type_info = 
@@ -297,8 +297,8 @@ list_add_tail(&p2m_top->list, &d->arch.p
  
     /* Initialise physmap tables for slot zero. Other code assumes this. */
     gfn = 0;
-mfn = _mfn(INVALID_MFN);
-    if ( !set_p2m_entry(d, gfn, mfn) )
+    mfn = _mfn(INVALID_MFN);
+    if ( !set_p2m_entry(d, gfn, mfn, __PAGE_HYPERVISOR|_PAGE_USER) )
         goto error;
 
     for ( entry = d->page_list.next;
@@ -316,7 +316,7 @@ mfn = _mfn(INVALID_MFN);
             (gfn != 0x55555555L)
 #endif
              && gfn != INVALID_M2P_ENTRY
-             && !set_p2m_entry(d, gfn, mfn) )
+             && !set_p2m_entry(d, gfn, mfn, __PAGE_HYPERVISOR|_PAGE_USER) )
             goto error;
     }
 
@@ -626,7 +626,7 @@ p2m_remove_page(struct domain *d, unsign
     ASSERT(mfn_x(gfn_to_mfn(d, gfn)) == mfn);
     //ASSERT(mfn_to_gfn(d, mfn) == gfn);
 
-    set_p2m_entry(d, gfn, _mfn(INVALID_MFN));
+    set_p2m_entry(d, gfn, _mfn(INVALID_MFN), __PAGE_HYPERVISOR|_PAGE_USER);
     set_gpfn_from_mfn(mfn, INVALID_M2P_ENTRY);
 }
 
@@ -659,7 +659,7 @@ guest_physmap_add_page(struct domain *d,
     omfn = gfn_to_mfn(d, gfn);
     if ( mfn_valid(omfn) )
     {
-        set_p2m_entry(d, gfn, _mfn(INVALID_MFN));
+        set_p2m_entry(d, gfn, _mfn(INVALID_MFN), __PAGE_HYPERVISOR|_PAGE_USER);
         set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY);
     }
 
@@ -685,13 +685,81 @@ guest_physmap_add_page(struct domain *d,
         }
     }
 
-    set_p2m_entry(d, gfn, _mfn(mfn));
+    set_p2m_entry(d, gfn, _mfn(mfn), __PAGE_HYPERVISOR|_PAGE_USER);
     set_gpfn_from_mfn(mfn, gfn);
 
     audit_p2m(d);
     p2m_unlock(d);
 }
 
+/* This function goes through P2M table and modify l1e flags of all pages. Note
+ * that  physical base address of l1e is intact. This function can be used for 
+ * special purpose, such as marking physical memory as Not-Writable for
+ * tracking dirty pages during live migration. 
+ */
+int p2m_set_flags_global(struct domain *d, u32 l1e_flags) 
+{
+    mfn_t mfn;
+    struct list_head *entry;
+    struct page_info *page;
+    unsigned long gfn;
+
+    p2m_lock(d);
+
+    if ( pagetable_get_pfn(d->arch.phys_table) == 0 )
+    {
+       P2M_ERROR("p2m table has not been allocated for this domain yet!\n");
+       p2m_unlock(d);
+       return -EINVAL;
+    }
+
+    for ( entry = d->page_list.next;
+          entry != &d->page_list;
+          entry = entry->next )
+    {
+        page   = list_entry(entry, struct page_info, list);
+        mfn = page_to_mfn(page);
+        gfn = get_gpfn_from_mfn(mfn_x(mfn));
+        if (
+#ifdef __x86_64__
+            (gfn != 0x5555555555555555L)
+#else
+            (gfn != 0x55555555L)
+#endif
+             && gfn != INVALID_M2P_ENTRY
+             && !set_p2m_entry(d, gfn, mfn, l1e_flags) )
+            goto error;
+    }
+
+    p2m_unlock(d);
+    return 0;
+
+ error:
+    P2M_PRINTK("failed to change l1e flags of p2m table, gfn=%05lx, mfn=%"
+               PRI_mfn "\n", gfn, mfn_x(mfn));
+    p2m_unlock(d);
+    return -ENOMEM;
+}
+
+/* This function goes through p2M table and modifies l1e flags of a specific 
+ * gpa.
+ */
+int p2m_set_flags(struct domain *d, paddr_t gpa, u32 l1e_flags) 
+{
+    unsigned long gfn;
+    mfn_t mfn;
+
+    p2m_lock(d);
+
+    gfn = gpa >> PAGE_SHIFT;
+    mfn = gfn_to_mfn(d, gfn);
+    if ( mfn_valid(mfn) )
+        set_p2m_entry(d, gfn, mfn, l1e_flags);
+    
+    p2m_unlock(d);
+
+    return 1;
+}
 
 /*
  * Local variables:
diff -r 7ab0527484c8 xen/arch/x86/mm/paging.c
--- a/xen/arch/x86/mm/paging.c  Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/mm/paging.c  Tue Jun 05 17:20:34 2007 -0500
@@ -25,6 +25,15 @@
 #include <asm/shadow.h>
 #include <asm/p2m.h>
 #include <asm/hap.h>
+#include <asm/guest_access.h>
+
+/* Override macros from asm/page.h to make them work with mfn_t */
+#undef mfn_to_page
+#define mfn_to_page(_m) (frame_table + mfn_x(_m))
+#undef mfn_valid
+#define mfn_valid(_mfn) (mfn_x(_mfn) < max_page)
+#undef page_to_mfn
+#define page_to_mfn(_pg) (_mfn((_pg) - frame_table))
 
 /* Xen command-line option to enable hardware-assisted paging */
 int opt_hap_enabled;
@@ -42,10 +51,200 @@ boolean_param("hap", opt_hap_enabled);
     } while (0)
 
 
+/* log dirty mode lock */
+#define log_dirty_lock_init(_d)                                   \
+    do {                                                          \
+        spin_lock_init(&(_d)->arch.paging.log_dirty_lock);        \
+        (_d)->arch.paging.log_dirty_locker = -1;                  \
+        (_d)->arch.paging.log_dirty_locker_function = "nobody";   \
+    } while (0)
+
+#define log_dirty_lock(_d)                                                   \
+    do {                                                                     \
+        if (unlikely((_d)->arch.paging.log_dirty_locker==current->processor))\
+        {                                                                    \
+            printk("Error: paging log dirty lock held by %s\n",              \
+                   (_d)->arch.paging.log_dirty_locker_function);             \
+            BUG();                                                           \
+        }                                                                    \
+        spin_lock(&(_d)->arch.paging.log_dirty_lock);                        \
+        ASSERT((_d)->arch.paging.log_dirty_locker == -1);                    \
+        (_d)->arch.paging.log_dirty_locker = current->processor;             \
+        (_d)->arch.paging.log_dirty_locker_function = __func__;              \
+    } while (0)
+
+#define log_dirty_unlock(_d)                                              \
+    do {                                                                  \
+        ASSERT((_d)->arch.paging.log_dirty_locker == current->processor); \
+        (_d)->arch.paging.log_dirty_locker = -1;                          \
+        (_d)->arch.paging.log_dirty_locker_function = "nobody";           \
+        spin_unlock(&(_d)->arch.paging.log_dirty_lock);                   \
+    } while (0)
+
+
+int paging_alloc_log_dirty_bitmap(struct domain *d)
+{
+    ASSERT(d->arch.paging.dirty_bitmap == NULL);
+    d->arch.paging.dirty_bitmap_size =
+        (domain_get_maximum_gpfn(d) + BITS_PER_LONG) & ~(BITS_PER_LONG - 1);
+    d->arch.paging.dirty_bitmap =
+        xmalloc_array(unsigned long,
+                      d->arch.paging.dirty_bitmap_size / BITS_PER_LONG);
+    if ( d->arch.paging.dirty_bitmap == NULL )
+    {
+        d->arch.paging.dirty_bitmap_size = 0;
+        return -ENOMEM;
+    }
+    memset(d->arch.paging.dirty_bitmap, 0,
+           d->arch.paging.dirty_bitmap_size/8);
+
+    return 0;
+}
+
+void paging_free_log_dirty_bitmap(struct domain *d)
+{
+    d->arch.paging.dirty_bitmap_size = 0;
+    if ( d->arch.paging.dirty_bitmap )
+    {
+        xfree(d->arch.paging.dirty_bitmap);
+        d->arch.paging.dirty_bitmap = NULL;
+    }
+}
+
+/* Mark a page as dirty */
+void paging_mark_dirty(struct domain *d, unsigned long guest_mfn)
+{
+    unsigned long pfn;
+    mfn_t gmfn;
+
+    gmfn = _mfn(guest_mfn);
+
+    if ( !paging_mode_log_dirty(d) || !mfn_valid(gmfn) )
+        return;
+
+    log_dirty_lock(d);
+
+    ASSERT(d->arch.paging.dirty_bitmap != NULL);
+
+    /* We /really/ mean PFN here, even for non-translated guests. */
+    pfn = get_gpfn_from_mfn(mfn_x(gmfn));
+
+    /*
+     * Values with the MSB set denote MFNs that aren't really part of the 
+     * domain's pseudo-physical memory map (e.g., the shared info frame).
+     * Nothing to do here...
+     */
+    if ( unlikely(!VALID_M2P(pfn)) )
+        return;
+
+    /* N.B. Can use non-atomic TAS because protected by shadow_lock. */
+    if ( likely(pfn < d->arch.paging.dirty_bitmap_size) ) 
+    { 
+        if ( !__test_and_set_bit(pfn, d->arch.paging.dirty_bitmap) )
+        {
+            PAGING_DEBUG(LOGDIRTY, 
+                          "marked mfn %" PRI_mfn " (pfn=%lx), dom %d\n",
+                          mfn_x(gmfn), pfn, d->domain_id);
+            d->arch.paging.dirty_count++;
+        }
+    }
+    else
+    {
+        PAGING_PRINTK("mark_dirty OOR! "
+                       "mfn=%" PRI_mfn " pfn=%lx max=%x (dom %d)\n"
+                       "owner=%d c=%08x t=%" PRtype_info "\n",
+                       mfn_x(gmfn), 
+                       pfn, 
+                       d->arch.paging.dirty_bitmap_size,
+                       d->domain_id,
+                       (page_get_owner(mfn_to_page(gmfn))
+                        ? page_get_owner(mfn_to_page(gmfn))->domain_id
+                        : -1),
+                       mfn_to_page(gmfn)->count_info, 
+                       mfn_to_page(gmfn)->u.inuse.type_info);
+    }
+
+    log_dirty_unlock(d);
+}
+
+/* Read a domain's log-dirty bitmap and stats.  If the operation is a CLEAN, 
+ * clear the bitmap and stats as well. */
+int paging_log_dirty_op(struct domain *d, struct xen_domctl_shadow_op *sc)
+{
+    int i, rv = 0, clean = 0, peek = 1;
+
+    domain_pause(d);
+    log_dirty_lock(d);
+
+    clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN);
+
+    PAGING_DEBUG(LOGDIRTY, "log-dirty %s: dom %u faults=%u dirty=%u\n", 
+                  (clean) ? "clean" : "peek",
+                  d->domain_id,
+                  d->arch.paging.fault_count, 
+                  d->arch.paging.dirty_count);
+
+    sc->stats.fault_count = d->arch.paging.fault_count;
+    sc->stats.dirty_count = d->arch.paging.dirty_count;
+
+    if ( clean )
+    {
+       /* Further operations are required for XEN_DOMCTL_SHADOW_OP_CLEAN. We
+        * dispatch to next-level log_dirty functions based on paging mode */
+       if ( !paging_mode_hap(d) )
+           shadow_log_dirty_op_clean(d);
+
+        d->arch.paging.fault_count = 0;
+        d->arch.paging.dirty_count = 0;
+    }
+
+    if ( guest_handle_is_null(sc->dirty_bitmap) )
+        /* caller may have wanted just to clean the state or access stats. */
+        peek = 0;
+
+    if ( (peek || clean) && (d->arch.paging.dirty_bitmap == NULL) )
+    {
+        rv = -EINVAL; /* perhaps should be ENOMEM? */
+        goto out;
+    }
+ 
+    if ( sc->pages > d->arch.paging.dirty_bitmap_size )
+        sc->pages = d->arch.paging.dirty_bitmap_size;
+
+#define CHUNK (8*1024) /* Transfer and clear in 1kB chunks for L1 cache. */
+    for ( i = 0; i < sc->pages; i += CHUNK )
+    {
+        int bytes = ((((sc->pages - i) > CHUNK)
+                      ? CHUNK
+                      : (sc->pages - i)) + 7) / 8;
+
+        if ( likely(peek) )
+        {
+            if ( copy_to_guest_offset(
+                sc->dirty_bitmap, i/8,
+                (uint8_t *)d->arch.paging.dirty_bitmap + (i/8), bytes) )
+            {
+                rv = -EFAULT;
+                goto out;
+            }
+        }
+
+        if ( clean )
+            memset((uint8_t *)d->arch.paging.dirty_bitmap + (i/8), 0, bytes);
+    }
+#undef CHUNK
+
+ out:
+    log_dirty_unlock(d);
+    domain_unpause(d);
+    return rv;
+}
+
 /* Domain paging struct initialization. */
 void paging_domain_init(struct domain *d)
 {
     p2m_init(d);
+    log_dirty_lock_init(d);
     shadow_domain_init(d);
 
     if ( opt_hap_enabled && is_hvm_domain(d) )
@@ -65,11 +264,40 @@ int paging_domctl(struct domain *d, xen_
 int paging_domctl(struct domain *d, xen_domctl_shadow_op_t *sc,
                   XEN_GUEST_HANDLE(void) u_domctl)
 {
-    /* Here, dispatch domctl to the appropriate paging code */
-    if ( opt_hap_enabled && is_hvm_domain(d) )
-        return hap_domctl(d, sc, u_domctl);
-    else
-        return shadow_domctl(d, sc, u_domctl);
+    if ( unlikely(d == current->domain) )
+    {
+        gdprintk(XENLOG_INFO, "Dom %u tried to do a paging op on itself.\n",
+                 d->domain_id);
+        return -EINVAL;
+    }
+
+    if ( unlikely(d->is_dying) )
+    {
+        gdprintk(XENLOG_INFO, "Ignoring paging op on dying domain %u\n",
+                 d->domain_id);
+        return 0;
+    }
+
+    if ( unlikely(d->vcpu[0] == NULL) )
+    {
+        PAGING_ERROR("Paging op on a domain (%u) with no vcpus\n",
+                     d->domain_id);
+        return -EINVAL;
+    }
+
+    switch ( sc->op )
+    {
+    case XEN_DOMCTL_SHADOW_OP_CLEAN:
+    case XEN_DOMCTL_SHADOW_OP_PEEK:
+        return paging_log_dirty_op(d, sc);
+       
+    default:
+       /* Dispatch other domctl operations to the appropriate paging code */
+       if ( opt_hap_enabled && is_hvm_domain(d) )
+           return hap_domctl(d, sc, u_domctl);
+       else
+           return shadow_domctl(d, sc, u_domctl);
+    }
 }
 
 /* Call when destroying a domain */
diff -r 7ab0527484c8 xen/arch/x86/mm/shadow/common.c
--- a/xen/arch/x86/mm/shadow/common.c   Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/mm/shadow/common.c   Tue Jun 05 17:20:34 2007 -0500
@@ -87,8 +87,6 @@ __initcall(shadow_audit_key_init);
 __initcall(shadow_audit_key_init);
 #endif /* SHADOW_AUDIT */
 
-static void sh_free_log_dirty_bitmap(struct domain *d);
-
 int _shadow_mode_refcounts(struct domain *d)
 {
     return shadow_mode_refcounts(d);
@@ -541,7 +539,7 @@ sh_validate_guest_entry(struct vcpu *v, 
     int result = 0;
     struct page_info *page = mfn_to_page(gmfn);
 
-    sh_mark_dirty(v->domain, gmfn);
+    paging_mark_dirty(v->domain, mfn_x(gmfn));
     
     // Determine which types of shadows are affected, and update each.
     //
@@ -2565,7 +2563,7 @@ void shadow_teardown(struct domain *d)
         if (d->arch.paging.shadow.hash_table) 
             shadow_hash_teardown(d);
         /* Release the log-dirty bitmap of dirtied pages */
-        sh_free_log_dirty_bitmap(d);
+        paging_free_log_dirty_bitmap(d);
         /* Should not have any more memory held */
         SHADOW_PRINTK("teardown done."
                        "  Shadow pages total = %u, free = %u, p2m=%u\n",
@@ -2724,37 +2722,6 @@ static int shadow_test_disable(struct do
     return ret;
 }
 
-static int
-sh_alloc_log_dirty_bitmap(struct domain *d)
-{
-    ASSERT(d->arch.paging.shadow.dirty_bitmap == NULL);
-    d->arch.paging.shadow.dirty_bitmap_size =
-        (domain_get_maximum_gpfn(d) + BITS_PER_LONG) & ~(BITS_PER_LONG - 1);
-    d->arch.paging.shadow.dirty_bitmap =
-        xmalloc_array(unsigned long,
-                      d->arch.paging.shadow.dirty_bitmap_size / BITS_PER_LONG);
-    if ( d->arch.paging.shadow.dirty_bitmap == NULL )
-    {
-        d->arch.paging.shadow.dirty_bitmap_size = 0;
-        return -ENOMEM;
-    }
-    memset(d->arch.paging.shadow.dirty_bitmap, 0,
-           d->arch.paging.shadow.dirty_bitmap_size/8);
-
-    return 0;
-}
-
-static void
-sh_free_log_dirty_bitmap(struct domain *d)
-{
-    d->arch.paging.shadow.dirty_bitmap_size = 0;
-    if ( d->arch.paging.shadow.dirty_bitmap )
-    {
-        xfree(d->arch.paging.shadow.dirty_bitmap);
-        d->arch.paging.shadow.dirty_bitmap = NULL;
-    }
-}
-
 static int shadow_log_dirty_enable(struct domain *d)
 {
     int ret;
@@ -2784,16 +2751,16 @@ static int shadow_log_dirty_enable(struc
         d->arch.paging.shadow.opt_flags = SHOPT_LINUX_L3_TOPLEVEL;
 #endif
 
-    ret = sh_alloc_log_dirty_bitmap(d);
+    ret = paging_alloc_log_dirty_bitmap(d);
     if ( ret != 0 )
     {
-        sh_free_log_dirty_bitmap(d);
+        paging_free_log_dirty_bitmap(d);
         goto out;
     }
 
     ret = shadow_one_bit_enable(d, PG_log_dirty);
     if ( ret != 0 )
-        sh_free_log_dirty_bitmap(d);
+        paging_free_log_dirty_bitmap(d);
 
  out:
     shadow_unlock(d);
@@ -2809,11 +2776,21 @@ static int shadow_log_dirty_disable(stru
     shadow_lock(d);
     ret = shadow_one_bit_disable(d, PG_log_dirty);
     if ( !shadow_mode_log_dirty(d) )
-        sh_free_log_dirty_bitmap(d);
+        paging_free_log_dirty_bitmap(d);
     shadow_unlock(d);
     domain_unpause(d);
 
     return ret;
+}
+
+void shadow_log_dirty_op_clean(struct domain *d) 
+{
+    /* Need to revoke write access to the domain's pages again.
+     * In future, we'll have a less heavy-handed approach to this,
+     * but for now, we just unshadow everything except Xen. */
+    shadow_lock(d);
+    shadow_blow_tables(d);
+    shadow_unlock(d);
 }
 
 /**************************************************************************/
@@ -2892,150 +2869,6 @@ void shadow_convert_to_log_dirty(struct 
     BUG();
 }
 
-
-/* Read a domain's log-dirty bitmap and stats.  
- * If the operation is a CLEAN, clear the bitmap and stats as well. */
-static int shadow_log_dirty_op(
-    struct domain *d, struct xen_domctl_shadow_op *sc)
-{
-    int i, rv = 0, clean = 0, peek = 1;
-
-    domain_pause(d);
-    shadow_lock(d);
-
-    clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN);
-
-    SHADOW_DEBUG(LOGDIRTY, "log-dirty %s: dom %u faults=%u dirty=%u\n", 
-                  (clean) ? "clean" : "peek",
-                  d->domain_id,
-                  d->arch.paging.shadow.fault_count, 
-                  d->arch.paging.shadow.dirty_count);
-
-    sc->stats.fault_count = d->arch.paging.shadow.fault_count;
-    sc->stats.dirty_count = d->arch.paging.shadow.dirty_count;
-
-    if ( clean )
-    {
-        /* Need to revoke write access to the domain's pages again.
-         * In future, we'll have a less heavy-handed approach to this,
-         * but for now, we just unshadow everything except Xen. */
-        shadow_blow_tables(d);
-
-        d->arch.paging.shadow.fault_count = 0;
-        d->arch.paging.shadow.dirty_count = 0;
-    }
-
-    if ( guest_handle_is_null(sc->dirty_bitmap) )
-        /* caller may have wanted just to clean the state or access stats. */
-        peek = 0;
-
-    if ( (peek || clean) && (d->arch.paging.shadow.dirty_bitmap == NULL) )
-    {
-        rv = -EINVAL; /* perhaps should be ENOMEM? */
-        goto out;
-    }
- 
-    if ( sc->pages > d->arch.paging.shadow.dirty_bitmap_size )
-        sc->pages = d->arch.paging.shadow.dirty_bitmap_size;
-
-#define CHUNK (8*1024) /* Transfer and clear in 1kB chunks for L1 cache. */
-    for ( i = 0; i < sc->pages; i += CHUNK )
-    {
-        int bytes = ((((sc->pages - i) > CHUNK)
-                      ? CHUNK
-                      : (sc->pages - i)) + 7) / 8;
-
-        if ( likely(peek) )
-        {
-            if ( copy_to_guest_offset(
-                sc->dirty_bitmap, i/8,
-                (uint8_t *)d->arch.paging.shadow.dirty_bitmap + (i/8), bytes) )
-            {
-                rv = -EFAULT;
-                goto out;
-            }
-        }
-
-        if ( clean )
-            memset((uint8_t *)d->arch.paging.shadow.dirty_bitmap + (i/8), 0, bytes);
-    }
-#undef CHUNK
-
- out:
-    shadow_unlock(d);
-    domain_unpause(d);
-    return rv;
-}
-
-
-/* Mark a page as dirty */
-void sh_mark_dirty(struct domain *d, mfn_t gmfn)
-{
-    unsigned long pfn;
-    int do_locking;
-
-    if ( !shadow_mode_log_dirty(d) || !mfn_valid(gmfn) )
-        return;
-
-    /* Although this is an externally visible function, we do not know
-     * whether the shadow lock will be held when it is called (since it
-     * can be called from __hvm_copy during emulation).
-     * If the lock isn't held, take it for the duration of the call. */
-    do_locking = !shadow_locked_by_me(d);
-    if ( do_locking ) 
-    { 
-        shadow_lock(d);
-        /* Check the mode again with the lock held */ 
-        if ( unlikely(!shadow_mode_log_dirty(d)) )
-        {
-            shadow_unlock(d);
-            return;
-        }
-    }
-
-    ASSERT(d->arch.paging.shadow.dirty_bitmap != NULL);
-
-    /* We /really/ mean PFN here, even for non-translated guests. */
-    pfn = get_gpfn_from_mfn(mfn_x(gmfn));
-
-    /*
-     * Values with the MSB set denote MFNs that aren't really part of the 
-     * domain's pseudo-physical memory map (e.g., the shared info frame).
-     * Nothing to do here...
-     */
-    if ( unlikely(!VALID_M2P(pfn)) )
-        return;
-
-    /* N.B. Can use non-atomic TAS because protected by shadow_lock. */
-    if ( likely(pfn < d->arch.paging.shadow.dirty_bitmap_size) ) 
-    { 
-        if ( !__test_and_set_bit(pfn, d->arch.paging.shadow.dirty_bitmap) )
-        {
-            SHADOW_DEBUG(LOGDIRTY, 
-                          "marked mfn %" PRI_mfn " (pfn=%lx), dom %d\n",
-                          mfn_x(gmfn), pfn, d->domain_id);
-            d->arch.paging.shadow.dirty_count++;
-        }
-    }
-    else
-    {
-        SHADOW_PRINTK("mark_dirty OOR! "
-                       "mfn=%" PRI_mfn " pfn=%lx max=%x (dom %d)\n"
-                       "owner=%d c=%08x t=%" PRtype_info "\n",
-                       mfn_x(gmfn), 
-                       pfn, 
-                       d->arch.paging.shadow.dirty_bitmap_size,
-                       d->domain_id,
-                       (page_get_owner(mfn_to_page(gmfn))
-                        ? page_get_owner(mfn_to_page(gmfn))->domain_id
-                        : -1),
-                       mfn_to_page(gmfn)->count_info, 
-                       mfn_to_page(gmfn)->u.inuse.type_info);
-    }
-
-    if ( do_locking ) shadow_unlock(d);
-}
-
 /**************************************************************************/
 /* Shadow-control XEN_DOMCTL dispatcher */
 
@@ -3044,27 +2877,6 @@ int shadow_domctl(struct domain *d,
                   XEN_GUEST_HANDLE(void) u_domctl)
 {
     int rc, preempted = 0;
-
-    if ( unlikely(d == current->domain) )
-    {
-        gdprintk(XENLOG_INFO, "Dom %u tried to do a shadow op on itself.\n",
-                 d->domain_id);
-        return -EINVAL;
-    }
-
-    if ( unlikely(d->is_dying) )
-    {
-        gdprintk(XENLOG_INFO, "Ignoring shadow op on dying domain %u\n",
-                 d->domain_id);
-        return 0;
-    }
-
-    if ( unlikely(d->vcpu[0] == NULL) )
-    {
-        SHADOW_ERROR("Shadow op on a domain (%u) with no vcpus\n",
-                     d->domain_id);
-        return -EINVAL;
-    }
 
     switch ( sc->op )
     {
@@ -3085,10 +2897,6 @@ int shadow_domctl(struct domain *d,
 
     case XEN_DOMCTL_SHADOW_OP_ENABLE_TRANSLATE:
         return shadow_enable(d, PG_refcounts|PG_translate);
-
-    case XEN_DOMCTL_SHADOW_OP_CLEAN:
-    case XEN_DOMCTL_SHADOW_OP_PEEK:
-        return shadow_log_dirty_op(d, sc);
 
     case XEN_DOMCTL_SHADOW_OP_ENABLE:
         if ( sc->mode & XEN_DOMCTL_SHADOW_ENABLE_LOG_DIRTY )
diff -r 7ab0527484c8 xen/arch/x86/mm/shadow/multi.c
--- a/xen/arch/x86/mm/shadow/multi.c    Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/mm/shadow/multi.c    Tue Jun 05 04:38:26 2007 -0500
@@ -457,7 +457,7 @@ static u32 guest_set_ad_bits(struct vcpu
     }
 
     /* Set the bit(s) */
-    sh_mark_dirty(v->domain, gmfn);
+    paging_mark_dirty(v->domain, mfn_x(gmfn));
     SHADOW_DEBUG(A_AND_D, "gfn = %" SH_PRI_gfn ", "
                  "old flags = %#x, new flags = %#x\n", 
                  gfn_x(guest_l1e_get_gfn(*ep)), guest_l1e_get_flags(*ep), 
@@ -717,7 +717,7 @@ _sh_propagate(struct vcpu *v,
     if ( unlikely((level == 1) && shadow_mode_log_dirty(d)) )
     {
         if ( ft & FETCH_TYPE_WRITE ) 
-            sh_mark_dirty(d, target_mfn);
+            paging_mark_dirty(d, mfn_x(target_mfn));
         else if ( !sh_mfn_is_dirty(d, target_mfn) )
             sflags &= ~_PAGE_RW;
     }
@@ -2856,7 +2856,7 @@ static int sh_page_fault(struct vcpu *v,
     }
 
     perfc_incr(shadow_fault_fixed);
-    d->arch.paging.shadow.fault_count++;
+    d->arch.paging.fault_count++;
     reset_early_unshadow(v);
 
  done:
@@ -4058,7 +4058,7 @@ sh_x86_emulate_write(struct vcpu *v, uns
     else
         reset_early_unshadow(v);
     
-    sh_mark_dirty(v->domain, mfn);
+    paging_mark_dirty(v->domain, mfn_x(mfn));
 
     sh_unmap_domain_page(addr);
     shadow_audit_tables(v);
@@ -4114,7 +4114,7 @@ sh_x86_emulate_cmpxchg(struct vcpu *v, u
     else
         reset_early_unshadow(v);
 
-    sh_mark_dirty(v->domain, mfn);
+    paging_mark_dirty(v->domain, mfn_x(mfn));
 
     sh_unmap_domain_page(addr);
     shadow_audit_tables(v);
@@ -4158,7 +4158,7 @@ sh_x86_emulate_cmpxchg8b(struct vcpu *v,
     else
         reset_early_unshadow(v);
 
-    sh_mark_dirty(v->domain, mfn);
+    paging_mark_dirty(v->domain, mfn_x(mfn));
 
     sh_unmap_domain_page(addr);
     shadow_audit_tables(v);
diff -r 7ab0527484c8 xen/arch/x86/mm/shadow/private.h
--- a/xen/arch/x86/mm/shadow/private.h  Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/arch/x86/mm/shadow/private.h  Mon Jun 04 17:56:23 2007 -0500
@@ -496,13 +496,13 @@ sh_mfn_is_dirty(struct domain *d, mfn_t 
 {
     unsigned long pfn;
     ASSERT(shadow_mode_log_dirty(d));
-    ASSERT(d->arch.paging.shadow.dirty_bitmap != NULL);
+    ASSERT(d->arch.paging.dirty_bitmap != NULL);
 
     /* We /really/ mean PFN here, even for non-translated guests. */
     pfn = get_gpfn_from_mfn(mfn_x(gmfn));
     if ( likely(VALID_M2P(pfn))
-         && likely(pfn < d->arch.paging.shadow.dirty_bitmap_size) 
-         && test_bit(pfn, d->arch.paging.shadow.dirty_bitmap) )
+         && likely(pfn < d->arch.paging.dirty_bitmap_size) 
+         && test_bit(pfn, d->arch.paging.dirty_bitmap) )
         return 1;
 
     return 0;
diff -r 7ab0527484c8 xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h      Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/include/asm-x86/domain.h      Tue Jun 05 04:21:38 2007 -0500
@@ -92,14 +92,6 @@ struct shadow_domain {
 
     /* Fast MMIO path heuristic */
     int has_fast_mmio_entries;
-
-    /* Shadow log-dirty bitmap */
-    unsigned long *dirty_bitmap;
-    unsigned int dirty_bitmap_size;  /* in pages, bit per page */
-
-    /* Shadow log-dirty mode stats */
-    unsigned int fault_count;
-    unsigned int dirty_count;
 };
 
 struct shadow_vcpu {
@@ -164,6 +156,19 @@ struct paging_domain {
 
     /* Other paging assistance code will have structs here */
     struct hap_domain    hap;
+
+    /* log-dirty lock */
+    spinlock_t           log_dirty_lock;
+    int                  log_dirty_locker; /* processor which holds the lock */
+    const char          *log_dirty_locker_function; /* func that took it */
+
+    /* log-dirty bitmap */
+    unsigned long *dirty_bitmap;
+    unsigned int dirty_bitmap_size;  /* in pages, bit per page */
+
+    /* log-dirty mode stats */
+    unsigned int fault_count;
+    unsigned int dirty_count;
 };
 
 struct paging_vcpu {
diff -r 7ab0527484c8 xen/include/asm-x86/grant_table.h
--- a/xen/include/asm-x86/grant_table.h Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/include/asm-x86/grant_table.h Tue Jun 05 04:33:38 2007 -0500
@@ -31,7 +31,7 @@ int replace_grant_host_mapping(
 #define gnttab_shared_gmfn(d, t, i)                     \
     (mfn_to_gmfn(d, gnttab_shared_mfn(d, t, i)))
 
-#define gnttab_mark_dirty(d, f) mark_dirty((d), (f))
+#define gnttab_mark_dirty(d, f) paging_mark_dirty((d), (f))
 
 static inline void gnttab_clear_flag(unsigned long nr, uint16_t *addr)
 {
diff -r 7ab0527484c8 xen/include/asm-x86/p2m.h
--- a/xen/include/asm-x86/p2m.h Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/include/asm-x86/p2m.h Tue Jun 05 11:42:54 2007 -0500
@@ -129,6 +129,11 @@ void guest_physmap_remove_page(struct do
 void guest_physmap_remove_page(struct domain *d, unsigned long gfn,
                                unsigned long mfn);
 
+/* Configure l1e flags of P2M table */
+int p2m_set_flags_global(struct domain *d, u32 flags);
+
+/* Set P2M l1e flags of a specific page */
+int p2m_set_flags(struct domain *d, paddr_t gpa, u32 flags);
 
 #endif /* _XEN_P2M_H */
 
diff -r 7ab0527484c8 xen/include/asm-x86/paging.h
--- a/xen/include/asm-x86/paging.h      Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/include/asm-x86/paging.h      Tue Jun 05 04:55:23 2007 -0500
@@ -63,6 +63,8 @@
 #define paging_mode_translate(_d) ((_d)->arch.paging.mode & PG_translate)
 #define paging_mode_external(_d)  ((_d)->arch.paging.mode & PG_external)
 
+/* flags used for paging debug */
+#define PAGING_DEBUG_LOGDIRTY 0
 /******************************************************************************
  * The equivalent for a particular vcpu of a shadowed domain. */
 
@@ -164,6 +166,14 @@ void paging_final_teardown(struct domain
  * creation. */
 int paging_enable(struct domain *d, u32 mode);
 
+/* allocate memory resource for log dirty */
+int paging_alloc_log_dirty_bitmap(struct domain *d);
+
+/* free memory resource for log dirty */
+void paging_free_log_dirty_bitmap(struct domain *d);
+
+/* mark a page as dirty page */
+void paging_mark_dirty(struct domain *d, unsigned long guest_mfn);
 
 /* Page fault handler
  * Called from pagefault handler in Xen, and from the HVM trap handlers
diff -r 7ab0527484c8 xen/include/asm-x86/shadow.h
--- a/xen/include/asm-x86/shadow.h      Mon Jun 04 16:46:03 2007 -0500
+++ b/xen/include/asm-x86/shadow.h      Tue Jun 05 09:58:00 2007 -0500
@@ -75,22 +75,13 @@ void shadow_teardown(struct domain *d);
 /* Call once all of the references to the domain have gone away */
 void shadow_final_teardown(struct domain *d);
 
-/* Mark a page as dirty in the log-dirty bitmap: called when Xen 
- * makes changes to guest memory on its behalf. */
-void sh_mark_dirty(struct domain *d, mfn_t gmfn);
-/* Cleaner version so we don't pepper shadow_mode tests all over the place */
-static inline void mark_dirty(struct domain *d, unsigned long gmfn)
-{
-    if ( unlikely(shadow_mode_log_dirty(d)) )
-        /* See the comment about locking in sh_mark_dirty */
-        sh_mark_dirty(d, _mfn(gmfn));
-}
-
 /* Update all the things that are derived from the guest's CR0/CR3/CR4.
  * Called to initialize paging structures if the paging mode
  * has changed, and when bringing up a VCPU for the first time. */
 void shadow_update_paging_modes(struct vcpu *v);
 
+/* handle log_dirty CLEAN operation. */
+void shadow_log_dirty_op_clean(struct domain *d);
 
 /* Remove all mappings of the guest page from the shadows. 
  * This is called from common code.  It does not flush TLBs. */

 

