
Re: [Xen-devel] CONFIG_SCRUB_DEBUG=y + arm64 + livepatch = Xen BUG at page_alloc.c:738



On Thu, Sep 14, 2017 at 05:39:23PM -0400, Boris Ostrovsky wrote:
> On 09/14/2017 05:26 PM, Konrad Rzeszutek Wilk wrote:
> > On Wed, Sep 13, 2017 at 02:49:41PM -0400, Boris Ostrovsky wrote:
> >> On 09/13/2017 02:25 PM, Julien Grall wrote:
> >>> Hi,
> >>>
> >>> On 09/13/2017 07:05 PM, Boris Ostrovsky wrote:
> >>>> On 09/13/2017 11:32 AM, Konrad Rzeszutek Wilk wrote:
> >>>> Well, that's not a fix. This eliminates the case that something in
> >>>> ARM-specific code (which I haven't tested) accidentally clears
> >>>> _PGC_need_scrub.
> >>>>
> >>>> OK, I think I know what the problem is. You are using
> >>>> CONFIG_SEPARATE_XENHEAP, aren't you?
> >>> It seems the bug appears on Arm64, so CONFIG_SEPARATE_XENHEAP is not set.
> >>>
> >>> Note that Arm32 is using separate heap.
> >>
> >> For separate heap we will need
> >>
> >>
> >> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> >> index b5243fc..9f62ea2 100644
> >> --- a/xen/common/page_alloc.c
> >> +++ b/xen/common/page_alloc.c
> >> @@ -2059,7 +2059,7 @@ void free_xenheap_pages(void *v, unsigned int order)
> >>
> >>      memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
> >>
> >> -    free_heap_pages(virt_to_page(v), order, false);
> >> +    free_heap_pages(virt_to_page(v), order, scrub_debug);
> >>  }
> >>
> >>  #else
> >>
> >>
> >> If that doesn't help then there are two cases where free_heap_pages is
> >> called with 'false' --- one in alloc_domheap_pages() and the other in
> >> online_page().
> >>
> >> Setting one and then the other would further narrow it down.
> > It went further. See the serial log:
> 
> Hmm. As Julien said, this is ARM64 so this patch should not have any effect.
> 
> Have you tried flipping false to true in the two free_heap_pages()
> invocations that I mentioned?

Yeah, it didn't help. So, during a certain call, I decided to debug this
with some extra printk()s:


@@ -1705,6 +1711,7 @@ static void init_heap_pages(
 {
     unsigned long i;
 
+    printk("%s: 0x%lx -> 0x%lx %s\n", __func__, page_to_mfn(pg), page_to_mfn(pg) + nr_pages, scrub_debug ? "scrub" : "");
     for ( i = 0; i < nr_pages; i++ )
     {
         unsigned int nid = phys_to_nid(page_to_maddr(pg+i));
@@ -1000,7 +1001,12 @@ if ( memflags & MEMF_debug ) {
                 spin_unlock(&heap_lock);
             }
             else if ( !(memflags & MEMF_no_scrub) )
+            {
+
+       printk("%s:%d %d scrub mfn=0%lx\n", __func__, __LINE__, i, page_to_mfn(&pg[i]));
+
                 check_one_page(&pg[i]);
+               }
         }
 
         if ( dirty_cnt )
@@ -1836,6 +1843,7 @@ static void __init smp_scrub_heap_pages(void *data)
     else
         end = start + chunk_size;
 
+    printk("CPU%d: MFN=0x%lx -> 0x%lx\n", cpu, start, end);
     for ( mfn = start; mfn < end; mfn++ )
     {
         pg = mfn_to_page(mfn);

Shows:

(XEN) Loading dom0 DTB to 0x0000000017e00000-0x0000000017e08265
(XEN) init_domheap_pages: 0xb87b1->0xb87bc
(XEN) init_heap_pages: 0xb87b1 -> 0xb87bc
(XEN) init_domheap_pages: 0xb88f1->0xb98ae
(XEN) init_heap_pages: 0xb88f1 -> 0xb98ae       <- so the memory is from here

(XEN) Scrubbing Free RAM on 1 nodes using 8 CPUs
(XEN) Initial low memory virq threshold set at 0x4000 pages.
(XEN) Scrubbing Free RAM on 1 nodes using 8 CPUs
(XEN) CPU0: MFN=0x0 -> 0x8000
(XEN) CPU6: MFN=0x6a12e -> 0x7212e
(XEN) CPU5: MFN=0x58651 -> 0x60651
(XEN) CPU2: MFN=0x235ba -> 0x2b5ba
(XEN) CPU1: MFN=0x11add -> 0x19add
(XEN) CPU3: MFN=0x35097 -> 0x3d097
(XEN) CPU4: MFN=0x46b74 -> 0x4eb74
(XEN) CPU7: MFN=0x7bc0b -> 0x83c0b
(XEN) .(XEN) CPU6: MFN=0x7212e -> 0x7a12e
(XEN) CPU5: MFN=0x60651 -> 0x68651
(XEN) CPU4: MFN=0x4eb74 -> 0x56b74
(XEN) CPU1: MFN=0x19add -> 0x21add
CPU0: MFN=0x8000 -> 0x10000
(XEN) CPU7: MFN=0x83c0b -> 0x8bc0b
(XEN) CPU2: MFN=0x2b5ba -> 0x335ba
(XEN) CPU3: MFN=0x3d097 -> 0x45097
(XEN) .(XEN) CPU1: MFN=0x21add -> 0x235ba
(XEN) CPU2: MFN=0x335ba -> 0x35097
CPU0: MFN=0x10000 -> 0x11add
(XEN) CPU3: MFN=0x45097 -> 0x46b74
(XEN) CPU6: MFN=0x7a12e -> 0x7bc0b
(XEN) CPU4: MFN=0x56b74 -> 0x58651
(XEN) CPU5: MFN=0x68651 -> 0x6a12e
(XEN) CPU7: MFN=0x8bc0b -> 0x8d6ea
(XEN) .done.
..snip..

(XEN) alloc_heap_pages:1006 0 scrub mfn=0b98ab
(XEN) Xen BUG at page_alloc.c:738

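So in other words, it looks like scrub_heap_pages is somehow not
including this MFN: 0xb98ab lies inside the 0xb88f1 -> 0xb98ae range given
to init_heap_pages, but the last chunk scrubbed above ends at 0x8d6ea.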
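
To double-check that reading of the log, here is a minimal standalone
sketch (not Xen code) that tests an MFN against the ranges printed above.
It assumes the ranges are half-open, as the "for ( mfn = start; mfn < end;
mfn++ )" loop in smp_scrub_heap_pages suggests, and it only knows about
the ranges visible in this log. The struct and function names are made up
for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Half-open MFN range [start, end). Hypothetical helper type, not from Xen. */
struct mfn_range {
    unsigned long start;
    unsigned long end;
};

/* Copied from the "CPUn: MFN=..." lines in the serial log. */
static const struct mfn_range scrubbed[] = {
    { 0x00000, 0x08000 }, { 0x08000, 0x10000 }, { 0x10000, 0x11add }, /* CPU0 */
    { 0x11add, 0x19add }, { 0x19add, 0x21add }, { 0x21add, 0x235ba }, /* CPU1 */
    { 0x235ba, 0x2b5ba }, { 0x2b5ba, 0x335ba }, { 0x335ba, 0x35097 }, /* CPU2 */
    { 0x35097, 0x3d097 }, { 0x3d097, 0x45097 }, { 0x45097, 0x46b74 }, /* CPU3 */
    { 0x46b74, 0x4eb74 }, { 0x4eb74, 0x56b74 }, { 0x56b74, 0x58651 }, /* CPU4 */
    { 0x58651, 0x60651 }, { 0x60651, 0x68651 }, { 0x68651, 0x6a12e }, /* CPU5 */
    { 0x6a12e, 0x7212e }, { 0x7212e, 0x7a12e }, { 0x7a12e, 0x7bc0b }, /* CPU6 */
    { 0x7bc0b, 0x83c0b }, { 0x83c0b, 0x8bc0b }, { 0x8bc0b, 0x8d6ea }, /* CPU7 */
};

/* Copied from the second "init_heap_pages: ..." line in the serial log. */
static const struct mfn_range domheap_range = { 0xb88f1, 0xb98ae };

/* True if mfn falls inside the half-open range r. */
bool range_contains(const struct mfn_range *r, unsigned long mfn)
{
    return mfn >= r->start && mfn < r->end;
}

/* True if any scrub chunk printed in the log covers mfn. */
bool mfn_was_scrubbed(unsigned long mfn)
{
    size_t i;

    for ( i = 0; i < sizeof(scrubbed) / sizeof(scrubbed[0]); i++ )
        if ( range_contains(&scrubbed[i], mfn) )
            return true;

    return false;
}
```

With these tables, range_contains(&domheap_range, 0xb98ab) is true while
mfn_was_scrubbed(0xb98ab) is false, which matches the observation that the
page was handed to the heap but never visited by the boot-time scrub.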

