WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
 
   
 

xen-devel

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops do

On Mon, Jun 08, 2009 at 08:05:43PM +0300, Pasi Kärkkäinen wrote:
> On Mon, Jun 08, 2009 at 07:21:46PM +0300, Pasi Kärkkäinen wrote:
> > On Mon, Jun 08, 2009 at 05:17:45PM +0100, Ian Campbell wrote:
> > > On Mon, 2009-06-08 at 12:13 -0400, Pasi Kärkkäinen wrote:
> > > > On Mon, Jun 08, 2009 at 05:00:58PM +0100, Ian Campbell wrote:
> > > > > On Mon, 2009-06-08 at 11:45 -0400, Ian Campbell wrote:
> > > > > > 
> > > > > > > > L4 at e1822000 is pinned contains L2 at e1977228 which points at an L1
> > > > > > > > which is unpinned low mem address 0x8bf8000
> > > > > > 
> > > > > > > OK so I think that is interesting. A pinned L4 referencing an unpinned
> > > > > > > L1 isn't supposed to happen, I don't think (Jeremy?).
> > > > > 
> > > > > Interesting:
> > > > > 
> > > > > >         pte_t *page_check_address(struct page *page, struct mm_struct *mm,
> > > > >         [...]
> > > > >               pte = pte_offset_map(pmd, address); /* A */
> > > > >               /* Make a quick check before getting the lock */
> > > > >               if (!sync && !pte_present(*pte)) {
> > > > >                       pte_unmap(pte);
> > > > >                       return NULL;
> > > > >               }
> > > > >         
> > > > >               ptl = pte_lockptr(mm, pmd);
> > > > >               spin_lock(ptl);
> > > > >         [...]
> > > > >         
> > > > > So at point A we make a new mapping of a PTE without yet holding the
> > > > > corresponding PTE lock and this is precisely the point at which things
> > > > > start to go wrong for us... (coincidence? I think not ;-))
> > > > > 
> > > > > I wonder how this interacts with the logic in
> > > > > > arch/x86/xen/mmu.c:xen_pin_page() which holds the lock while waiting for
> > > > > > the (deferred) pin multicall to occur? Hmm, no this is about the
> > > > > > PagePinned flag on the struct page which is out of date WRT the actual
> > > > > > pinned status as Xen sees it -- we update the PagePinned flag early in
> > > > > > xen_pin_page(), long before the pin hypercall reaches Xen, so this window
> > > > > > is the other way round to what would be needed to trigger this bug.
> > > > > 
> > > > > On the other hand xen_unpin_page() looks like it sets up something
> > > > > roughly like what we need for this issue to trigger.
> > > > > 
> > > > > > Pasi, in addition to my other mad hack, could you try this:
> > > > > 
> > > > 
> > > > > Ok.. do you want me to try first without this patch? Or should I cancel my
> > > > > kernel compilation and apply this as well? :)
> > > 
> > > Can you try the first patch first then add this one please.
> > > 
> > 
> > Ok. Will do.
> > 
> > I was already starting to feel like 'maybe my hardware is broken' but now that
> > code looks like it might be an actual bug :)
> > 
> > Let's see.
> > 
> 
> Crash with only the first patch applied:
> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-log-05-with-highpte-no-swap-with-debug3.txt
> 
> Now I'll try with the second one included as well..
> 

And here's one with the second patch applied as well:
http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-log-06-with-highpte-no-swap-with-debug4.txt

Seems to be different.. Xen is not complaining anymore..

-- Pasi
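
As a rough illustration of the ordering Ian points out at point A in the quoted page_check_address() excerpt (the entry is looked at before its lock is taken), here is a minimal userspace sketch in C. Everything in it is hypothetical and is not kernel or Xen code: struct demo_entry, demo_entry_init(), demo_check_entry() and demo_release_entry() are made-up names, and a pthread mutex stands in for the PTE lock. The re-check under the lock is an assumption about what the elided part of the function does. The sketch does not model pinning at all; it only shows that the result of the quick check can be stale until the lock is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page-table entry; not a kernel type. */
struct demo_entry {
    pthread_mutex_t lock;   /* plays the role of the PTE lock (ptl) */
    atomic_bool present;    /* plays the role of pte_present() */
    int value;              /* payload protected by the lock */
};

/* Initialise an entry; returns 0 on success, like pthread_mutex_init(). */
int demo_entry_init(struct demo_entry *e, bool present, int value)
{
    assert(e != NULL);
    atomic_init(&e->present, present);
    e->value = value;
    return pthread_mutex_init(&e->lock, NULL);
}

/*
 * Return the entry with its lock held, or NULL if it is not present.
 * When sync is false, a quick check is made before taking the lock,
 * mirroring the "!sync && !pte_present(*pte)" test in the quoted code.
 */
struct demo_entry *demo_check_entry(struct demo_entry *e, bool sync)
{
    assert(e != NULL);

    /* Quick check without the lock: the answer may already be stale. */
    if (!sync && !atomic_load(&e->present))
        return NULL;

    pthread_mutex_lock(&e->lock);

    /* Assumed re-check: only the state seen under the lock is trusted. */
    if (atomic_load(&e->present))
        return e;           /* caller must call demo_release_entry() */

    pthread_mutex_unlock(&e->lock);
    return NULL;
}

/* Drop the lock taken by a successful demo_check_entry(). */
void demo_release_entry(struct demo_entry *e)
{
    assert(e != NULL);
    pthread_mutex_unlock(&e->lock);
}
```

A caller would use the returned entry only between a successful demo_check_entry() and the matching demo_release_entry(); anything that happened before the lock was taken counts only as a hint.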

