
Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0



On Mon, Jun 08, 2009 at 05:17:45PM +0100, Ian Campbell wrote:
> On Mon, 2009-06-08 at 12:13 -0400, Pasi Kärkkäinen wrote:
> > On Mon, Jun 08, 2009 at 05:00:58PM +0100, Ian Campbell wrote:
> > > On Mon, 2009-06-08 at 11:45 -0400, Ian Campbell wrote:
> > > > 
> > > > > L4 at e1822000 is pinned contains L2 at e1977228 which points at
> > > > > an L1 which is unpinned low mem address 0x8bf8000
> > > > 
> > > > OK so I think that is interesting. A pinned L4 referencing an unpinned
> > > > L1 isn't supposed to happen, I don't think (Jeremy?).
> > > 
> > > Interesting:
> > > 
> > >         pte_t *page_check_address(struct page *page, struct mm_struct *mm,
> > >         [...]
> > >           pte = pte_offset_map(pmd, address); /* A */
> > >           /* Make a quick check before getting the lock */
> > >           if (!sync && !pte_present(*pte)) {
> > >                   pte_unmap(pte);
> > >                   return NULL;
> > >           }
> > >         
> > >           ptl = pte_lockptr(mm, pmd);
> > >           spin_lock(ptl);
> > >         [...]
> > >         
> > > So at point A we make a new mapping of a PTE without yet holding the
> > > corresponding PTE lock and this is precisely the point at which things
> > > start to go wrong for us... (coincidence? I think not ;-))
> > > 
> > > I wonder how this interacts with the logic in
> > > arch/x86/xen/mmu.c:xen_pin_page() which holds the lock while waiting for
> > > the (deferred) pin multicall to occur? Hmm, no this is about the
> > > PagePinned flag on the struct page which is out of date WRT the actual
> > > pinned status as Xen sees it -- we update the PagePinned flag early in
> > > xen_pin_page() long before Xen sees the pin hypercall so this window is the
> > > other way round to what would be needed to trigger this bug.
> > > 
> > > On the other hand xen_unpin_page() looks like it sets up something
> > > roughly like what we need for this issue to trigger.
> > > 
> > > Pasi, in addition to my other mad hack could you try this:
> > > 
> > 
> > Ok.. do you want me to try first without this patch? Or should I cancel my
> > kernel compilation and apply this as well? :)
> 
> Can you try the first patch first then add this one please.
> 

Ok. Will do.

I was already starting to feel like 'maybe my hardware is broken' but now that
code looks like it might be an actual bug :)
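
For reference, the general shape of the pattern quoted above (a quick check
without the lock, then taking the lock) can be sketched in user space roughly
like this. This is only an analogy under my own assumptions: struct entry,
entry_lookup_locked() and entry_unlock() are made-up names, a pthread mutex
stands in for the PTE lock, and the re-check under the lock is a choice made
for this sketch, not something shown in the quoted excerpt.

```c
/* User-space analogy only; none of these names exist in the kernel. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct entry {
    pthread_mutex_t lock;   /* stands in for the PTE lock */
    bool present;           /* stands in for pte_present() */
    int value;
};

/*
 * Returns the entry with its lock held, or NULL if it is not present.
 * The first test runs before the lock is taken, so it is only a hint:
 * the state may change between that test and pthread_mutex_lock().
 * (A truly concurrent version would also need an atomic read here.)
 */
static struct entry *entry_lookup_locked(struct entry *e)
{
    if (!e->present)            /* unlocked quick check: the racy window */
        return NULL;

    pthread_mutex_lock(&e->lock);
    if (!e->present) {          /* re-check now that the lock is held */
        pthread_mutex_unlock(&e->lock);
        return NULL;
    }
    return e;
}

/* Releases the lock taken by a successful entry_lookup_locked(). */
static void entry_unlock(struct entry *e)
{
    int rc = pthread_mutex_unlock(&e->lock);
    assert(rc == 0);
    (void)rc;
}
```

The point of the sketch is just that anything done before the lock is taken
cannot be relied on afterwards; it says nothing about where the real problem
in the Xen pin/unpin path is.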

Let's see.

-- Pasi

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

