xen-devel

RE: [Xen-devel] xm pause causing lockup

To: "Kip Macy" <kip.macy@xxxxxxxxx>
Subject: RE: [Xen-devel] xm pause causing lockup
From: "Ian Pratt" <m+Ian.Pratt@xxxxxxxxxxxx>
Date: Fri, 15 Apr 2005 20:29:13 +0100
Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>

I need to think about this more, but it looks like you have an L2 page
that has a type count of 1 but hasn't been validated. You're then
looping when you try to increment it to 2, thinking that you're racing
someone else.
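
As a rough illustration of that state, the sketch below decodes a
type_info word into its fields. The bit layout (page type in the top
three bits, a validated flag at bit 28, a 10-bit va index starting at
bit 17, and a 16-bit count) is an assumption and is not stated anywhere
in this thread, so check it against the mm.h in your tree; the macro and
function names are made up for the example. Under that assumption, the
x = 0x40000001 reported in the gdb session quoted below decodes as an L2
page table with count 1 and the validated flag clear, and the
type=0x25fe0000 in frame #0 decodes with va index 0x2ff, which matches
va_idx=0x2ff in frame #2.

```c
#include <stdint.h>
#include <stdio.h>

/* ASSUMED bit layout of type_info; not taken from this thread.
 * Verify against the definitions in the Xen tree being built. */
#define DEMO_TYPE_SHIFT  29                      /* top 3 bits: page type */
#define DEMO_TYPE_MASK   (7u << DEMO_TYPE_SHIFT)
#define DEMO_VALIDATED   (1u << 28)              /* page validated for its type */
#define DEMO_VA_SHIFT    17
#define DEMO_VA_MASK     (0x3ffu << DEMO_VA_SHIFT)
#define DEMO_COUNT_MASK  0xffffu                 /* low 16 bits: type count */

/* Hypothetical helper type holding the decoded fields. */
struct type_info_fields {
    unsigned type;      /* 1 = L1 table, 2 = L2 table under the assumed layout */
    int      validated; /* non-zero if the validated flag is set */
    unsigned va_idx;    /* va index recorded for page-table types */
    unsigned count;     /* type reference count */
};

/* Split a raw type_info word into its fields. */
struct type_info_fields decode_type_info(uint32_t v)
{
    struct type_info_fields f;
    f.type      = (v & DEMO_TYPE_MASK) >> DEMO_TYPE_SHIFT;
    f.validated = (v & DEMO_VALIDATED) != 0;
    f.va_idx    = (v & DEMO_VA_MASK) >> DEMO_VA_SHIFT;
    f.count     = v & DEMO_COUNT_MASK;
    return f;
}

/* Print the decoded fields on one line. */
void print_type_info(uint32_t v)
{
    struct type_info_fields f = decode_type_info(v);
    printf("type=%u validated=%d va_idx=%#x count=%u\n",
           f.type, f.validated, f.va_idx, f.count);
}
```

Calling print_type_info() on a value read out of the debugger is one
quick way to sanity-check it against the layout in your own headers.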

Does this happen if you boot with 'nosmp'? I don't really believe it's a
race, but it might be worth checking.

Also, it's worth adding a printk into this loop just to check that that
is where you're getting caught.

            /* Someone else is updating validation of this page. Wait... */
            while ( (y = page->u.inuse.type_info) == x )
                cpu_relax();
            goto again;
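
One possible shape for the printk check suggested above is sketched
below as a user-space stand-in, since the surrounding function is not
shown here. It spins on the same condition as the loop above, reports
periodically, and gives up after a limit so the caller can tell that it
was stuck. struct demo_page, wait_for_type_change() and both limits are
hypothetical; inside Xen the report would be a printk and the loop body
would keep its cpu_relax().

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the real page structure: only the field
 * that the wait loop reads. */
struct demo_page {
    volatile uint32_t type_info;
};

/* Spin while type_info still equals x, as in the quoted loop, but
 * report every report_every iterations and stop after max_spins.
 * Returns the number of iterations spun, or -1 if the limit was hit. */
long wait_for_type_change(struct demo_page *page, uint32_t x,
                          long report_every, long max_spins)
{
    long spins = 0;

    while (page->type_info == x) {
        spins++;
        /* In the hypervisor this would be a printk(). */
        if (report_every > 0 && spins % report_every == 0)
            fprintf(stderr, "still waiting: type_info=%#x after %ld spins\n",
                    (unsigned)x, spins);
        /* Bounded only so that the sketch terminates; the real loop
         * has no such limit. */
        if (spins >= max_spins)
            return -1;
    }
    return spins;
}
```

A return of -1 here corresponds to the case where nobody ever changes
type_info, which is what the loop would look like from outside if it is
where the CPU is caught.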

We need to figure out how the type count managed to get to one without
the page being validated. I presume you're doing a debug=y build of Xen?
Do you get any warnings about illegal mmu_update attempts when you boot
FreeBSD?

Ian

> Without the ability to continue and only a very basic 
> understanding of the page typing code there is not a whole 
> lot to go on. Let me know if there is some other bit of 
> information that I can provide you with.
> 
>          -Kip
> 
> Before attaching:
> (XEN) 'd' pressed -> dumping registers
> (XEN) CPU:    1
> (XEN) EIP:    0808:[<fc52d59f>]      
> (XEN) EFLAGS: 00000246   CONTEXT: hypervisor
> (XEN) eax: 40000001   ebx: 00000000   ecx: fcfe3740   edx: fcfe3740
> (XEN) esi: 00007ff0   edi: 00000001   ebp: fcffbda0   esp: fcffbd58
> (XEN) ds: 0810   es: 0810   fs: 0810   gs: 0810   ss: 0810   cs: 0808
> (XEN) Stack trace from ESP=fcffbd58:
> (XEN)    80000003 00000001 fcfe3740 fcfe3740 fcfe3740 80000003 80000004 80000003
> (XEN)    00000000 00007ff0 fcffbda0 [fc52bfec] fd494968 fcfe3740 fcffbdc0 40000001
> (XEN)    40000001 40000002 fcffbdd0 [fc52c07b] fd494968 25fe0000 00000000 00000000
> (XEN)    000003d1 00000000 fcffbde0 [fc52bcec] 00000000 fd494968 fcffbe00 [fc52c52e]
> (XEN)    0000630f 25fe0000 fcfe3740 [fc52d100] fffffffc 00000000 fcffe000 00000001
> (XEN)    00000001 ff85b000 fcffbe40 [fc52c889] 0630f061 0000630f fcfe3740 000002ff
> (XEN)    00000001 f0000000 f0000000 00000004 f0000001 f0000000 000002ff ff85b000
> (XEN)    0000630f fcfe3740 fcffbe60 [fc52d0f0] fd494968 000001fa fc5b20c0 [fc53185d]
> (XEN)    40000000 00000002 fcffbeb0 [fc52d771] fd494968 40000000 fcfe3740 fcfe3740
> (XEN)    fcfe3740 80000002 80000003 00000004 00000000 f0000000 f0000000 00000004
> (XEN)    40000001 f0000000 fd49497c f0000000 f0000000 40000001 fcffbee0 [fc52c07b]
> (XEN)    fd494968 40000000 002ed518 00000000 a089075b 00000001 fcfe3740 00000000
> (XEN)    00007ff0 fd494968 fcffbfb0 [fc52df98] 0000630f 40000000 fcfe3740 00000292
> (XEN)    fc5781c0 00000001 0019b901 00000000 00804e95 00000000 a089075b 000000a1
> (XEN)    a10955f0 000000a1 00000001 fcfea040 00007ff0 00000001 fcffbf80 00000000
> (XEN)    fcfe3740 00000000 fcfe3740 00000000 a10955f0 000000a1 00000000 fcffbf98
> (XEN)    c0293bac 0000000c 00000003 [fc515bfc] a08902cd 000000a1 00000002 fcfe3740
> (XEN)    fcfea040 fd494968 00000000 40000000 00000001 00000001 00000000 00000000
> (XEN)    00000001 0000630f c018a19b 00000001 fcfea040 00007ff0 c0293bc8 [fc54e923]
> (XEN)    c0293bac 00000001 00000000 00007ff0 00000001 c0293bc8 0000001a 00000000
> (XEN) Call Trace from ESP=fcffbd58:
> (XEN)    [<fc52bfec>] [<fc52c07b>] [<fc52bcec>] [<fc52c52e>] [<fc52d100>] [<fc52c889>]
> (XEN)    [<fc52d0f0>] [<fc53185d>] [<fc52d771>] [<fc52c07b>] [<fc52df98>] [<fc515bfc>]
> (XEN)    [<fc54e923>]
> (XEN) Waiting for GDB to attach to XenDBG
> 
> 
> (gdb) bt
> #0  0xfc52d59f in get_page_type (page=0xfd494968, type=0x25fe0000) at mm.c:1235
> #1  0xfc52c07b in get_page_and_type_from_pagenr (page_nr=0x630f, type=0x25fe0000, d=0xfcfe3740) at mm.c:360
> #2  0xfc52c52e in get_page_from_l2e (l2e={l2_lo = 0x630f061}, pfn=0x630f, d=0xfcfe3740, va_idx=0x2ff) at mm.c:495
> #3  0xfc52c889 in alloc_l2_table (page=0xfd494968) at mm.c:679
> #4  0xfc52d0f0 in alloc_page_type (page=0xfd494968, type=0x40000000) at mm.c:1083
> #5  0xfc52d771 in get_page_type (page=0xfd494968, type=0x40000000) at mm.c:1269
> #6  0xfc52c07b in get_page_and_type_from_pagenr (page_nr=0x630f, type=0x40000000, d=0xfcfe3740) at mm.c:360
> #7  0xfc52df98 in do_mmuext_op (uops=0xc0293bac, count=0x1, pdone=0x0, foreigndom=0x7ff0) at mm.c:1499
> #8  0xfc54e923 in test_all_events () at bitops.h:239
> #9  0xc0293bac in ?? ()
> 
> (gdb) f 7
> #7  0xfc52df98 in do_mmuext_op (uops=0xc0293bac, count=0x1, pdone=0x0, foreigndom=0x7ff0) at mm.c:1499
> 1499                okay = get_page_and_type_from_pagenr(op.mfn, type, FOREIGNDOM);
> (gdb) p op
> $9 = {
>   cmd = 0x1,
>   {
>     mfn = 0x630f,
>     linear_addr = 0x630f
>   },
>   {
>     nr_ents = 0xc018a19b,
>     cpuset = 0xc018a19b
>   }
> }
> (gdb) p x
> $1 = 0x40000001
> (gdb) x nx
> 0x40000002:     Ignoring packet error, continuing...
> Reply contains invalid hex digit 40
> (gdb) p y
> $2 = 0x40000001
> (gdb) p page->u.inuse.type_info
> $3 = 0x40000001
> (gdb) p x
> $4 = 0x40000001
> (gdb) p nx
> $5 = 0x40000002
> (gdb) p y
> $6 = 0x40000001
> (gdb) p x
> $7 = 0x40000001
> (gdb) p sizeof(page->u.inuse.type_info)
> $8 = 0x4
> 
> 
> 
> On 4/15/05, Ian Pratt <m+Ian.Pratt@xxxxxxxxxxxx> wrote:
> > Wild! It really is looping in get_page_type.
> > 
> > Any chance you could use the serial debugger to find out what x, nx 
> > and y are in the cmpxchg?
> > 
> > I've tried to think of duff inputs that could cause it to loop, but 
> > I'm not smart enough.
> > 
> > Ian
> > 
> > > -----Original Message-----
> > > From: Kip Macy [mailto:kip.macy@xxxxxxxxx]
> > > Sent: 15 April 2005 18:13
> > > To: Ian Pratt
> > > Cc: Keir Fraser; xen-devel; ian.pratt@xxxxxxxxxxxx
> > > Subject: Re: [Xen-devel] xm pause causing lockup
> > >
> > > Great, thanks. I'm now running a completely fresh tree from last 
> > > night.
> > >
> > > > Over the course of several minutes I hit 'd' a number of times. The
> > > > addresses I got were:
> > >
> > > 0xfc51c742
> > > 0xfc51c746
> > > 0xfc51c74b
> > > 0xfc51c740
> > >
> > > (gdb) x/i 0xfc51c742
> > > 0xfc51c742 <get_page_type+1218>:        mov    0x40(%esp,1),%eax
> > > (gdb) x/i 0xfc51c746
> > > 0xfc51c746 <get_page_type+1222>:        mov    0x14(%eax),%ebx
> > > (gdb) x/i 0xfc51c74b
> > > > 0xfc51c74b <get_page_type+1227>:        je     0xfc51c740 <get_page_type+1216>
> > > (gdb) x/i 0xfc51c740
> > > 0xfc51c740 <get_page_type+1216>:        repz nop
> > >
> > >
> > >                -Kip
> > >
> > > On 4/14/05, Ian Pratt <m+Ian.Pratt@xxxxxxxxxxxx> wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> > > > > > [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Kip Macy
> > > > > Sent: 15 April 2005 05:36
> > > > > To: Keir Fraser
> > > > > Cc: xen-devel
> > > > > Subject: Re: [Xen-devel] xm pause causing lockup
> > > > >
> > > > > To further check this I added:
> > > > >  printk("%s %d %d %d %d %d\n", __FUNCTION__, op->cmd,
> > > > > op->mfn, count, success_count, domid); to
> > > > > HYPERVISOR_mmuext_op and something similar to mmu_update.
> > > >
> > > > > Is your hypothesis that Xen gets stuck in either the mmuext_op or
> > > > > mmu_update loops?
> > > > Are you running with watchdog enabled?
> > > >
> > > > > It might be good to add a printk at the end so that you can
> > > > > prove this.
> > > >
> > > > Hitting 'd' on the debug console will give us an EIP on CPU 1.
> > > >
> > > > Ian
> > > >
> > >
> >
> 
