
Re: [Xen-devel] Xen 4.3 + tmem = Xen BUG at domain_page.c:143



On Tue, Jun 11, 2013 at 4:30 PM, konrad wilk <konrad.wilk@xxxxxxxxxx> wrote:
> I think this is a more subtle bug.
> I applied a debug patch (see attached) and with the help of it and the logs:
>
> (XEN) domain_page.c:160:d1 mfn (1ebe96) -> 6 idx: 32(i:1,j:0), branch:1
> (XEN) domain_page.c:166:d1 [0] idx=26, mfn=0x1ebcd8, refcnt: 0
> (XEN) domain_page.c:166:d1 [1] idx=12, mfn=0x1ebcd9, refcnt: 0
> (XEN) domain_page.c:166:d1 [2] idx=2, mfn=0x210e9a, refcnt: 0
> (XEN) domain_page.c:166:d1 [3] idx=14, mfn=0x210e9b, refcnt: 0
> (XEN) domain_page.c:166:d1 [4] idx=7, mfn=0x210e9c, refcnt: 0
> (XEN) domain_page.c:166:d1 [5] idx=10, mfn=0x210e9d, refcnt: 0
> (XEN) domain_page.c:166:d1 [6] idx=5, mfn=0x210e9e, refcnt: 0
> (XEN) domain_page.c:166:d1 [7] idx=13, mfn=0x1ebe97, refcnt: 0
> (XEN) Xen BUG at domain_page.c:169
>
> (XEN) ----[ Xen-4.3-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    3
> (XEN) RIP:    e008:[<ffff82c4c01606a7>] map_domain_page+0x61d/0x6e1
> (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
> (XEN) rax: 0000000000000000   rbx: ffff8300c68f9000   rcx: 0000000000000000
> (XEN) rdx: ffff8302125b2020   rsi: 000000000000000a   rdi: ffff82c4c027a6e8
> (XEN) rbp: ffff8302125afcc8   rsp: ffff8302125afc48   r8: 0000000000000004
> (XEN) r9:  0000000000000004   r10: 0000000000000004   r11: 0000000000000001
> (XEN) r12: ffff83022e2ef000   r13: 00000000001ebe96   r14: 0000000000000020
> (XEN) r15: ffff8300c68f9080   cr0: 0000000080050033   cr4: 00000000000426f0
> (XEN) cr3: 0000000209541000   cr2: ffffffffff600400
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff8302125afc48:
> (XEN)    00000000001ebe97 0000000000000000 0000000000000000 ffff830200000001
> (XEN)    ffff8302125afcc8 ffff82c400000000 00000000001ebe97 000000080000000d
> (XEN)    ffff83022e2ef2d8 0000000000000286 ffff82c4c0127b6b ffff83022e2ef000
> (XEN)    ffff82e003d7d2c0 ffff8302125afd60 00000000001ebe96 0000000000000000
> (XEN)    ffff8302125afd38 ffff82c4c01373de 0000000000000000 ffffffffffffffff
> (XEN)    0000000000000001 ffff8302125afd58 ffff83022e2ef2d8 0000000000000286
> (XEN)    0000000000000027 0000000000000000 0000000000001000 0000000000000000
> (XEN)    0000000000000000 00000000001ebe96 ffff8302125afd98 ffff82c4c01377c4
> (XEN)    0000000000000000 ffff820040017000 ffff82e003d7d2c0 00000000001ebe96
> (XEN)    ffff8302125afd98 ffff830210ecf390 00000000fffffff4 ffff820040009010
> (XEN)    ffff820040000f50 ffff83022e2f0c90 ffff8302125afe18 ffff82c4c0135929
> (XEN)    000000160000001e ffff820040000f50 0000000000000000 00000000001ebe96
> (XEN)    0000000000000000 0000000000000000 0000a2f6125afe28 ffff8302125afe00
> (XEN)    0000001675f02b51 ffff83022e2f0c90 ffff830210ecf390 0000000000000000
> (XEN)    0000000000000001 0000000000000065 ffff8302125afef8 ffff82c4c0136510
> (XEN)    ffff830200001000 0000000000000000 ffff8302125afe90 255ece02125b2040
> (XEN)    00000003125afe68 00000016742667d1 ffff8302125b2100 0000003d52299000
> (XEN)    ffff8300c68f9000 0000000001c9c380 ffff8302125b2100 ffff8302125b1808
> (XEN)    0000000000000004 0000000000000004 0000000000000000 0000000000000000
> (XEN)    000000000000a2f6 0000000000000000 00000000001ebe96 ffff82c4c0126e77
> (XEN) Xen call trace:
> (XEN)    [<ffff82c4c01606a7>] map_domain_page+0x61d/0x6e1
> (XEN)    [<ffff82c4c01373de>] cli_get_page+0x15e/0x17b
> (XEN)    [<ffff82c4c01377c4>] tmh_copy_from_client+0x150/0x284
> (XEN)    [<ffff82c4c0135929>] do_tmem_put+0x323/0x5c4
> (XEN)    [<ffff82c4c0136510>] do_tmem_op+0x5a0/0xbd0
> (XEN)    [<ffff82c4c022391b>] syscall_enter+0xeb/0x145
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 3:
> (XEN) Xen BUG at domain_page.c:169
> (XEN) ****************************************
> (XEN)
> (XEN) Manual reset required ('noreboot' specified)
>
> It looks as if the path that is taken is:
>
> 110     idx = find_next_zero_bit(dcache->inuse, dcache->entries, dcache->cursor);
> 111     if ( unlikely(idx >= dcache->entries) )
> 112     {
>
> 115         /* /First/, clean the garbage map and update the inuse list. */
> 116         for ( i = 0; i < BITS_TO_LONGS(dcache->entries); i++ )
> 117         {
> 118             dcache->inuse[i] &= ~xchg(&dcache->garbage[i], 0);
> 119             accum |= ~dcache->inuse[i];
>
> Here it computes accum.
> 120         }
> 121
> 122         if ( accum )
> 123             idx = find_first_zero_bit(dcache->inuse, dcache->entries);
>
> OK, it finds idx (32).
> 124         else
> 125         {
> ... it does not go here.
> 142         }
> 143         BUG_ON(idx >= dcache->entries);
>
> And hits the BUG_ON().
>
> But I am not sure that is appropriate. Perhaps the BUG_ON was meant as a
> check for the loop (lines 128 -> 141), in case it looped around and never
> found an empty place. But if that is the condition, then it also looks
> suspect, as the loop might have found an empty hash entry and idx would
> still end up being 32.

Right: it is really curious that accum (built up by "accum |= ~dcache->inuse[i]")
ended up non-zero, while find_first_zero_bit() apparently goes off the end.

It seems you should add a printk in the first loop:
   if ( ~dcache->inuse[i] ) printk(...);
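The suggested check, written as a standalone userspace function for illustration: printf stands in for printk, and report_free_words is a hypothetical name, not anything in Xen. It prints every word of the inuse bitmap that still has a clear bit, together with the entries count, which would show directly whether a non-zero accum comes from bits at or above entries.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Print each inuse word that has any clear bit; return how many were printed. */
static unsigned int report_free_words(const unsigned long *inuse, size_t entries)
{
    unsigned int flagged = 0;

    assert(inuse != NULL);
    for ( size_t i = 0; i < BITS_TO_LONGS(entries); i++ )
    {
        if ( ~inuse[i] )
        {
            printf("inuse[%zu]=%#lx ~inuse=%#lx entries=%zu\n",
                   i, inuse[i], ~inuse[i], entries);
            flagged++;
        }
    }
    return flagged;
}
```

For example, passing entries = 32 and inuse[0] = 0xffffffff flags word 0 on a 64-bit build even though no slot below 32 is free.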

Also, I don't think you've printed the value of dcache->entries. Is it 32?
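If dcache->entries does turn out to be 32, BITS_TO_LONGS(32) is 1 on x86_64, so the single inuse word carries 32 bits that never correspond to a slot. One way to keep those padding bits out of accum is to mask each word to its valid bits before complementing. The following is a hypothetical sketch of that idea (valid_bits and masked_accum are illustrative names), not a patch from this thread.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Bits of word i that correspond to real slots (bit index < entries). */
static unsigned long valid_bits(size_t entries, size_t i)
{
    size_t rem;

    assert(i < BITS_TO_LONGS(entries));
    rem = entries - i * BITS_PER_LONG;
    return rem >= BITS_PER_LONG ? ~0UL : (1UL << rem) - 1;
}

/* Hypothetical masked variant of the accum loop: padding bits never count as free. */
static unsigned long masked_accum(const unsigned long *inuse, size_t entries)
{
    unsigned long accum = 0;

    for ( size_t i = 0; i < BITS_TO_LONGS(entries); i++ )
        accum |= ~inuse[i] & valid_bits(entries, i);
    return accum;
}
```

With this masking, a fully used 32-slot bitmap yields accum == 0, so the code would fall through to the else branch instead of reaching the BUG_ON with idx == 32.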

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

