
[Xen-devel] DomU panic in net_rx_action *initiated by another DomU*; 7503:20d1a79ebe31



Pardon me for replying to my own post; only after hitting "send" on the first one did I take a closer look and find a much more interesting (and worrisome) aspect of the issue I've been seeing: another DomU crashed with the same error at the same time. That DomU was testing an experimental Linux kernel patch (affecting procfs handling of symlinks) and was particularly unstable for that reason. It's interesting, though, that the other DomU (with no such patch applied) appears to have been hit by the same issue as well.

This appears to be reproducible.


From the DomU initiating the issue [running an experimental kernel patch and exercising a bug in that patch]:

Bad rx buffer (memory squeeze?).
Bad rx buffer (memory squeeze?).
Unable to handle kernel paging request at ffff880000b3c700 RIP:
<ffffffff8024026a>{netif_poll+1354}
PGD c55067 PUD c56067 PMD c5c067 PTE 0
Oops: 0002 [1]
CPU 0
Modules linked in: ext3 jbd unionfs
Pid: 0, comm: swapper Tainted: GF     2.6.12.6-xenU
RIP: e030:[<ffffffff8024026a>] <ffffffff8024026a>{netif_poll+1354}
RSP: e02b:ffffffff803bbd98  EFLAGS: 00010212
RAX: ffff880000b3c700 RBX: ffff880000b97900 RCX: ffff880000b3c064
RDX: ffff880000b3c700 RSI: 0000000000000002 RDI: ffff880000b97900
RBP: ffff880000b97900 R08: 0000000000000000 R09: 0000000000000022
R10: 000000000003f998 R11: 0000000000000212 R12: ffff88003faba360
R13: ffff88003f41a138 R14: 0000000000000080 R15: 0000000000000000
FS:  00002aaaab2890a0(0000) GS:ffffffff803a7900(0000) knlGS:ffffffff80440600
CS:  e033 DS: 0000 ES: 0000
Process swapper (pid: 0, threadinfo ffffffff803ba000, task ffffffff80307380)
Stack: 0000000100000000 0000000100000040 0000000000000001 0000002800000028
       ffffffff803bbe2c ffff88003faba000 ffffffff803bbdc8 ffffffff803bbdc8
       0000000000000000 ffff88003f879c60
Call Trace:<ffffffff80255859>{net_rx_action+169} <ffffffff8013380b>{__do_softirq+107}
       <ffffffff801338ad>{do_softirq+61} <ffffffff80114e69>{do_IRQ+57}
<ffffffff8010d948>{evtchn_do_upcall+136} <ffffffff80111fb9>{do_hypervisor_callback+17}
       <ffffffff8010f9f3>{xen_idle+83} <ffffffff8010f9f3>{xen_idle+83}
       <ffffffff8010fa2f>{cpu_idle+31} <ffffffff803bc6ea>{start_kernel+490}
       <ffffffff803bc169>{_sinittext+361}

Code: c7 00 01 00 00 00 48 8b 83 10 01 00 00 c7 40 04 00 00 00 00
RIP <ffffffff8024026a>{netif_poll+1354} RSP <ffffffff803bbd98>
CR2: ffff880000b3c700
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!


From the DomU being impacted by the issue [running no unusual patches or modules, and being stable *except* when the initiating DomU is running]:


Unable to handle kernel paging request at ffff88003df8d700 RIP:
<ffffffff8024026a>{netif_poll+1354}
PGD 5e3067 PUD 5e4067 PMD 7d4067 PTE 0
Oops: 0002 [1]
CPU 0
Modules linked in: ipv6
Pid: 0, comm: swapper Tainted: GF     2.6.12.6-xenU
RIP: e030:[<ffffffff8024026a>] <ffffffff8024026a>{netif_poll+1354}
RSP: e02b:ffffffff803bbd98  EFLAGS: 00010212
RAX: ffff88003df8d700 RBX: ffff88003d99cbc0 RCX: ffff88003df8d05e
RDX: ffff88003df8d700 RSI: 0000000000000002 RDI: ffff88003d99cbc0
RBP: ffff88003d99cbc0 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff80381c20 R11: 0000000000000212 R12: ffff88003fc2e360
R13: ffff8800004dd248 R14: 0000000000000080 R15: 0000000000000000
FS:  00002aaaaade3b00(0000) GS:ffffffff803a7900(0000) knlGS:ffffffff803a7900
CS:  e033 DS: 0000 ES: 0000
Process swapper (pid: 0, threadinfo ffffffff803ba000, task ffffffff80307380)
Stack: 0000000100000000 0000000100000040 0000000000000001 0000174a0000174a
       ffffffff803bbe2c ffff88003fc2e000 ffffffff803bbdc8 ffffffff803bbdc8
       0000000000000000 ffff8800000cae60
Call Trace:<ffffffff80255859>{net_rx_action+169} <ffffffff8013380b>{__do_softirq+107}
       <ffffffff801338ad>{do_softirq+61} <ffffffff80114e69>{do_IRQ+57}
<ffffffff8010d948>{evtchn_do_upcall+136} <ffffffff80111fb9>{do_hypervisor_callback+17}
       <ffffffff8010f9f3>{xen_idle+83} <ffffffff8010f9f3>{xen_idle+83}
       <ffffffff8010fa2f>{cpu_idle+31} <ffffffff803bc6ea>{start_kernel+490}
       <ffffffff803bc169>{_sinittext+361}

Code: c7 00 01 00 00 00 48 8b 83 10 01 00 00 c7 40 04 00 00 00 00
RIP <ffffffff8024026a>{netif_poll+1354} RSP <ffffffff803bbd98>
CR2: ffff88003df8d700
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
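
For anyone comparing the two reports, here is a minimal sketch of how the key fields can be pulled out and lined up. It assumes the standard x86 page-fault error-code layout (bit 0 = page present, bit 1 = write, bit 2 = user mode); the helper names `decode_fault` and `summarize` are hypothetical and not part of any Xen or kernel tooling, and the excerpts are copied from the traces above.

```python
import re

# Excerpts copied from the two oops reports above (only the lines the sketch needs).
INITIATING_DOMU = """\
Oops: 0002 [1]
RAX: ffff880000b3c700 RBX: ffff880000b97900 RCX: ffff880000b3c064
RIP <ffffffff8024026a>{netif_poll+1354} RSP <ffffffff803bbd98>
CR2: ffff880000b3c700
"""

IMPACTED_DOMU = """\
Oops: 0002 [1]
RAX: ffff88003df8d700 RBX: ffff88003d99cbc0 RCX: ffff88003df8d05e
RIP <ffffffff8024026a>{netif_poll+1354} RSP <ffffffff803bbd98>
CR2: ffff88003df8d700
"""


def decode_fault(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 0x1),  # 0 means the page was not mapped
        "write": bool(code & 0x2),         # 1 means the access was a write
        "user_mode": bool(code & 0x4),     # 0 means the fault came from kernel mode
    }


def summarize(oops_text):
    """Hypothetical helper: extract the fields needed to compare two oopses."""
    def field(pattern):
        return re.search(pattern, oops_text).group(1)

    return {
        "fault": decode_fault(int(field(r"Oops: ([0-9a-f]+)"), 16)),
        "rip": field(r"RIP <([0-9a-f]+)>"),
        "symbol": field(r"RIP <[0-9a-f]+>\{([^}]+)\}"),
        "rax": field(r"RAX: ([0-9a-f]+)"),
        "cr2": field(r"CR2: ([0-9a-f]+)"),
    }


if __name__ == "__main__":
    a = summarize(INITIATING_DOMU)
    b = summarize(IMPACTED_DOMU)
    for name, s in (("initiating", a), ("impacted", b)):
        print(name, s["symbol"], s["fault"], "CR2 == RAX:", s["cr2"] == s["rax"])
    print("same faulting instruction:", a["rip"] == b["rip"])
```

Under these assumptions, both reports decode as a kernel-mode write to a page that is not mapped (which matches the "PTE 0" lines), at the same instruction in netif_poll, with the faulting address in CR2 equal to RAX in each guest.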



The "experimental kernel patch" in question is a unionfs patch found at http://permalink.gmane.org/gmane.comp.file-systems.unionfs.general/638, applied to UnionFS 1.1.1 (a different release from the one it was originally developed against, though the patch applies cleanly). I can reproduce the bug repeatedly by playing with ifup on a system running this patch with its root filesystem on a unionfs mount. If anyone is interested in reproducing it and cannot do so from the information I've provided so far, let me know and I'd be glad to offer additional details.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

