xen-users
Re: [Xen-users] xen 3.2.1 / 2.6.18.8-xen dom0 with pci_bus_probe_wrapper
Scott Garron wrote:
Zoltan HERPAI wrote:
> I'm running Ubuntu 8.04.1 on an Asus M2N-E mainboard, latest BIOS,
> 64-bit userland
I've also wrestled with this issue for some 36 hours or so. I'm
running Debian testing (lenny/sid) on a Supermicro X7DBE+ motherboard
(Intel 5000P chipset). It currently has a single CPU, a quad-core Xeon
E5345 (2.33GHz), and 4GB RAM.
The 64-bit userland consists of gcc-4.3.1-2_amd64 (x86_64-linux-gnu
target, posix thread model) and libc6-2.7-10_amd64.
In my case, the machine gets partway through the init process,
and while starting a few of the more involved network services, such
as bind9 or apache2, the kernel panics and the machine halts (crash).
While attempting to figure out why it was doing that, I tried
reverting to the previous version that I had been running. Just
running ./install.sh from dist in that tree was enough to get the
machine to boot with a Xen-enabled kernel, but because I had done an
aptitude dist-upgrade, none of the Xen utilities were working (xend
start, xm list, etc.). I cloned the older build tree and did a
re-compile with the latest versions of the python and libc dev
libraries. That yielded a result similar to the Xen 3.2.1 compile:
during boot, the kernel would complain about the pci probe, and then in
the middle of the init process, it would crash.
The only way I got the machine back to a working order was to
install the version of the kernel (2.6.18-xen) and Xen (3.0, changeset
15521) that I had compiled with earlier gcc and libraries (back in
July, 2007), and manually cherry pick the install from the
dist/install/usr/lib64/python/xen directory on the freshly compiled
copy of that same build tree. It's running again, but my net result
was just a dist-upgrade. I'm not running a newer kernel or Xen, which
is what I had set out to do in the first place.
Anyway, the point I'm trying to make is that because a fresh
compile of my old build tree, a build tree that previously worked,
yields the same crash result, it seems to be somehow related to the
version of gcc or the development libraries that I used to compile it.
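One way to check the compiler suspicion is to compare which gcc built the working kernel and which built the crashing one. The following is only a minimal sketch: it assumes the kernel banner (as found in /proc/version on Linux) contains a "gcc version X.Y.Z" string, and the sample banner, host name, and function names are made up for illustration.

```python
import re

# Hypothetical banner in the general shape of /proc/version. The host name
# and build date are invented for illustration only.
SAMPLE_BANNER = (
    "Linux version 2.6.18.8-xen (root@buildhost) "
    "(gcc version 4.3.1 (Debian 4.3.1-2)) #1 SMP Mon Jul 7 12:00:00 EDT 2008"
)


def gcc_version_from_banner(banner):
    """Return the gcc version string found in a kernel banner, or None."""
    match = re.search(r"gcc version (\d+(?:\.\d+)+)", banner)
    return match.group(1) if match else None


def running_kernel_gcc_version(path="/proc/version"):
    """Read the banner of the running kernel and extract its gcc version."""
    with open(path) as handle:
        return gcc_version_from_banner(handle.read())


if __name__ == "__main__":
    print(gcc_version_from_banner(SAMPLE_BANNER))
```

Running running_kernel_gcc_version() once under the old, working kernel and once under a freshly built one (if it stays up long enough) would show whether the two really differ in compiler version.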
The two Oopses I get are:
BUG: warning at
/usr/src/linux-2.6.18-xen.hg/drivers/xen/core/pci.c:28/pci_bus_probe_wrapper()
[...]
--- and:
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
[<ffffffff88214114>] :ipv6:udp_v6_get_port+0x81/0x200
PGD 19a2d067 PUD 19a2e067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: video button ac battery ppp_deflate zlib_deflate
bsd_comp ppp_async crc_ccitt ppp_generic slhc ipt_REDIRECT xt_tcpudp
xt_multiport iptable_nat ip_nat ip_conntrack nfnetlink iptable_filter
ip_tables x_tables ipv6 reiserfs nls_iso8859_1 nls_cp437 vfat fat
serio_raw i2c_i801 intel_rng pcspkr i2c_core tsdev ext3 jbd dm_mirror
dm_snapshot dm_mod sd_mod usb_storage sg sr_mod cdrom usbhid 3w_9xxx
3c59x e1000 mii floppy ehci_hcd ata_piix libata scsi_mod uhci_hcd
usbcore thermal processor fan
Pid: 2964, comm: named Not tainted 2.6.18.8-xen #1
RIP: e030:[<ffffffff88214114>] [<ffffffff88214114>]
:ipv6:udp_v6_get_port+0x81/0x200
RSP: e02b:ffff880019a85e38 EFLAGS: 00010297
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000008000
RDX: 0000000000000000 RSI: 0000000000008000 RDI: 0000000000008000
RBP: 000000000000001c R08: 000000000000ee48 R09: 000000000000807f
R10: 0000000000000008 R11: 0000000000000246 R12: ffff88001b71c3c0
R13: ffff880019a85ec8 R14: 000000000000001c R15: 0000000000000000
FS: 00002b17d2a5f6e0(0063) GS:ffffffff804d9000(0000)
knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process named (pid: 2964, threadinfo ffff880019a84000, task
ffff88001f4c1100)
Stack: 0000000000000000 000000000000001c ffff88001b71c3c0
ffffffff88201a64
0000000000000004 ffffffff80397979 ffff88001b71c3c0 ffff880019a85ed0
0000000000000000 ffff88001b71c698 0000000019a85f54 ffff880019341400
Call Trace:
[<ffffffff88201a64>] :ipv6:inet6_bind+0x1e6/0x2a6
[<ffffffff80397979>] sock_getsockopt+0x2d8/0x2fa
[<ffffffff8039554b>] sys_bind+0x76/0xa6
[<ffffffff88211256>] :ipv6:ipv6_setsockopt+0x3a/0x84
[<ffffffff80394ad7>] sys_setsockopt+0xa5/0xb7
[<ffffffff8020a644>] system_call+0x68/0x6d
[<ffffffff8020a5dc>] system_call+0x0/0x6d
Code: 48 8b 12 0f 18 0a ff c0 3d fe 7f 00 00 7e f1 48 ff c7 44 39
RIP [<ffffffff88214114>] :ipv6:udp_v6_get_port+0x81/0x200
RSP <ffff880019a85e38>
CR2: 0000000000000000
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!
(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
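For anyone comparing traces like the one above, the symbol entries follow the pattern "[<address>] :module:symbol+0xOFFSET/0xSIZE". Here is a rough sketch of pulling those fields apart. The function name and the returned dictionary layout are my own choices for illustration, not part of any kernel tool.

```python
import re

# Matches entries such as:
#   [<ffffffff88214114>] :ipv6:udp_v6_get_port+0x81/0x200
#   [<ffffffff8039554b>] sys_bind+0x76/0xa6
# The ":module:" prefix is optional (built-in symbols have none).
SYMBOL_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(?::(\w+):)?(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_oops_symbol(line):
    """Split one oops symbol entry into its parts, or return None."""
    match = SYMBOL_RE.search(line)
    if match is None:
        return None
    address, module, symbol, offset, size = match.groups()
    address = int(address, 16)
    offset = int(offset, 16)
    return {
        "address": address,
        "module": module,          # None for symbols built into the kernel
        "symbol": symbol,
        "offset": offset,          # bytes into the function
        "size": int(size, 16),     # total size of the function in bytes
        "function_start": address - offset,
    }


if __name__ == "__main__":
    entry = parse_oops_symbol(
        "RIP [<ffffffff88214114>] :ipv6:udp_v6_get_port+0x81/0x200"
    )
    print(entry["module"], entry["symbol"], hex(entry["offset"]))
```

For the RIP line in this trace, that gives module ipv6, symbol udp_v6_get_port, and offset 0x81 into a function 0x200 bytes long, which is the spot to look at in a disassembly of the ipv6 module built from the same tree.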
Thanks for the detailed info. So it seems we've run into a reproducible
bug, even if I'm luckier in having at least the dom0 working: I was able
to get guests running, both paravirt and HVM, and stress-tested them a
bit; they ran fine. During your session, did you try different BIOS
versions, or did you see this on another similar box, if you have one?
What could be the solution if I want to stay with 3.2.1? Moving forward
to 3.2.2 doesn't seem to be a likely option.
Regards,
Zoltan HERPAI
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users