   
 
 
xen-users

Re: [Xen-users] xen 3.2.1 / 2.6.18.8-xen dom0 with pci_bus_probe_wrapper

To: Scott Garron <web-xenbugs@xxxxxxxxxxxxxx>
Subject: Re: [Xen-users] xen 3.2.1 / 2.6.18.8-xen dom0 with pci_bus_probe_wrapper error
From: Zoltan HERPAI <wigyori@xxxxxxx>
Date: Tue, 05 Aug 2008 21:38:07 +0200
Cc: xen-users@xxxxxxxxxxxxxxxxxxx
In-reply-to: <489740EA.4040705@xxxxxxxxxxxxxx>
References: <489604E3.4090902@xxxxxxx> <489740EA.4040705@xxxxxxxxxxxxxx>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: Thunderbird 2.0.0.14 (X11/20080505)
Scott Garron wrote:
Zoltan HERPAI wrote:

> I'm running Ubuntu 8.04.1 on an Asus M2N-E mainboard, latest BIOS,
> 64-bit userland


I've also wrestled with this issue for some 36 hours or so. I'm running Debian testing (lenny/sid) on a Supermicro X7DBE+ motherboard (Intel 5000P chipset). It currently has a single quad-core Xeon E5345 CPU (2.33GHz) and 4GB RAM.

The 64-bit userland consists of gcc-4.3.1-2_amd64 (x86_64-linux-gnu target, posix thread model) and libc6-2.7-10_amd64.

In my case, the machine gets partway through the init process, and while starting a few of the more involved network services, such as bind9 or apache2, the kernel panics and the machine halts (crash).

While attempting to figure out why it was doing that, I tried reverting to the previous version that I had been running. Just running ./install.sh from dist in that tree was enough to get the machine to boot with a Xen-enabled kernel, but because I had done an aptitude dist-upgrade, none of the Xen utilities were working (xend start, xm list, etc.). I cloned the older build tree and recompiled it against the latest versions of the Python and libc development libraries. That yielded a result similar to that of the Xen 3.2.1 compile: during boot, the kernel would complain about the PCI probe, and then in the middle of the init process it would crash.

The only way I got the machine back into working order was to install the version of the kernel (2.6.18-xen) and Xen (3.0, changeset 15521) that I had compiled with the earlier gcc and libraries (back in July 2007), and manually cherry-pick the install from the dist/install/usr/lib64/python/xen directory of the freshly compiled copy of that same build tree. It's running again, but my net result was just a dist-upgrade. I'm not running a newer kernel or Xen, which is what I had set out to do in the first place.

Anyway, the point I'm trying to make is that because a fresh compile of my old build tree, a build tree that previously worked, yields the same crash, the problem seems to be related to the version of gcc or of the development libraries that I used to compile it.
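
One way to test the compiler theory is to compare the gcc version that built a working kernel against the one that built a crashing kernel. On kernels of this era, /proc/version usually embeds a "gcc version X.Y.Z" string. The following is only a minimal sketch under that assumption: the helper name kernel_gcc_version and the sample line are hypothetical, and other kernels may format this string differently.

```python
import re

def kernel_gcc_version(proc_version_text):
    """Return the gcc version found in /proc/version-style text, or None.

    Assumes the text contains a "gcc version X.Y.Z" fragment; this is an
    assumption about the format, not a guarantee for every kernel.
    """
    match = re.search(r"gcc version (\d+(?:\.\d+)+)", proc_version_text)
    return match.group(1) if match else None

# Hypothetical sample line, made up for illustration (not taken from the
# machines discussed in this thread).
sample = ("Linux version 2.6.18.8-xen (root@example) "
          "(gcc version 4.3.1 (Debian 4.3.1-2)) #1 SMP Tue Aug 5 12:00:00 UTC 2008")

if __name__ == "__main__":
    print(kernel_gcc_version(sample))
```

On a live system the same function could be fed the contents of /proc/version, once under the working kernel and once under the failing one.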

The two Oopses I get are:

BUG: warning at /usr/src/linux-2.6.18-xen.hg/drivers/xen/core/pci.c:28/pci_bus_probe_wrapper()
[...]
--- and:

Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [<ffffffff88214114>] :ipv6:udp_v6_get_port+0x81/0x200
PGD 19a2d067 PUD 19a2e067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: video button ac battery ppp_deflate zlib_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc ipt_REDIRECT xt_tcpudp xt_multiport iptable_nat ip_nat ip_conntrack nfnetlink iptable_filter ip_tables x_tables ipv6 reiserfs nls_iso8859_1 nls_cp437 vfat fat serio_raw i2c_i801 intel_rng pcspkr i2c_core tsdev ext3 jbd dm_mirror dm_snapshot dm_mod sd_mod usb_storage sg sr_mod cdrom usbhid 3w_9xxx 3c59x e1000 mii floppy ehci_hcd ata_piix libata scsi_mod uhci_hcd usbcore thermal processor fan
Pid: 2964, comm: named Not tainted 2.6.18.8-xen #1
RIP: e030:[<ffffffff88214114>] [<ffffffff88214114>] :ipv6:udp_v6_get_port+0x81/0x200
RSP: e02b:ffff880019a85e38  EFLAGS: 00010297
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000008000
RDX: 0000000000000000 RSI: 0000000000008000 RDI: 0000000000008000
RBP: 000000000000001c R08: 000000000000ee48 R09: 000000000000807f
R10: 0000000000000008 R11: 0000000000000246 R12: ffff88001b71c3c0
R13: ffff880019a85ec8 R14: 000000000000001c R15: 0000000000000000
FS: 00002b17d2a5f6e0(0063) GS:ffffffff804d9000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process named (pid: 2964, threadinfo ffff880019a84000, task ffff88001f4c1100)
Stack: 0000000000000000 000000000000001c ffff88001b71c3c0 ffffffff88201a64
 0000000000000004 ffffffff80397979 ffff88001b71c3c0 ffff880019a85ed0
 0000000000000000 ffff88001b71c698 0000000019a85f54 ffff880019341400
Call Trace:
 [<ffffffff88201a64>] :ipv6:inet6_bind+0x1e6/0x2a6
 [<ffffffff80397979>] sock_getsockopt+0x2d8/0x2fa
 [<ffffffff8039554b>] sys_bind+0x76/0xa6
 [<ffffffff88211256>] :ipv6:ipv6_setsockopt+0x3a/0x84
 [<ffffffff80394ad7>] sys_setsockopt+0xa5/0xb7
 [<ffffffff8020a644>] system_call+0x68/0x6d
 [<ffffffff8020a5dc>] system_call+0x0/0x6d


Code: 48 8b 12 0f 18 0a ff c0 3d fe 7f 00 00 7e f1 48 ff c7 44 39
RIP  [<ffffffff88214114>] :ipv6:udp_v6_get_port+0x81/0x200
 RSP <ffff880019a85e38>
CR2: 0000000000000000
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
 (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
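
When comparing Oopses from different builds, it can help to reduce each call trace to (module, function, offset) tuples. The sketch below is a hedged illustration only: the regular expression assumes the 2.6.18-style frame layout seen in the trace above, the "kernel" label for frames without a module prefix is my own convention, and parse_frames is a hypothetical helper name.

```python
import re

# Matches frames such as:
#   [<ffffffff88201a64>] :ipv6:inet6_bind+0x1e6/0x2a6
#   [<ffffffff8039554b>] sys_bind+0x76/0xa6
FRAME = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(?::(\w+):)?(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)

def parse_frames(text):
    """Return a list of (module, function, offset) tuples from a call trace.

    Frames with no ":module:" prefix are labeled "kernel" (an assumed label).
    """
    frames = []
    for _addr, module, func, offset, _size in FRAME.findall(text):
        frames.append((module or "kernel", func, int(offset, 16)))
    return frames

# First three frames of the call trace quoted above.
trace = """\
 [<ffffffff88201a64>] :ipv6:inet6_bind+0x1e6/0x2a6
 [<ffffffff80397979>] sock_getsockopt+0x2d8/0x2fa
 [<ffffffff8039554b>] sys_bind+0x76/0xa6
"""

if __name__ == "__main__":
    for module, func, offset in parse_frames(trace):
        print(module, func, hex(offset))
```

Running this over two Oopses makes it easier to see whether both crashes enter the same module (here, ipv6) through the same path.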

Thanks for the detailed info. So it seems we've run into a reproducible bug, though I'm luckier in that at least my dom0 works: I was able to get both paravirt and HVM guests running, stress-tested them a bit, and they ran fine. While you were debugging, did you try different BIOS versions, or did you see this on another similar box, if you have one?

What could be the solution if I want to stay with 3.2.1? Moving forward to 3.2.2 doesn't seem to be a likely option.

Regards,
Zoltan HERPAI

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users