WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
 
   
 

xen-users

Re: [Xen-users] Xen 3.0.0 32bit-pae (testing changeset 8270) crashes(pgt

To: "Ian Pratt" <m+Ian.Pratt@xxxxxxxxxxxx>
Subject: Re: [Xen-users] Xen 3.0.0 32bit-pae (testing changeset 8270) crashes(pgtable.c:284, kernel bug?)
From: Ralph Passgang <ralph@xxxxxxxxxxxxx>
Date: Thu, 2 Feb 2006 13:56:19 +0100
Cc: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>, ian.pratt@xxxxxxxxxxxx, xen-users@xxxxxxxxxxxxxxxxxxx
Delivery-date: Thu, 02 Feb 2006 13:06:38 +0000
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <A95E2296287EAD4EB592B5DEEFCE0E9D40A4C4@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
List-id: Xen user discussion <xen-users.lists.xensource.com>
References: <A95E2296287EAD4EB592B5DEEFCE0E9D40A4C4@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
User-agent: KMail/1.9.1
On Thursday, 2 February 2006 12:52, Ian Pratt wrote:
> > 3.0.1 seems to fix the bug I saw on my two machines, but now there is
> > another (but somehow related) problem for me in 3.0.1-pae. I don't know
> > if it's still related to the 3ware controller, but at least it again only
> > appears for domains that have memory above the 32-bit address space, so
> > the first started domUs run fine. The big difference is that I don't have
> > any complete freezes of the Xen machine anymore; only domUs are crashing
> > this time.
>
> Interesting. It looks like xen is running out of memory below 4GB, and
> can't service the domain's request for a new L3 PGD, causing the domain
> to bug out.
>
> Are you using dom0_mem= on the xen command line to constrain dom0's
> memory usage or are you relying on dom0 releasing memory automatically as
> you start other domains? If the latter, I expect dom0 is hogging all the
> pages below 4GB. [Grrr, PAE is such a crock...]

No, I haven't used dom0_mem since Xen 3 came out, but I will set dom0_mem
again to check whether it makes any difference. In general I like the new
feature of letting Xen handle the dom0 memory.
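
As a rough illustration of the diagnosis quoted above, here is a toy model of a page pool split at the 4GB boundary. It is only a sketch: the `ToyHeap` class, its method names, and the page counts are hypothetical and do not reflect Xen's real allocator. It shows why a request that must be satisfied below 4GB can fail while plenty of memory above 4GB is still free.

```python
class ToyHeap:
    """Toy two-zone page pool. Hypothetical model, not Xen's real allocator."""

    def __init__(self, low_free, high_free):
        # Free page counts below and above the 4GB boundary.
        self.free = {"low": low_free, "high": high_free}

    def alloc(self, pages, below_4g=False):
        """Try to take `pages`; return True on success, False if they do not fit."""
        # Constrained requests may only use low memory; others prefer high memory.
        zones = ["low"] if below_4g else ["high", "low"]
        if sum(self.free[z] for z in zones) < pages:
            return False
        for z in zones:
            taken = min(pages, self.free[z])
            self.free[z] -= taken
            pages -= taken
        return True


# Assumed scenario 1: dom0 holds every low page, plenty of high memory is free.
hogged = ToyHeap(low_free=0, high_free=100_000)
print(hogged.alloc(1, below_4g=True))  # request for one page below 4GB

# Assumed scenario 2: dom0 was capped at boot, so some low pages stay free.
capped = ToyHeap(low_free=50_000, high_free=100_000)
print(capped.alloc(1, below_4g=True))
```

In this model the first request fails and the second succeeds, which matches the suggestion to constrain dom0 with dom0_mem= at boot.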

And I agree, PAE is a crock. The problem in my case is that we already have
some production systems running on Xen hosts in 32-bit mode, and it's not
easy to upgrade them to 64-bit (because of the downtime and so on). At the
time the servers were bought I didn't know that PCI devices take 500MB to
1GB of address space, and I thought PAE was only needed for systems with
more than 4GB of physical RAM.
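
The arithmetic behind that surprise can be sketched as follows. This assumes the chipset remaps the RAM displaced by the PCI address hole to addresses above 4GB (otherwise that RAM would simply be unusable); the function name `ram_split` is made up for this example, and the hole sizes are the 500MB to 1GB range mentioned above, rounded to 512MB and 1024MB.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def ram_split(installed_bytes, pci_hole_bytes, limit=4 * GIB):
    """Return (bytes of RAM addressable below `limit`, bytes remapped above it)."""
    # The PCI hole occupies address space below the limit that RAM cannot use.
    low_capacity = limit - pci_hole_bytes
    below = min(installed_bytes, low_capacity)
    above = installed_bytes - below
    return below, above


for hole_mib in (512, 1024):
    below, above = ram_split(4 * GIB, hole_mib * MIB)
    print(f"hole {hole_mib} MiB: {below // MIB} MiB below 4 GiB, "
          f"{above // MIB} MiB above 4 GiB")
```

Under these assumptions, a machine with exactly 4GB installed already has 512MB to 1GB of RAM at addresses above 4GB, which a 32-bit system can only reach with PAE.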

Using a 64-bit kernel with a 32-bit userspace is horrible too (at least in my
opinion), because, for example, iptables won't run then (if I got that right).
Mixing 64-bit and 32-bit userspace is possible, but I don't like that idea.
Customers want a clean system, not a solution that only somehow works. But I
guess we will be forced to upgrade to 64-bit eventually anyway, because that
is definitely the future.

> Given that your 3ware controller is already putting pressure on the
> bottom 4GB you'd be better off setting your initial dom0 memory at boot
> time.

So you think that even the bug I see now is related to the 3ware controller
and is not a general issue?

I cannot really check it, because this 3ware system is the only server with
4GB of RAM that I have available for testing.

Is this really a 3ware-specific problem?

I am asking because I want to know whether I should stop buying 3ware
controllers for Xen systems. In the next month we will need new Xen systems,
and I don't want to buy the wrong hardware then :)

> Please let me know how you get on. BTW: can you get a serial line on the
> machine? It might be interesting to see some of xen's memory usage
> diagnostics.

There is already a serial console on the machine, but it's not showing
anything interesting automatically (just the normal Xen output from boot time).
I guess I can get the information you need with some of the SysRq keys, right?
I will take a look and mail you this information. That should not be a problem.

You can also have an SSH account on dom0 if you like; I would just have to
attach the server to another network first.

> Ian
>
> > The domU doesn't always crash at the very same place: sometimes at the
> > beginning of the init process, sometimes when it loads modules, sometimes
> > when services get started... Sometimes this crash happens more than once
> > before the domU panics.
> >
> > here is what I see in the domU console:
> >
> > ------------[ cut here ]------------
> > kernel BUG at <bad filename>:63723!
> > invalid operand: 0000 [#1]
> > SMP
> > Modules linked in: 8250 reiserfs efs isofs vfat fat ext3 jbd evdev pci_hotplug
> > dm_mod sd_mod 3w_xxxx e1000 jedec_probe cfi_probe gen_probe chipreg mtdcore
> > map_funcs i2c_i801 i2c_core parport_pc parport serial_core usbhid pcmcia
> > yenta_socket rsrc_nonstatic pcmcia_core processor genrtc sbp2 ohci1394
> > ieee1394 usb_storage ohci_hcd uhci_hcd 3w_9xxx scsi_mod unix
> > CPU:    0
> > EIP:    0061:[<c01182b6>]    Not tainted VLI
> > EFLAGS: 00010282   (2.6.12.6-xen)
> > EIP is at pgd_ctor+0x26/0x30
> > eax: fffffff4   ebx: 00000001   ecx: f577e000   edx: 00000000
> > esi: c118fd80   edi: c12bd258   ebp: c12bd240   esp: c864dd38
> > ds: 007b   es: 007b   ss: 0069
> > Process rcS (pid: 1041, threadinfo=c864c000 task=c06f8a40)
> > Stack: c77ae000 00000000 00000020 c014dd51 c77ae000 c118fd80 00000001 c12bd240
> >        c77ae000 c118fd80 00000000 c014decd c118fd80 c12bd240 00000001 000000d0
> >        c118fde0 00000001 000000d0 c119d980 0000000c 000000d0 00000000 c014e0db
> > Call Trace:
> >  [<c014dd51>] cache_init_objs+0x71/0x80
> >  [<c014decd>] cache_grow+0x10d/0x1a0
> >  [<c014e0db>] cache_alloc_refill+0x17b/0x220
> >  [<c014e39f>] kmem_cache_alloc+0x7f/0x90
> >  [<c011833d>] pgd_alloc+0x1d/0x310
> >  [<c01216fe>] mm_init+0xce/0x100
> >  [<c0121a14>] copy_mm+0xd4/0x3d0
> >  [<c0121fdf>] copy_files+0x1af/0x320
> >  [<c03f9d00>] parse_header+0xb0/0xe0
> >  [<c03f9d04>] parse_header+0xb4/0xe0
> >  [<c01225af>] copy_process+0x3df/0xd00
> >  [<c0166f4f>] fd_install+0x2f/0x60
> >  [<c0122fc9>] do_fork+0x69/0x18f
> >  [<c0130e4a>] sys_rt_sigprocmask+0xaa/0x110
> >  [<c0108f91>] sys_fork+0x31/0x40
> >  [<c010a65d>] syscall_call+0x7/0xb
> > Code: 00 f3 ab 5f c3 83 ec 0c b8 20 00 00 00 89 44 24 08 31 c0 89 44 24 04 8b
> > 44 24 10 89 04 24 e8 d2 2b 00 00 85 c0 75 04 83 c4 0c c3 <0f> 0b eb f8 8d b6
> > 00 00 00 00 83 ec 08 b8 f8 e3 36 c0 89 5c 24
> >  /etc/init.d/rcS: line 57:  1041 Segmentation fault      ( trap - INT QUIT
> > TSTP; set start; . $i )
> >
> > Is there something I can do to help resolve that?
> >
> > thx & regards,
> > -- Ralph
> >
> > > Ian
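
One detail in the register dump of the quoted trace fits the out-of-memory diagnosis. The sketch below assumes (it is not confirmed anywhere in the thread) that eax still holds the return value of the call made just before the BUG in pgd_ctor. Read as a signed 32-bit number, fffffff4 is -12, and 12 is ENOMEM on Linux. The helper name `as_signed32` is made up for this example.

```python
import errno


def as_signed32(value):
    """Interpret a 32-bit register value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


eax = 0xFFFFFFF4  # eax from the register dump in the trace
code = as_signed32(eax)
# Kernel functions conventionally return -errno on failure, so negate to look it up.
name = errno.errorcode.get(-code, "unknown")
print(code, name)
```

If that assumption holds, the domU kernel hit the BUG because an allocation request came back with an out-of-memory error.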

