
Re: RE: [Xen-devel] wget and Zope crashes on post-2.0.6 -testing



Hi Ian,

On Wed, Jun 08, 2005 at 06:58:51PM +0100, Ian Pratt wrote:
> > On 3 Jun 2005, at 10:04, Osma Suominen wrote:
> > 
> > > When you've had wget crash, you can try some of the other tests in
> > > http://thread.gmane.org/gmane.comp.emulators.xen.devel/10628
> > >
> > > Since this happens on a random PC with the demo CD, I'll 
> > > bet that this 
> > > is not some obscure problem with the specific hardware or software 
> > > installation but a real bug in Xen.
> > 
> > This bug should now be fixed in our xen-2.0.testing.bk repository.
> 
> This deserves a bit more explanation, as it probably affects all vendor
> kernels based on Xen 2.0 (SuSE 9.3 Pro, Debian, demo CD, Gentoo, etc.).
> It does *not* affect the kernel we ship in our 2.0 source and binary
> tarballs, which is why it's taken so long to pin down. It does *not* affect
> the unstable branch.
> 
> The reason the bug is not present in our kernels is due to the kernel
> config: we enable CONFIG_MD_RAID5=y in our config which hides the bug,
> whereas most distros have this as a module.
> 
> The root cause of the bug is that during the boot sequence Linux tests
> to see whether the processor has the fdiv bug. This involves doing some
> floating point operations. Unfortunately, they are not wrapped in the
> kernel_fpu_begin()/end() calls that normally surround use of fp in the
> kernel. Native Linux gets away with this because it happens so early in
> the boot process that no-one else can be using the fpu. However, on Xen
> this gets us into a bad state, which will come back to haunt us much
> later on, resulting in fpu state corruption in user processes. The fix
> in 2.0-testing is simply to 'wrap' the fdiv test.
> 
> The reason the bug is not present on unstable is that the fpu code had
> already been rejigged so that we were immune to this kind of problem as
> it had been identified as a potential fragility.
> 
> Since this bug hadn't been widely reported we probably won't rush to
> release a 2.0.6a demo CD, but vendor kernel maintainers should
> definitely pick up the fix.

Thanks for informing us!
I observed that the first userspace process that uses the FPU will 
SIGFPE once. Afterwards everything runs just fine ...

Your description looks like it matches exactly the misbehaviour I've been
seeing.

Is the attached patch the right way to fix this?
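
To make the idea of "wrapping" the fdiv test concrete, here is a minimal
userspace sketch. It is not the attached patch. The two hooks below are
hypothetical stand-ins that only share their names with the kernel's
kernel_fpu_begin()/kernel_fpu_end(); the counters and the function names
fdiv_bug_present() and check_fpu_wrapped() are likewise assumptions made
for illustration. The arithmetic is the well-known Pentium FDIV check,
written in plain C rather than the inline assembly the kernel uses.

```c
#include <assert.h>

/* Hypothetical stand-ins: the real kernel hooks save and restore FPU
 * state. These stubs only record that the section was entered and left. */
static int fpu_section_open = 0;   /* 1 while inside a begin/end pair */
static int fpu_sections_total = 0; /* number of completed begin calls */

static void kernel_fpu_begin(void)
{
    assert(fpu_section_open == 0); /* no nesting in this sketch */
    fpu_section_open = 1;
    fpu_sections_total++;
}

static void kernel_fpu_end(void)
{
    assert(fpu_section_open == 1); /* must pair with a begin */
    fpu_section_open = 0;
}

/* Classic FDIV check: x - (x / y) * y is (nearly) zero on a correct FPU
 * and far from zero on an affected Pentium. volatile keeps the compiler
 * from folding the division at build time. */
static int fdiv_bug_present(void)
{
    volatile double x = 4195835.0;
    volatile double y = 3145727.0;
    volatile double q = x / y;
    double residue = x - q * y;

    return residue > 1.0 || residue < -1.0;
}

/* The fix as described: bracket the floating point use with the hooks. */
static int check_fpu_wrapped(void)
{
    int bug;

    kernel_fpu_begin();
    bug = fdiv_bug_present();
    kernel_fpu_end();
    return bug;
}
```

Calling check_fpu_wrapped() should leave fpu_section_open at 0 again,
which is the property the unwrapped boot-time test lacks.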
-- 
Kurt Garloff, Director SUSE Labs, Novell Inc.

Attachment: xen-fdiv-test
Description: Text document

