
RE: [Xen-devel] Re: Reproducable data corruption on xen-unstable

On Sun, 6 Feb 2005, I wrote:
A syscall was made (connect). Immediately before the syscall, the floating-point stack was empty; immediately after the syscall, the floating-point stack was nonempty, and the TS (Task Switched) flag was _cleared_.

I now have an "easier" way to reproduce this problem. Apply the patch below,
which checks the FPU state against TS, to a xen0-kernel. What it basically
does is:

 if (TS == 0 && fpu_stack_size > 0) panic ("Corrupt FPU");

An equivalent patch against a non-xen kernel yields no problems that I can
detect, but a xen0-kernel with this patch applied panics and reboots as soon
as it hits the graphical login manager (in my case, kdm).
(Of course, it might be specific to kdm, or my hardware, or who knows what.)

*** HELP WANTED! ***
If someone on a machine with a debug console could reproduce this, I'd be
most grateful. I don't have a serial console yet, so I'm a bit stuck.

The logic behind this patch is, if there is something on the FPU stack from _another_ process, TS should be 1 to prevent data leakage between processes. If, on the other hand, there is something on the FPU stack from the _same_ process being switched to, TS should still be 1, because who would have cleared it since it was set when that process was last switched away from? So, in either case, TS should be 1.

Also, I was wrong in my previous post:

So, in theory there are two possible algorithms which the kernel could be supposed to be following to avoid this situation.

A. Always set TS on task switch (Seems like the logical choice!)

B. Always set TS on task switch - except when the FPU has not been used
by the switched-to process, in which case do an FINIT on task switch.
(This seems pointlessly complicated and slow, so I doubt the kernel
follows this approach.)

The _actual_ algorithm appears to be:

C. Always set TS on task switch - except when the FPU has not been used
in the previous timeslice by the switched-FROM process, in which case we
assume (incorrectly in the case of xen0-kernels, but correctly in the case
of normal kernels!) that TS must be _already_ set if the FPU is dirty.


Attachment: xeno-fp-debug.patch
Description: task switcher debugging patch


