
[Xen-devel] x86_32: spurious page faults in guest GDT area

Under prolonged stress I can reproduce this issue back to at least
c/s 16084. In older changesets it was apparently so rare that I either
never noticed it during normal work/testing or had to ignore it because
I could not reproduce it. On recent changesets, however (tested only with
our 2.6.25-based kernels so far), it happens much more frequently, and
occasionally even while the machine boots.

I inserted selector validation code in the context switch path to verify
that a vcpu's selectors are okay (or rather, that the guest-provided
part of the GDT is accessible). These checks have not indicated a failure
so far.
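
For readers less familiar with selector layout, here is a minimal sketch of
the arithmetic such a check relies on. It is not the validation code I
inserted; all helper names are hypothetical. It only uses the architectural
selector format (RPL in bits 0-1, table indicator in bit 2, descriptor index
in bits 3-15) and the 8-byte descriptor size, so one 4 KiB page holds 512
descriptors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_SIZE 8u     /* bytes per GDT/LDT descriptor */
#define PAGE_SIZE 4096u  /* one page holds 512 descriptors */

/* Descriptor index: selector bits 3-15. */
static unsigned sel_index(uint16_t sel) { return sel >> 3; }

/* Table indicator: bit 2 set means the selector refers to the LDT. */
static bool sel_is_ldt(uint16_t sel) { return (sel & 4u) != 0; }

/* Requested privilege level: bits 0-1. */
static unsigned sel_rpl(uint16_t sel) { return sel & 3u; }

/* True if the selector names a GDT descriptor lying entirely within
 * the first page of the GDT (hypothetical helper). */
static bool sel_in_first_gdt_page(uint16_t sel)
{
    return !sel_is_ldt(sel) &&
           sel_index(sel) * DESC_SIZE + DESC_SIZE <= PAGE_SIZE;
}

/* True if the last byte of the descriptor is covered by the table
 * limit (the limit is the offset of the last valid byte). */
static bool sel_within_limit(uint16_t sel, uint16_t limit)
{
    return sel_index(sel) * DESC_SIZE + (DESC_SIZE - 1) <= limit;
}
```

For example, selector 0x0061 decodes to GDT index 12 with RPL 1, and its
descriptor occupies bytes 96 through 103 of the table.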

The faults can happen in various places (the hypervisor exit path as well
as guest code), and always involve loading a selector register with a
guest-defined value (i.e. one referring to the first page of the GDT). A
page walk in the (hypervisor) fault handler shows that all levels of the
translation exist (and are valid/consistent), and instrumentation of the
selector manipulation functions shows that none of them get called
spuriously.
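
To illustrate what such a walk checks, here is a rough sketch over a small
simulated physical memory. It is not the code in the fault handler. It
assumes PAE-style three-level paging (2+9+9+12 address bits), places the
top-level table at the start of a frame for simplicity, and every name in it
is made up for the example. Note that a software walk like this reads the
tables in memory, so it cannot see a stale TLB entry.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT 0x1ULL                 /* bit 0: entry is present */
#define PTE_PSE     0x80ULL                /* bit 7 in a PDE: 2 MiB page */
#define ADDR_MASK   0x000FFFFFFFFFF000ULL  /* frame address bits 12-51 */
#define NFRAMES     8                      /* size of simulated memory */

/* Simulated physical memory: NFRAMES frames of 512 64-bit entries. */
static uint64_t phys[NFRAMES][512];

/* Walk the tables for linear address va, starting at table_base.
 * Returns 0 and stores the physical address on success, the level
 * (3, 2 or 1) whose entry is not present, or -1 if a table lies
 * outside the simulated memory. */
static int walk_pae(uint64_t table_base, uint32_t va, uint64_t *pa_out)
{
    unsigned idx[3] = { va >> 30, (va >> 21) & 0x1ff, (va >> 12) & 0x1ff };
    uint64_t table = table_base & ADDR_MASK;

    for (int level = 3; level >= 1; level--) {
        uint64_t frame = table >> 12;
        if (frame >= NFRAMES)
            return -1;
        uint64_t entry = phys[frame][idx[3 - level]];
        if (!(entry & PTE_PRESENT))
            return level;
        if (level == 2 && (entry & PTE_PSE)) {
            /* 2 MiB superpage: the low 21 bits come from va. */
            *pa_out = (entry & ADDR_MASK & ~0x1FFFFFULL) | (va & 0x1FFFFF);
            return 0;
        }
        table = entry & ADDR_MASK;
    }
    *pa_out = table | (va & 0xfff);
    return 0;
}
```

A nonzero return names the level at which the translation breaks. In the
situation described above, the equivalent walk succeeds at every level even
though the access faulted.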

Hence I can only suspect some asynchronous page table manipulation
(but I'm not aware of anything like that) lacking proper TLB flushing, or
some very rare issue with the CR3 reloading code.

The same 32-bit kernel used with a 64-bit hypervisor has so far not
shown similar problems. While I first thought this would help narrow
down the problem, I'm pretty clueless at this point, because none of the
candidate areas where 32-bit code differs from 64-bit look
troublesome to me (most notably, TLB flushing is identical between
the two).

Any ideas on how to narrow down the problem would be appreciated.
Thanks, Jan
