Xen project Mailing List

Re: [Xen-devel] x86_32: spurious page faults in guest GDT area

To: Jan Beulich <jbeulich@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>

From: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>

Date: Mon, 16 Jun 2008 11:41:27 +0100

Delivery-date: Mon, 16 Jun 2008 03:42:10 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Thread-index: AcjPnYd9xeC76DuQEd2lAgAX8io7RQ==

Thread-topic: [Xen-devel] x86_32: spurious page faults in guest GDT area

What's the #PF error code -- is it a not-present or an access-violation
fault; read/write access; etc? Do these faults happen under stable workload
(by which I mean no domains being created/destroyed -- all VMs are booted
and just running normal kinds of stuff)?

 -- Keir

On 16/6/08 11:32, "Jan Beulich" <jbeulich@xxxxxxxxxx> wrote:

> While under long-during stress I can reproduce this issue back to at least
> c/s 16084, in older change sets it was apparently so rare that during
> normal work/testing I never noticed it or had to ignore it due to not being
> re-creatable. However, on recent change sets (tested with our
> 2.6.25-based kernels only so far) it happens much more frequently (and
> occasionally even while the machine boots).
>
> I inserted selector validation code in the context switch path to verify
> that a vcpu's selectors are okay (or better, that the guest-provided
> part of the GDT is accessible). These checks never indicated a failure
> so far.
>
> The faults may happen in various places (hypervisor exit path as well
> as guest code), and always involve loading a selector register with a
> guest defined value (i.e. in the first page of the GDT). A page walk
> in the (hypervisor) fault handler shows that all levels of the translation
> exist (and are valid/consistent), and instrumentation of the selector
> manipulation functions shows that none of them get called spuriously.
>
> Hence I can only suspect some asynchronous page table manipulation
> (but I'm not aware of anything like that) lacking proper TLB flushing, or
> some very rare issue with the CR3 reloading code.
>
> The same 32-bit kernel used with a 64-bit hypervisor so far did not
> show similar problems - while I first thought this would help narrow
> the problem, I'm pretty clueless at this point because the candidate
> areas where 32-bit code is different from 64-bit all don't look
> troublesome to me (most notably TLB flushing is identical between
> the two).
>
> Any ideas on how to narrow the problem would be appreciated.
>
> Thanks, Jan
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
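
For reference, the distinctions asked about at the top of this message (not-present vs. access violation, read vs. write) are encoded in the low bits of the x86 page-fault error code. The following is a minimal sketch of decoding those bits. The macro and function names are illustrative only and are not taken from the Xen source. Bit 4 (instruction fetch) is only reported by the CPU when the relevant execute-protection feature is enabled.

```c
#include <stdint.h>
#include <stdio.h>

/* Architectural bits of the x86 #PF error code (names are hypothetical). */
#define PF_PRESENT (1u << 0) /* 0 = page not present, 1 = protection violation */
#define PF_WRITE   (1u << 1) /* 0 = read access, 1 = write access */
#define PF_USER    (1u << 2) /* 0 = supervisor mode, 1 = user mode */
#define PF_RSVD    (1u << 3) /* 1 = reserved bit set in a paging structure */
#define PF_INSTR   (1u << 4) /* 1 = fault on instruction fetch */

/*
 * Write a human-readable description of error code `ec` into `buf`.
 * Returns the value of snprintf(), i.e. the length the full string needs.
 */
static int describe_pf_error_code(uint32_t ec, char *buf, size_t len)
{
    return snprintf(buf, len, "%s, %s, %s mode%s%s",
                    (ec & PF_PRESENT) ? "protection violation" : "not-present",
                    (ec & PF_WRITE)   ? "write" : "read",
                    (ec & PF_USER)    ? "user" : "supervisor",
                    (ec & PF_RSVD)    ? ", reserved bit set" : "",
                    (ec & PF_INSTR)   ? ", instruction fetch" : "");
}
```

A fault handler could call this helper with the error code it was handed and log the result next to the faulting address, which would answer the first question directly from the log.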
