Xen project Mailing List

Re: [Xen-devel] [VMI] Possible race-condition in altp2m APIs

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

From: Tamas K Lengyel <tamas@xxxxxxxxxxxxx>

Date: Thu, 9 May 2019 11:46:20 -0600

Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Mathieu Tarral <mathieu.tarral@xxxxxxxxxxxxxx>

Delivery-date: Thu, 09 May 2019 17:47:09 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, May 9, 2019 at 10:43 AM Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>
> On 09/05/2019 17:19, Mathieu Tarral wrote:
> > On Tuesday, May 7, 2019 2:01 PM, Mathieu Tarral
> > <mathieu.tarral@xxxxxxxxxxxxxx> wrote:
> >
> >>> Given how many EPT flushing bugs I've already found in this area, I
> >>> wouldn't be surprised if there are further ones lurking. If it is an EPT
> >>> flushing bug, this delta should make it go away, but it will come with a
> >>> hefty perf hit.
> >>>
> >>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> >>> index 283eb7b..019333d 100644
> >>> --- a/xen/arch/x86/hvm/vmx/vmx.c
> >>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >>> @@ -4285,9 +4285,7 @@ bool vmx_vmenter_helper(const struct cpu_user_regs *regs)
> >>>              }
> >>>          }
> >>>
> >>> -        if ( inv )
> >>> -            __invept(inv == 1 ? INVEPT_SINGLE_CONTEXT : INVEPT_ALL_CONTEXT,
> >>> -                     inv == 1 ? single->eptp : 0);
> >>> +        __invept(INVEPT_ALL_CONTEXT, 0);
> >>>      }
> >>>
> >>>   out:
> >> I can give this a try, and see if it resolves the problem!
> > Just tested, on Xen 4.12.0, and the bug is still here.
> > Windows 7 is having BSODs with 4 VCPUs.
> > I didn't notice a hefty performance impact, though.
> >
> > Do we have other caches to invalidate?
> > Is there something else that I should test?
> >
> > I don't feel comfortable digging into Xen's code, especially for something
> > as complicated as page tables and memory management,
> > compounded by the complexity of altp2m.
> > What I can do, however, is test your ideas and patches and report the
> > information I can gather on this issue.
> >
> > Note: I tested with the latest commits on Drakvuf/master, especially:
> > "Add a VM pause for shadow copy refresh operation"
> > https://github.com/tklengyel/drakvuf/pull/626
> >
> > @tamas, did you make this patch to fix the kind of race-condition issue
> > that I'm reporting?
> > Or was it totally unrelated?
>
> With the above change in place and BSODs still happening, I'm fairly
> convinced that it is not a TLB flushing issue.
>
> Therefore, the conclusion to draw is that it is a logical bug somewhere.

I agree.

>
> First of all - ensure you are using up-to-date microcode. The number of
> errata which have been discovered by people associated with the Xen
> community is large.
>
> The microcode is available from
> https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/ and
> https://andrewcoop-xen.readthedocs.io/en/latest/admin-guide/microcode-loading.html
> is some documentation I prepared earlier.
>
> Beyond that, I think it would help to know exactly how libvmi is
> manipulating the guest.

I already suggested to Mathieu to try to reproduce the issue using the
xen-access test tool that's in the Xen tree to cut out all that
complexity. Without being able to limit the scope of the bug and
reproducibly trigger it, I see little chance of finding the root
cause. Unfortunately I don't have the time to do that myself.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
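
For readers following the thread, the effect of the debugging delta quoted in the message can be modelled outside of Xen. The sketch is a minimal, standalone illustration that reads the decision logic only from the three removed lines and the one added line of the hunk. The names flush_kind, flush_request, selective_flush and debug_flush are hypothetical and do not exist in the Xen tree; the real code calls __invept() directly.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the INVEPT types named in the quoted hunk. */
enum flush_kind { FLUSH_NONE = 0, FLUSH_SINGLE_CONTEXT = 1, FLUSH_ALL_CONTEXT = 2 };

/* Hypothetical record of what a call to __invept() would have been given. */
struct flush_request {
    enum flush_kind kind;
    uint64_t eptp;          /* only meaningful for FLUSH_SINGLE_CONTEXT */
};

/*
 * Behaviour of the removed lines:
 *   inv == 0  -> no flush
 *   inv == 1  -> single-context flush of one EPTP
 *   otherwise -> all-context flush, EPTP argument 0
 */
struct flush_request selective_flush(unsigned int inv, uint64_t single_eptp)
{
    struct flush_request req = { FLUSH_NONE, 0 };

    if (inv == 0)
        return req;

    if (inv == 1) {
        req.kind = FLUSH_SINGLE_CONTEXT;
        req.eptp = single_eptp;
    } else {
        req.kind = FLUSH_ALL_CONTEXT;
    }
    return req;
}

/*
 * Behaviour of the added line: flush every context on every VM entry,
 * regardless of whether an invalidation was pending.
 */
struct flush_request debug_flush(void)
{
    struct flush_request req = { FLUSH_ALL_CONTEXT, 0 };
    return req;
}
```

Seen this way, the delta never flushes less than the original path does, which is why BSODs that persist with it applied point away from a missed EPT flush.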
