
Re: [Xen-devel] [VMI] Possible race-condition in altp2m APIs



On Thu, May 9, 2019 at 10:43 AM Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>
> On 09/05/2019 17:19, Mathieu Tarral wrote:
> > On Tuesday, May 7, 2019 2:01 PM, Mathieu Tarral 
> > <mathieu.tarral@xxxxxxxxxxxxxx> wrote:
> >
> >>> Given how many EPT flushing bugs I've already found in this area, I 
> >>> wouldn't be surprised if there are further ones lurking.  If it is an EPT 
> >>> flushing bug, this delta should make it go away, but it will come with a 
> >>> hefty perf hit.
> >>>
> >>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> >>> index 283eb7b..019333d 100644
> >>> --- a/xen/arch/x86/hvm/vmx/vmx.c
> >>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >>> @@ -4285,9 +4285,7 @@ bool vmx_vmenter_helper(const struct cpu_user_regs *regs)
> >>>              }
> >>>          }
> >>>
> >>> -        if ( inv )
> >>> -            __invept(inv == 1 ? INVEPT_SINGLE_CONTEXT : INVEPT_ALL_CONTEXT,
> >>> -                     inv == 1 ? single->eptp          : 0);
> >>> +        __invept(INVEPT_ALL_CONTEXT, 0);
> >>>      }
> >>>
> >>>   out:
> >> I can give this a try and see if it resolves the problem!
> > Just tested on Xen 4.12.0, and the bug is still there.
> > Windows 7 still hits BSODs with 4 VCPUs.
> > I didn't notice a hefty performance impact, though.
> >
> > Do we have other caches to invalidate?
> > Is there something else that I should test?
> >
> > I don't feel comfortable digging into Xen's code, especially for something 
> > as complicated as page tables and memory management,
> > compounded by the complexity of altp2m.
> > What I can do, however, is test your ideas and patches and report the 
> > information I can gather on this issue.
> >
> > Note: I tested with the latest commits on Drakvuf/master, especially:
> > "Add a VM pause for shadow copy refresh operation"
> > https://github.com/tklengyel/drakvuf/pull/626
> >
> > @tamas, did you make this patch to fix the kind of race condition 
> > that I'm reporting?
> > Or was it totally unrelated?
>
> With the above change in place and BSODs still happening, I'm fairly
> convinced that it is not a TLB flushing issue.
>
> Therefore, the conclusion to draw is that it is a logical bug somewhere.

I agree.

>
> First of all - ensure you are using up-to-date microcode.  The number of
> errata which have been discovered by people associated with the Xen
> community is large.
>
> The microcode is available from
> https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/ and
> https://andrewcoop-xen.readthedocs.io/en/latest/admin-guide/microcode-loading.html
> is some documentation I prepared earlier.
>
> Beyond that, I think it would help to know exactly how libvmi is
> manipulating the guest.

I already suggested that Mathieu try to reproduce the issue using the
xen-access test tool that's in the Xen tree to cut out all that
complexity. Without being able to limit the scope of the bug and
reproducibly trigger it, I see little chance of finding the
root cause. Unfortunately, I don't have the time to do that myself.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

