Re: [Xen-devel] [xen-unstable test] 106698: regressions - FAIL



On Thu, 2017-03-16 at 05:15 -0600, Jan Beulich wrote:
> > > > On 16.03.17 at 10:03, <osstest-admin@xxxxxxxxxxxxxx> wrote:
> > 
> > flight 106698 xen-unstable real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/106698/ 
> > 
> > Regressions :-(
> > 
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >  test-amd64-amd64-qemuu-nested-intel 16 debian-hvm-install/l1/l2 fail REGR. 
> > vs. 106652
> 
> While there's quite a bit of stuff under test, your recent vVMX series
> would seem to be the most likely candidate for a regression here. I
> am, however, puzzled by
> 
> (XEN) d1v0 VMLAUNCH error: 0
> (XEN) domain_crash_sync called from vmcs.c:1712
> 
> in the L1 log - error 0 is supposed to be "no error", and I can't see
> how VM_INSTRUCTION_ERROR would ever be written to zero.
> Which leaves there being a path (which I can't spot) where it's not
> being written, or a problem handling the respective vmread by the
> guest.
> 
> Could you take a look, please?

In L1, vmlaunch failed and vmx_vmentry_failure() was called.  However,
that function doesn't check whether the failure was VMfailValid or
VMfailInvalid.  In the latter case VM_INSTRUCTION_ERROR is meaningless,
which is why the log shows error 0.
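
To illustrate the distinction, here is a minimal standalone sketch, not
Xen code.  It relies only on the architectural convention that
VMfailInvalid sets CF, VMfailValid sets ZF, and VMsucceed clears both.
The enum and function names are hypothetical and exist only for this
example.

```c
#include <stdint.h>

/* Architectural RFLAGS bits on x86. */
#define X86_EFLAGS_CF 0x00000001u
#define X86_EFLAGS_ZF 0x00000040u

/* Hypothetical names for this sketch; these are not Xen identifiers. */
enum vmx_insn_result {
    VMX_INSN_SUCCEED,
    VMX_INSN_FAIL_VALID,    /* ZF set: the error field holds a real code */
    VMX_INSN_FAIL_INVALID,  /* CF set: no current VMCS, error field unusable */
};

/* Classify the outcome of a VMX instruction from the resulting RFLAGS. */
enum vmx_insn_result classify_vmx_result(uint64_t rflags)
{
    if ( rflags & X86_EFLAGS_CF )
        return VMX_INSN_FAIL_INVALID;
    if ( rflags & X86_EFLAGS_ZF )
        return VMX_INSN_FAIL_VALID;
    return VMX_INSN_SUCCEED;
}

/* Nonzero only when reading VM_INSTRUCTION_ERROR makes sense. */
int vm_instruction_error_is_meaningful(uint64_t rflags)
{
    return classify_vmx_result(rflags) == VMX_INSN_FAIL_VALID;
}
```

A failure handler written along these lines would print the error field
only in the VMfailValid case and report "no current VMCS" otherwise.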

There are only 2 cases for vmfail_invalid() inside nvmx_handle_vmlaunch():

1. if ( vcpu_nestedhvm(v).nv_vvmcxaddr == INVALID_PADDR )

   That would imply that L0:nvmx_handle_vmptrld() returned VMfail
   and that L1:__vmptrld() hit BUG(), which is not the case.

2. if ( nvmx->shadow_vmcs )

I have identified one possible issue with that.  The hardware looks like
Haswell, and L0 reports:

    (XEN)  - VMCS shadowing

However, L1 is missing "VMCS shadowing" in "VMX advanced features".
I didn't expect that, since L1 sees VMX_MISC_VMWRITE_ALL
in MSR_IA32_VMX_MISC.  Something else must be preventing L1 from
enabling VMCS shadowing.

The above makes the following check inside nvmx_handle_vmptrld() incorrect:

    (!cpu_has_vmx_vmcs_shadowing && nvmx->shadow_vmcs)

This is because cpu_has_vmx_vmcs_shadowing tests L0's capability, not L1's.
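
As a rough illustration of why the capability source matters, the
predicate can be modelled with plain booleans.  This is only a sketch:
the function and parameter names are hypothetical, and the second
variant shows one way to express the distinction rather than a proposed
patch.

```c
#include <stdbool.h>

/*
 * Model of the check as quoted: reject the VMCS only when L0 itself
 * lacks VMCS shadowing and the shadow bit is set.
 */
bool reject_using_l0_cap(bool l0_has_shadowing, bool shadow_bit_set)
{
    return !l0_has_shadowing && shadow_bit_set;
}

/*
 * Hypothetical variant keyed on what L1 was offered instead.
 * Assumption: such a per-guest capability flag is available.
 */
bool reject_using_l1_cap(bool l1_has_shadowing, bool shadow_bit_set)
{
    return !l1_has_shadowing && shadow_bit_set;
}
```

With L0 supporting shadowing, L1 not being offered it, and the shadow
bit set, the first predicate lets the VMCS through while the second
rejects it.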

The shadow bit will be set by L0:nvmx_set_vmcs_pointer(), which might
suggest that there are other cases of nvmx_handle_vmptrld() re-entrancy
that I have missed.  If the following scenario is possible:

    nvmx_handle_vmptrld()
        nvcpu->nv_vvmcxaddr == INVALID_PADDR
            nvmx->shadow_vmcs = false
            vvmcs->vmcs_revision_id |= VMCS_RID_TYPE_MASK;

    // no nvmx_clear_vmcs_pointer() in between

    nvmx_handle_vmptrld()
        nvcpu->nv_vvmcxaddr == INVALID_PADDR
            nvmx->shadow_vmcs = true
            (!cpu_has_vmx_vmcs_shadowing && nvmx->shadow_vmcs) == false

    nvmx_handle_vmlaunch()
        nvmx->shadow_vmcs == true
            vmfail_invalid(regs);

Then it would explain the regression.
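
The suspected sequence can be replayed with a toy model.  This is a
heavily simplified sketch under stated assumptions, not the Xen
implementation: it drops the address-mapping state entirely, assumes L0
ORs the shadow bit into the revision id only when L0 supports
shadowing, and uses hypothetical toy_* names throughout.  Whether a
second vmptrld without an intervening clear can actually happen is
still the open question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 of the VMCS revision identifier is the shadow-VMCS indicator. */
#define VMCS_RID_TYPE_MASK 0x80000000u

/* Hypothetical, minimal stand-in for the nested vCPU state. */
struct toy_vcpu {
    bool l0_has_shadowing;  /* what cpu_has_vmx_vmcs_shadowing would test */
    bool shadow_vmcs;       /* cached copy of the shadow bit */
    uint32_t vvmcs_rid;     /* revision id word in L1's VMCS page */
};

/* Returns true on success, false on VMfail. */
bool toy_vmptrld(struct toy_vcpu *v)
{
    /* Cache the shadow bit as currently found in L1's VMCS page. */
    v->shadow_vmcs = (v->vvmcs_rid & VMCS_RID_TYPE_MASK) != 0;

    /* The check under discussion: it consults L0's capability only. */
    if ( !v->l0_has_shadowing && v->shadow_vmcs )
        return false;

    /* Assumed set-pointer step: L0 marks the page as a shadow VMCS. */
    if ( v->l0_has_shadowing )
        v->vvmcs_rid |= VMCS_RID_TYPE_MASK;

    return true;
}

/* Returns true on success, false for VMfailInvalid. */
bool toy_vmlaunch(const struct toy_vcpu *v)
{
    return !v->shadow_vmcs;
}
```

Starting from a revision id without the shadow bit, the first
toy_vmptrld() followed by toy_vmlaunch() succeeds.  A second
toy_vmptrld() with no clear in between picks up the bit that L0 set
itself, passes the check, and the following toy_vmlaunch() fails.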

-- 
Thanks,
Sergey
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

