
Re: [Xen-devel] [xen-unstable test] 106698: regressions - FAIL



On 16/03/17 14:26, Sergey Dyasli wrote:
> On Thu, 2017-03-16 at 05:15 -0600, Jan Beulich wrote:
>>>>> On 16.03.17 at 10:03, <osstest-admin@xxxxxxxxxxxxxx> wrote:
>>> flight 106698 xen-unstable real [real]
>>> http://logs.test-lab.xenproject.org/osstest/logs/106698/ 
>>>
>>> Regressions :-(
>>>
>>> Tests which did not succeed and are blocking,
>>> including tests which could not be run:
>>>  test-amd64-amd64-qemuu-nested-intel 16 debian-hvm-install/l1/l2 fail REGR. vs. 106652
>> While there's quite a bit of stuff under test, your recent vVMX series
>> would seem to be the most likely candidate for a regression here. I
>> am, however, puzzled by
>>
>> (XEN) d1v0 VMLAUNCH error: 0
>> (XEN) domain_crash_sync called from vmcs.c:1712
>>
>> in the L1 log - error 0 is supposed to be "no error", and I can't see
>> how VM_INSTRUCTION_ERROR would ever be written to zero.
>> Which leaves there being a path (which I can't spot) where it's not
>> being written, or a problem handling the respective vmread by the
>> guest.
>>
>> Could you take a look, please?
> L1:vmlaunch failed and vmx_vmentry_failure() was called.  However, it
> doesn't check whether the failure was VMfailValid or VMfailInvalid. In
> the latter case VM_INSTRUCTION_ERROR is meaningless.
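
To make the Valid/Invalid distinction concrete, here is a minimal sketch of
how the two failure kinds can be told apart from RFLAGS after a VMX
instruction: the Intel SDM conventions are CF=1 for VMfailInvalid and ZF=1
for VMfailValid, and only the latter writes the VM-instruction error field.
The enum and both helper functions are hypothetical names for illustration;
this is not the code in vmcs.c.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_CF (1u << 0)   /* carry flag */
#define X86_EFLAGS_ZF (1u << 6)   /* zero flag  */

/* Hypothetical names, for illustration only. */
enum vmx_insn_result { VMSUCCEED, VMFAIL_INVALID, VMFAIL_VALID };

/* Classify the outcome of a VMX instruction from the resulting RFLAGS. */
static enum vmx_insn_result vmx_classify(uint64_t rflags)
{
    if ( rflags & X86_EFLAGS_CF )
        return VMFAIL_INVALID;    /* no current VMCS: error field not written */
    if ( rflags & X86_EFLAGS_ZF )
        return VMFAIL_VALID;      /* error number is in VM_INSTRUCTION_ERROR */
    return VMSUCCEED;
}

/* VM_INSTRUCTION_ERROR is only worth reading after VMfailValid. */
static bool vmx_error_field_is_meaningful(uint64_t rflags)
{
    return vmx_classify(rflags) == VMFAIL_VALID;
}
```

Under this reading, an "error: 0" in the log would be consistent with a
VMfailInvalid whose error field was simply never written.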
>
> There are only 2 cases for vmfail_invalid() inside nvmx_handle_vmlaunch():
>
> 1. if ( vcpu_nestedhvm(v).nv_vvmcxaddr == INVALID_PADDR )
>
>    That would imply that L0:nvmx_handle_vmptrld() returned VMfail
>    and L1:__vmptrld() hit BUG() which is not the case.
>
> 2. if ( nvmx->shadow_vmcs )
>
> I have identified one possible issue with that.  H/W looks like Haswell
> and L0 has:
>
>     (XEN)  - VMCS shadowing
>
> However, L1 is missing "VMCS shadowing" in "VMX advanced features".
> I didn't expect that, since L1 sees VMX_MISC_VMWRITE_ALL
> in MSR_IA32_VMX_MISC.  It must be something else that prevents L1 from
> enabling VMCS shadowing.
>
> The above makes the following check inside nvmx_handle_vmptrld() incorrect:
>
>     (!cpu_has_vmx_vmcs_shadowing && nvmx->shadow_vmcs)
>
> since cpu_has_vmx_vmcs_shadowing tests L0's capability and not L1's.
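
As a rough illustration of the L0-versus-L1 point, the sketch below derives
the capability from the value of IA32_VMX_PROCBASED_CTLS2 that a given level
sees (allowed-1 settings live in the upper 32 bits, and VMCS shadowing is
bit 14), and feeds that into the check instead of a global L0 flag. The two
function names are hypothetical, and this is not claimed to be the fix that
should go into nvmx_handle_vmptrld().

```c
#include <stdbool.h>
#include <stdint.h>

/* Secondary processor-based control, bit 14: VMCS shadowing. */
#define SECONDARY_EXEC_ENABLE_VMCS_SHADOWING (1u << 14)

/*
 * Hypothetical helper: does this IA32_VMX_PROCBASED_CTLS2 value allow
 * VMCS shadowing to be enabled?  Bits 63:32 hold the allowed-1 settings.
 */
static bool ctls2_allows_vmcs_shadowing(uint64_t procbased_ctls2_msr)
{
    uint32_t allowed1 = (uint32_t)(procbased_ctls2_msr >> 32);

    return (allowed1 & SECONDARY_EXEC_ENABLE_VMCS_SHADOWING) != 0;
}

/*
 * Hypothetical variant of the check quoted above, keyed on what L1 was
 * told it can do rather than on what L0's hardware supports.
 */
static bool toy_vmptrld_should_fail(bool l1_has_shadowing,
                                    bool vvmcs_marked_shadow)
{
    return !l1_has_shadowing && vvmcs_marked_shadow;
}
```

With L1 lacking the feature, a vVMCS marked as shadow is rejected at VMPTRLD
time regardless of what L0 supports.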
>
> The shadow bit will be set by L0:nvmx_set_vmcs_pointer(), which might
> suggest that there are other cases with nvmx_handle_vmptrld() re-entrancy
> that I have missed. If the following scenario is possible:
>
>     nvmx_handle_vmptrld()
>         nvcpu->nv_vvmcxaddr == INVALID_PADDR
>             nvmx->shadow_vmcs = false
>             vvmcs->vmcs_revision_id |= VMCS_RID_TYPE_MASK;
>
>     // no nvmx_clear_vmcs_pointer() in between
>
>     nvmx_handle_vmptrld()
>         nvcpu->nv_vvmcxaddr == INVALID_PADDR
>             nvmx->shadow_vmcs = true
>             (!cpu_has_vmx_vmcs_shadowing && nvmx->shadow_vmcs) == false
>
>     nvmx_handle_vmlaunch()
>         nvmx->shadow_vmcs == true
>             vmfail_invalid(regs);
>
> Then it would explain the regression.
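
The suspected sequence can be replayed with a small toy model. This is only
a sketch under stated assumptions: the toy_* types and functions are
hypothetical stand-ins for the nvmx code paths named above, the mask value
assumes the shadow-VMCS indicator is bit 31 of the revision identifier, and
toy_clear_pointer() assumes the clear path drops that bit again.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed: shadow-VMCS indicator is bit 31 of the VMCS revision id. */
#define VMCS_RID_TYPE_MASK 0x80000000u

/* Hypothetical stand-ins for the virtual VMCS page and nested VMX state. */
struct toy_vvmcs { uint32_t vmcs_revision_id; };
struct toy_nvmx  { bool shadow_vmcs; };

enum toy_result { TOY_OK, TOY_VMFAIL_INVALID };

/* Toy VMPTRLD: latch the shadow bit, apply the check, then let L0 mark it. */
static enum toy_result toy_vmptrld(struct toy_nvmx *nvmx,
                                   struct toy_vvmcs *vvmcs,
                                   bool l0_has_shadowing)
{
    nvmx->shadow_vmcs = (vvmcs->vmcs_revision_id & VMCS_RID_TYPE_MASK) != 0;

    if ( !l0_has_shadowing && nvmx->shadow_vmcs )
        return TOY_VMFAIL_INVALID;

    /* Models L0 setting the shadow bit when it uses hardware shadowing. */
    if ( l0_has_shadowing )
        vvmcs->vmcs_revision_id |= VMCS_RID_TYPE_MASK;

    return TOY_OK;
}

/* Toy clear path (assumption): drops the shadow bit again. */
static void toy_clear_pointer(struct toy_vvmcs *vvmcs)
{
    vvmcs->vmcs_revision_id &= ~VMCS_RID_TYPE_MASK;
}

/* Toy VMLAUNCH: refuses to launch a VMCS latched as shadow. */
static enum toy_result toy_vmlaunch(const struct toy_nvmx *nvmx)
{
    return nvmx->shadow_vmcs ? TOY_VMFAIL_INVALID : TOY_OK;
}
```

In this model, calling toy_vmptrld() twice on the same toy_vvmcs with
l0_has_shadowing set, and no toy_clear_pointer() in between, leaves
shadow_vmcs true, so toy_vmlaunch() returns TOY_VMFAIL_INVALID. Clearing
between the two loads keeps the launch path working.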

Ok - we should revert dc05c0ceeb8609b6d60f6a117a0192e9160946b8 and
b22ee98c4ecc4e7c827451dee01181529df4d26c to unblock master.

I will get to this shortly, unless there are sudden objections.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

