[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation

On 22.05.2020 12:19, Andrew Cooper wrote:
> On 22/05/2020 11:05, Igor Druzhinin wrote:
>> On 22/05/2020 10:45, Andrew Cooper wrote:
>>> On 21/05/2020 22:43, Igor Druzhinin wrote:
>>>> If a recalculation NPT fault hasn't been handled explicitly in
>>>> hvm_hap_nested_page_fault() then it's potentially safe to retry -
>>>> US bit has been re-instated in PTE and any real fault would be correctly
>>>> re-raised next time.
>>>> This covers a specific case of migration with vGPU assigned on AMD:
>>>> global log-dirty is enabled and causes immediate recalculation NPT
>>>> fault in MMIO area upon access. This type of fault isn't described
>>>> explicitly in hvm_hap_nested_page_fault (this isn't called on
>>>> EPT misconfig exit on Intel) which results in domain crash.
>>>> Signed-off-by: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
>>>> ---
>>>>  xen/arch/x86/hvm/svm/svm.c | 4 ++++
>>>>  1 file changed, 4 insertions(+)
>>>> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
>>>> index 46a1aac..f0d0bd3 100644
>>>> --- a/xen/arch/x86/hvm/svm/svm.c
>>>> +++ b/xen/arch/x86/hvm/svm/svm.c
>>>> @@ -1726,6 +1726,10 @@ static void svm_do_nested_pgfault(struct vcpu *v,
>>>>          /* inject #VMEXIT(NPF) into guest. */
>>>>          nestedsvm_vmexit_defer(v, VMEXIT_NPF, pfec, gpa);
>>>>          return;
>>>> +    case 0:
>>>> +        /* If a recalculation page fault hasn't been handled - just 
>>>> retry. */
>>>> +        if ( pfec & PFEC_user_mode )
>>>> +            return;
>>> This smells like it is a recipe for livelocks.
>>> Everything should have been handled properly by the call to
>>> p2m_pt_handle_deferred_changes() which precedes svm_do_nested_pgfault().
>>> It is legitimate for the MMIO mapping to end up being transiently
>>> recalculated, but the fact that p2m_pt_handle_deferred_changes() doesn't
>>> fix it up suggests that the bug is there.
>>> Do you have the complete NPT walk to the bad mapping? Do we have
>>> _PAGE_USER in the leaf mapping, or is this perhaps a spurious fault?
>> It does fix it up. The problem is that currently in SVM we enter
>> svm_do_nested_pgfault immediately after p2m_pt_handle_deferred_changes
>> is finished finished.
> Oh - so we do.  I'd read the entry condition for svm_do_nested_pgfault()
> incorrectly.
> Jan - why did you chose to do it this way?  If
> p2m_pt_handle_deferred_changes() has made a modification, there is
> surely nothing relevant to do in svm_do_nested_pgfault().

Why? There can very well be multiple reasons for a single exit
to occur, and hence it did seem to make sense to allow processing
of such possible further reasons right away, instead of re-
entering the guest just to deal with another VM exit. What I can
accept is that the error code may not correctly represent the
"remaining" fault reason anymore at this point.




Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.