[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
  • Date: Fri, 22 May 2020 11:14:24 +0100
  • Authentication-results: esa5.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, wl@xxxxxxx, jbeulich@xxxxxxxx, andrew.cooper3@xxxxxxxxxx
  • Delivery-date: Fri, 22 May 2020 10:14:35 +0000
  • Ironport-sdr: NfqJwmCDKLrTc+jp4yFEKqPI5eWD2h76oct4UKWna0tTU6a/EdOgeh6S73OwCRSjv4X6PCPkCJ fu0llXY3dX6+Lbj9ZFmAoLCt1Q/pxUIjpXR0pQcv4dyB+9s0QnV4held04X5z0kJNlCrt+ePQP hQEuZN7qPyBUnVoB597x+cHfrsfEyrA2MyszHN6BDXrMgxi8vUW6VPzoNpdveO7SckpS9p6nkS AIVXWSwGvi2JKc84tw5/UIDg3joEn95wTRPT3woHxIlRExklzudx4c23W1E2PcccAr/eJT/k3p ZKI=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 22/05/2020 11:08, Roger Pau Monné wrote:
> On Thu, May 21, 2020 at 10:43:58PM +0100, Igor Druzhinin wrote:
>> If a recalculation NPT fault hasn't been handled explicitly in
>> hvm_hap_nested_page_fault() then it's potentially safe to retry -
>> US bit has been re-instated in PTE and any real fault would be correctly
>> re-raised next time.
>>
>> This covers a specific case of migration with vGPU assigned on AMD:
>> global log-dirty is enabled and causes immediate recalculation NPT
>> fault in MMIO area upon access. This type of fault isn't described
>> explicitly in hvm_hap_nested_page_fault (this isn't called on
>> EPT misconfig exit on Intel) which results in domain crash.
> 
> Couldn't direct MMIO regions be handled like other types of memory for
> the purposes of logdiry mode?
> 
> I assume there's already a path here used for other memory types when
> logdirty is turned on, and hence would seem better to just make direct
> MMIO regions also use that path?

The proble of handling only MMIO case is that the issue still stays.
It will be hit with some other memory type since it's not MMIO specific.
The issue is that if global recalculation is called, the next hit to
this type will cause a transient fault which will not be handled
correctly after a due fixup by neither of our handlers.

>> Signed-off-by: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
>> ---
>>  xen/arch/x86/hvm/svm/svm.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
>> index 46a1aac..f0d0bd3 100644
>> --- a/xen/arch/x86/hvm/svm/svm.c
>> +++ b/xen/arch/x86/hvm/svm/svm.c
>> @@ -1726,6 +1726,10 @@ static void svm_do_nested_pgfault(struct vcpu *v,
>>          /* inject #VMEXIT(NPF) into guest. */
>>          nestedsvm_vmexit_defer(v, VMEXIT_NPF, pfec, gpa);
>>          return;
>> +    case 0:
>> +        /* If a recalculation page fault hasn't been handled - just retry. 
>> */
>> +        if ( pfec & PFEC_user_mode )
>> +            return;
> 
> I'm slightly worried that this diverges from the EPT implementation
> now, in the sense that returning 0 from hvm_hap_nested_page_fault will
> no longer trigger a guest crash.

My second alternative from my follow up email addresses this. I also
didn't like this aspect.

Igor



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.