[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/svm: do not try to handle recalc NPT faults immediately


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
  • Date: Fri, 29 May 2020 16:17:51 +0100
  • Authentication-results: esa6.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, roger.pau@xxxxxxxxxx, george.dunlap@xxxxxxxxxx, wl@xxxxxxx, andrew.cooper3@xxxxxxxxxx
  • Delivery-date: Fri, 29 May 2020 15:18:01 +0000
  • Ironport-sdr: gzE/Y/bcAQE6Hc/ZTh1X3ETkde1v3Ul5V/2yVBkTH6i3ZEA5z6XgmiS46wVLlICUHocVigImHc k36YYUwFjfcE1sZeIVWwtdW+kL1HMLMiJiF/yp5oYX8r8fiJWOROVRmKShYpCmwB3RDW7Rglet 28gROmxIWkwVl+nL8CoKEQ+8wOsgF4sj/bR+8nyINezUzYwDJhvVeHB94HCicM0Ynq0P9p4z6A LkXoRewx+D4/p9sjCt+zqku0ONExI71e1BIFWGYiCA5/5KEiuEVzZ6UZSfirBSu/46tZfEeZkl 1qw=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 29/05/2020 15:34, Jan Beulich wrote:
> On 29.05.2020 02:35, Igor Druzhinin wrote:
>> A recalculation NPT fault doesn't always require additional handling
>> in hvm_hap_nested_page_fault(), moreover in general case if there is no
>> explicit handling done there - the fault is wrongly considered fatal.
>>
>> Instead of trying to be opportunistic - use safer approach and handle
>> P2M recalculation in a separate NPT fault by attempting to retry after
>> making the necessary adjustments. This is aligned with Intel behavior
>> where there are separate VMEXITs for recalculation and EPT violations
>> (faults) and only faults are handled in hvm_hap_nested_page_fault().
>> Do it by also unifying do_recalc return code with Intel implementation
>> where returning 1 means P2M was actually changed.
>>
>> This covers a specific case of migration with vGPU assigned on AMD:
>> global log-dirty is enabled and causes immediate recalculation NPT
>> fault in MMIO area upon access.
> 
> To be honest, from this last paragraph I still can't really derive
> what goes wrong exactly why, before this change.

I admit it might require some knowledge of how vGPU is implemented. I will try
to give more info in this paragraph.

>> Signed-off-by: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
>> ---
>> This is a safer alternative to:
>> https://lists.xenproject.org/archives/html/xen-devel/2020-05/msg01662.html
>> and more correct approach from my PoV.
> 
> Indeed - I was about to reply there, but then I thought I'd first
> look at this patch, in case it was a replacement.
> 
>> --- a/xen/arch/x86/hvm/svm/svm.c
>> +++ b/xen/arch/x86/hvm/svm/svm.c
>> @@ -2923,9 +2923,10 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>>              v->arch.hvm.svm.cached_insn_len = vmcb->guest_ins_len & 0xf;
>>          rc = vmcb->exitinfo1 & PFEC_page_present
>>               ? p2m_pt_handle_deferred_changes(vmcb->exitinfo2) : 0;
>> -        if ( rc >= 0 )
>> +        if ( rc == 0 )
>> +            /* If no recal adjustments were being made - handle this fault 
>> */
>>              svm_do_nested_pgfault(v, regs, vmcb->exitinfo1, 
>> vmcb->exitinfo2);
>> -        else
>> +        else if ( rc < 0 )
> 
> So from going through the code and judging by the comment in
> finish_type_change() (which btw you will need to update, to avoid
> it becoming stale) the >= here was there just in case, without
> there actually being any case where a positive value would be
> returned. It that's also the conclusion you've drawn, then I
> think it would help mentioning this in the description.

I re-read the comments in finish_type_change() and to me they look
pretty much in line with the now common interface between EPT and NPT
recalc calls. 

Ok, I will point out that I concluded there was no additional intent
of necessarily calling svm_do_nested_pgfault() after recalc.

> It is also desirable to mention finish_type_change() not being
> affected, as already dealing with the > 0 case.

Will mention that as well.

>> --- a/xen/arch/x86/mm/p2m-pt.c
>> +++ b/xen/arch/x86/mm/p2m-pt.c
>> @@ -340,7 +340,7 @@ static int do_recalc(struct p2m_domain *p2m, unsigned 
>> long gfn)
>>      unsigned long gfn_remainder = gfn;
>>      unsigned int level = 4;
>>      l1_pgentry_t *pent;
>> -    int err = 0;
>> +    int err = 0, rc = 0;
>>  
>>      table = map_domain_page(pagetable_get_mfn(p2m_get_pagetable(p2m)));
>>      while ( --level )
>> @@ -402,6 +402,8 @@ static int do_recalc(struct p2m_domain *p2m, unsigned 
>> long gfn)
>>                  clear_recalc(l1, e);
>>                  err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
>>                  ASSERT(!err);
>> +
>> +                rc = 1;
>>              }
>>          }
>>          unmap_domain_page((void *)((unsigned long)pent & PAGE_MASK));
>> @@ -448,12 +450,14 @@ static int do_recalc(struct p2m_domain *p2m, unsigned 
>> long gfn)
>>              clear_recalc(l1, e);
>>          err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
>>          ASSERT(!err);
>> +
>> +        rc = 1;
>>      }
>>  
>>   out:
>>      unmap_domain_page(table);
>>  
>> -    return err;
>> +    return err ? err : rc;
> 
> Typically we write this as "err ?: rc". I'd like to ask that "rc" also
> be renamed, to something like "recalc_done", and then to become bool.

Sure.

Igor
 



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.