[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/svm: do not try to handle recalc NPT faults immediately


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
  • Date: Fri, 29 May 2020 16:24:59 +0100
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, roger.pau@xxxxxxxxxx, george.dunlap@xxxxxxxxxx, wl@xxxxxxx, andrew.cooper3@xxxxxxxxxx
  • Delivery-date: Fri, 29 May 2020 15:25:06 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 29/05/2020 16:17, Igor Druzhinin wrote:
> On 29/05/2020 15:34, Jan Beulich wrote:
>> On 29.05.2020 02:35, Igor Druzhinin wrote:
>>> A recalculation NPT fault doesn't always require additional handling
>>> in hvm_hap_nested_page_fault(); moreover, in the general case, if no
>>> explicit handling is done there, the fault is wrongly considered fatal.
>>>
>>> Instead of trying to be opportunistic, use a safer approach and handle
>>> P2M recalculation in a separate NPT fault by attempting to retry after
>>> making the necessary adjustments. This is aligned with Intel behavior,
>>> where there are separate VMEXITs for recalculation and EPT violations
>>> (faults), and only faults are handled in hvm_hap_nested_page_fault().
>>> Do this by also unifying the do_recalc return code with the Intel
>>> implementation, where returning 1 means the P2M was actually changed.
>>>
>>> This covers a specific case of migration with a vGPU assigned on AMD:
>>> global log-dirty is enabled and causes an immediate recalculation NPT
>>> fault in the MMIO area upon access.
>>
>> To be honest, from this last paragraph I still can't really derive
>> what exactly goes wrong, and why, before this change.
> 
> I admit it might require some knowledge of how vGPU is implemented. I will try
> to give more info in this paragraph.
> 
>>> Signed-off-by: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
>>> ---
>>> This is a safer alternative to:
>>> https://lists.xenproject.org/archives/html/xen-devel/2020-05/msg01662.html
>>> and a more correct approach from my PoV.
>>
>> Indeed - I was about to reply there, but then I thought I'd first
>> look at this patch, in case it was a replacement.
>>
>>> --- a/xen/arch/x86/hvm/svm/svm.c
>>> +++ b/xen/arch/x86/hvm/svm/svm.c
>>> @@ -2923,9 +2923,10 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>>>              v->arch.hvm.svm.cached_insn_len = vmcb->guest_ins_len & 0xf;
>>>          rc = vmcb->exitinfo1 & PFEC_page_present
>>>               ? p2m_pt_handle_deferred_changes(vmcb->exitinfo2) : 0;
>>> -        if ( rc >= 0 )
>>> +        if ( rc == 0 )
>>> +            /* If no recalc adjustments were made, handle this fault */
>>> +            svm_do_nested_pgfault(v, regs, vmcb->exitinfo1, vmcb->exitinfo2);
>>> -        else
>>> +        else if ( rc < 0 )
>>
>> So from going through the code and judging by the comment in
>> finish_type_change() (which btw you will need to update, to avoid
>> it becoming stale) the >= here was there just in case, without
>> there actually being any case where a positive value would be
>> returned. If that's also the conclusion you've drawn, then I
>> think it would help mentioning this in the description.
> 
> I re-read the comments in finish_type_change() and to me they look
> pretty much in line with the now common interface between EPT and NPT
> recalc calls. 

Sorry, upon closer examination there is indeed a case I missed. Thanks
for pointing that out.

Igor



 

