
Re: [PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
  • Date: Fri, 22 May 2020 11:27:38 +0100
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, wl@xxxxxxx, jbeulich@xxxxxxxx, andrew.cooper3@xxxxxxxxxx
  • Delivery-date: Fri, 22 May 2020 10:28:04 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 22/05/2020 11:23, Roger Pau Monné wrote:
> On Fri, May 22, 2020 at 11:14:24AM +0100, Igor Druzhinin wrote:
>> On 22/05/2020 11:08, Roger Pau Monné wrote:
>>> On Thu, May 21, 2020 at 10:43:58PM +0100, Igor Druzhinin wrote:
>>>> If a recalculation NPT fault hasn't been handled explicitly in
>>>> hvm_hap_nested_page_fault() then it's potentially safe to retry -
>>>> US bit has been re-instated in PTE and any real fault would be correctly
>>>> re-raised next time.
>>>>
>>>> This covers a specific case of migration with vGPU assigned on AMD:
>>>> global log-dirty is enabled and causes immediate recalculation NPT
>>>> fault in MMIO area upon access. This type of fault isn't described
>>>> explicitly in hvm_hap_nested_page_fault (this isn't called on
>>>> EPT misconfig exit on Intel) which results in domain crash.
>>>
>>> Couldn't direct MMIO regions be handled like other types of memory for
>>> the purposes of logdirty mode?
>>>
>>> I assume there's already a path here used for other memory types when
>>> logdirty is turned on, and hence would seem better to just make direct
>>> MMIO regions also use that path?
>>
>> The problem with handling only the MMIO case is that the issue remains.
>> It will be hit with some other memory type, since it's not MMIO-specific.
>> The issue is that once global recalculation is invoked, the next access
>> to such a type causes a transient fault which, after the due fixup, is
>> not handled correctly by either of our handlers.
> 
> I admit I should go look at the code, but for example RAM p2m types
> don't require this fix, so I assume there's some different path taken
> in that case that avoids all this?
> 
> Ie: when global logdirty is enabled you will start to get nested page
> faults for every access, yet only direct MMIO types require this fix?

It's not "only MMIO" - it's just that an MMIO area is what gets hit in my
particular case. I'd prefer this fix to address the general issue;
otherwise, for SVM, we would have to write a handler in
hvm_hap_nested_page_fault() for every case as soon as we hit it.

Igor
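
The general fallback argued for above can be illustrated with a small toy
model. This is only a sketch under stated assumptions, not the code from the
patch: every name in it (toy_entry, toy_fixup_recalc, toy_handle_fault, the
fault_result values) is hypothetical and does not correspond to Xen's real
p2m structures or functions. It shows one fallback covering every type that
lacks an explicit handler, while a genuine fault is still raised on the
second attempt.

```c
#include <stdbool.h>

/* Hypothetical outcome of handling one nested page fault. */
enum fault_result { FAULT_HANDLED, FAULT_RETRY, FAULT_CRASH };

/* Toy stand-in for a p2m entry; NOT Xen's real layout. */
struct toy_entry {
    bool needs_recalc;  /* set when a global recalculation is requested */
    bool user_bit;      /* cleared so that the next access faults */
    bool has_handler;   /* an explicit handler exists for this type */
};

/* Fix up an entry marked for recalculation; report whether one was pending. */
static bool toy_fixup_recalc(struct toy_entry *e)
{
    if (!e->needs_recalc)
        return false;
    e->needs_recalc = false;
    e->user_bit = true;  /* re-instate the bit so a retry can succeed */
    return true;
}

/* Dispatch a fault: explicit handler first, then the general fallback. */
static enum fault_result toy_handle_fault(struct toy_entry *e)
{
    bool was_recalc = toy_fixup_recalc(e);

    if (e->has_handler)
        return FAULT_HANDLED;

    /*
     * No explicit handler: if the fault was only due to a pending
     * recalculation, retrying is safe, because a real fault would
     * simply be raised again on the next access.
     */
    if (was_recalc)
        return FAULT_RETRY;

    return FAULT_CRASH;
}
```

In this model, an entry with no explicit handler yields FAULT_RETRY on the
first fault after recalculation is requested and FAULT_CRASH only if it
faults again with nothing left to fix up.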



 

