Re: [PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation

  • To: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Fri, 22 May 2020 12:23:39 +0200
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, wl@xxxxxxx, jbeulich@xxxxxxxx, andrew.cooper3@xxxxxxxxxx
  • Delivery-date: Fri, 22 May 2020 10:23:49 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, May 22, 2020 at 11:14:24AM +0100, Igor Druzhinin wrote:
> On 22/05/2020 11:08, Roger Pau Monné wrote:
> > On Thu, May 21, 2020 at 10:43:58PM +0100, Igor Druzhinin wrote:
> >> If a recalculation NPT fault hasn't been handled explicitly in
> >> hvm_hap_nested_page_fault() then it's potentially safe to retry -
> >> US bit has been re-instated in PTE and any real fault would be correctly
> >> re-raised next time.
> >>
> >> This covers a specific case of migration with vGPU assigned on AMD:
> >> global log-dirty is enabled and causes immediate recalculation NPT
> >> fault in MMIO area upon access. This type of fault isn't described
> >> explicitly in hvm_hap_nested_page_fault (this isn't called on
> >> EPT misconfig exit on Intel) which results in domain crash.
> > 
> > Couldn't direct MMIO regions be handled like other types of memory for
> > the purposes of logdirty mode?
> > 
> > I assume there's already a path here used for other memory types when
> > logdirty is turned on, and hence it would seem better to just make
> > direct MMIO regions also use that path?
> The problem with handling only the MMIO case is that the issue remains.
> It will be hit with some other memory type since it's not MMIO specific.
> The issue is that if global recalculation is called, the next hit to
> this type will cause a transient fault which, after the due fixup, will
> not be handled correctly by either of our handlers.

I admit I should go look at the code, but for example RAM p2m types
don't require this fix, so I assume there's some different path taken
in that case that avoids all this?

I.e., when global logdirty is enabled you will start to get nested page
faults for every access, yet only direct MMIO types require this fix?
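
To make the scenario concrete, below is a toy model in C of the flow as
described in the patch and the reply above. It is only a sketch: none of
the names are Xen identifiers, and the assumption that RAM has some
handler which claims the fault while direct MMIO has none is exactly the
part I have not verified against the code. It shows a fault where the
pending recalculation gets fixed up, no handler claims it, and the
outcome is either a crash (current behaviour as described) or a retry
(what the patch proposes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not a Xen identifier. */
enum toy_type { TOY_RAM, TOY_MMIO_DIRECT };
enum toy_outcome { TOY_HANDLED, TOY_RETRY, TOY_CRASH };

struct toy_entry {
    enum toy_type type;
    bool recalc_pending; /* set on every entry by a global recalculation */
};

/* Models enabling global log-dirty: mark all entries for recalculation. */
static void toy_mark_all_for_recalc(struct toy_entry *tbl, size_t n)
{
    assert(tbl != NULL);
    for (size_t i = 0; i < n; i++)
        tbl[i].recalc_pending = true;
}

/* Fix up a pending recalculation; returns true if a fixup was performed. */
static bool toy_fixup_recalc(struct toy_entry *e)
{
    bool was_pending = e->recalc_pending;

    e->recalc_pending = false;
    return was_pending;
}

/*
 * Stand-in for the type-specific fault handlers.
 * ASSUMPTION: in this toy, RAM is claimed by some handler, direct MMIO is not.
 */
static bool toy_handler_claims(const struct toy_entry *e)
{
    return e->type == TOY_RAM;
}

/* One nested page fault on entry e; retry_after_recalc models the patch. */
static enum toy_outcome toy_nested_fault(struct toy_entry *e,
                                         bool retry_after_recalc)
{
    bool fixed = toy_fixup_recalc(e);

    if (toy_handler_claims(e))
        return TOY_HANDLED;
    if (fixed && retry_after_recalc)
        return TOY_RETRY; /* entry is valid again; a real fault would recur */
    return TOY_CRASH;
}
```

In this model, after toy_mark_all_for_recalc() a fault on a
TOY_MMIO_DIRECT entry returns TOY_CRASH with retry_after_recalc == false
and TOY_RETRY with it set to true. A second fault on the same entry, with
no recalculation pending, still returns TOY_CRASH, which is the "any real
fault would be correctly re-raised" part of the commit message.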

Thanks, Roger


