[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] x86/mm: drop bogus assertion
On 30/11/17 09:10, Jan Beulich wrote: > Olaf has observed this assertion to trigger after an aborted migration > of a PV guest (it remains to be determined why there is a page fault in > the first place here: > > (XEN) Xen call trace: > (XEN) [<ffff82d0802a85dc>] do_page_fault+0x39f/0x55c > (XEN) [<ffff82d08036b7d8>] x86_64/entry.S#handle_exception_saved+0x66/0xa4 > (XEN) [<ffff82d0802a9274>] __copy_to_user_ll+0x22/0x30 > (XEN) [<ffff82d0802772d4>] update_runstate_area+0x19c/0x228 > (XEN) [<ffff82d080277371>] domain.c#_update_runstate_area+0x11/0x39 > (XEN) [<ffff82d080277596>] context_switch+0x1fd/0xf25 > (XEN) [<ffff82d0802395c5>] schedule.c#schedule+0x303/0x6a8 > (XEN) [<ffff82d08023d067>] softirq.c#__do_softirq+0x6c/0x95 > (XEN) [<ffff82d08023d0da>] do_softirq+0x13/0x15 > (XEN) [<ffff82d08036b2f1>] x86_64/entry.S#process_softirqs+0x21/0x30 > > i.e. the guest specified a runstate area address which has a non-present > mapping in the page tables [EC=0002 CR2=ffff88003d405220], but that's > not something the hypervisor needs to be concerned about.) Release > builds work fine, which is a first indication that the assertion isn't > really needed. > > What's worse though - there appears to be a timing window where the > guest runs in shadow mode, but not in log-dirty mode, and that is what > triggers the assertion (the same could, afaict, be achieved by test- > enabling shadow mode on a PV guest). This is because turing off log- > dirty mode is being performed in two steps: First the log-dirty bit gets > cleared (paging_log_dirty_disable() [having paused the domain] -> > sh_disable_log_dirty() -> shadow_one_bit_disable()), followed by > unpausing the domain and only then clearing shadow mode (via > shadow_test_disable(), which pauses the domain a second time). > > Hence besides removing the ASSERT() here (or optionally replacing it by > explicit translate and refcounts mode checks, but this seems rather > pointless now that the three are tied together) I wonder whether either > shadow_one_bit_disable() should turn off shadow mode if no other bit > besides PG_SH_enable remains set (just like shadow_one_bit_enable() > enables it if not already set), or the domain pausing scope should be > extended so that both steps occur without the domain getting a chance to > run in between. > > Reported-by: Olaf Hering <olaf@xxxxxxxxx> > Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> > > --- a/xen/arch/x86/traps.c > +++ b/xen/arch/x86/traps.c > @@ -1338,12 +1338,8 @@ static int fixup_page_fault(unsigned lon > */ > if ( paging_mode_enabled(d) && !paging_mode_external(d) ) > { > - int ret; > + int ret = paging_fault(addr, regs); > > - /* Logdirty mode is the only expected paging mode for PV guests. */ > - ASSERT(paging_mode_only_log_dirty(d)); This assert was introduced for a specific reason, as a result of d5c251c22b7, although the later bugfix 7b9f21cabc complicated things somewhat. The observation about shadow/logdirty happening in two phases does invalidate the check in this form. Its original purpose was to check that we hadn't slipped into PV autotranslate mode, and I think that is still worth keeping around. How about reducing it to just a simple ASSERT(!paging_mode_translate(d)) ? ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |