[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] Ping: [PATCH] x86/mm: drop bogus assertion



>>> On 30.11.17 at 10:10, <JBeulich@xxxxxxxx> wrote:
> Olaf has observed this assertion to trigger after an aborted migration
> of a PV guest (it remains to be determined why there is a page fault in
> the first place here:
> 
> (XEN) Xen call trace:
> (XEN)    [<ffff82d0802a85dc>] do_page_fault+0x39f/0x55c
> (XEN)    [<ffff82d08036b7d8>] x86_64/entry.S#handle_exception_saved+0x66/0xa4
> (XEN)    [<ffff82d0802a9274>] __copy_to_user_ll+0x22/0x30
> (XEN)    [<ffff82d0802772d4>] update_runstate_area+0x19c/0x228
> (XEN)    [<ffff82d080277371>] domain.c#_update_runstate_area+0x11/0x39
> (XEN)    [<ffff82d080277596>] context_switch+0x1fd/0xf25
> (XEN)    [<ffff82d0802395c5>] schedule.c#schedule+0x303/0x6a8
> (XEN)    [<ffff82d08023d067>] softirq.c#__do_softirq+0x6c/0x95
> (XEN)    [<ffff82d08023d0da>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d08036b2f1>] x86_64/entry.S#process_softirqs+0x21/0x30
> 
> i.e. the guest specified a runstate area address which has a non-present
> mapping in the page tables [EC=0002 CR2=ffff88003d405220], but that's
> not something the hypervisor needs to be concerned about.) Release
> builds work fine, which is a first indication that the assertion isn't
> really needed.
> 
> What's worse though - there appears to be a timing window where the
> guest runs in shadow mode, but not in log-dirty mode, and that is what
> triggers the assertion (the same could, afaict, be achieved by test-
> enabling shadow mode on a PV guest). This is because turing off log-
> dirty mode is being performed in two steps: First the log-dirty bit gets
> cleared (paging_log_dirty_disable() [having paused the domain] ->
> sh_disable_log_dirty() -> shadow_one_bit_disable()), followed by
> unpausing the domain and only then clearing shadow mode (via
> shadow_test_disable(), which pauses the domain a second time).
> 
> Hence besides removing the ASSERT() here (or optionally replacing it by
> explicit translate and refcounts mode checks, but this seems rather
> pointless now that the three are tied together) I wonder whether either
> shadow_one_bit_disable() should turn off shadow mode if no other bit
> besides PG_SH_enable remains set (just like shadow_one_bit_enable()
> enables it if not already set), or the domain pausing scope should be
> extended so that both steps occur without the domain getting a chance to
> run in between.
> 
> Reported-by: Olaf Hering <olaf@xxxxxxxxx>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> 
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -1338,12 +1338,8 @@ static int fixup_page_fault(unsigned lon
>       */
>      if ( paging_mode_enabled(d) && !paging_mode_external(d) )
>      {
> -        int ret;
> +        int ret = paging_fault(addr, regs);
>  
> -        /* Logdirty mode is the only expected paging mode for PV guests. */
> -        ASSERT(paging_mode_only_log_dirty(d));
> -
> -        ret = paging_fault(addr, regs);

Andrew had responded that he'd prefer the assertion to be
converted to !paging_mode_translate(d), but before possibly
going ahead to do so I'd really like to have your opinion here as
well as on the wider shadow mode related aspect explained in
the description, since there other options to address the issue
(perhaps even leaving the original assertion in place, if test-
enabling of shadow mode was meant to be a development/
debug-only feature).

Thanks, Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.