Xen project Mailing List

Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom

To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>

Date: Fri, 5 Jun 2020 16:13:11 +0200

Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>

Delivery-date: Fri, 05 Jun 2020 14:13:25 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 05.06.2020 14:01, Marek Marczykowski-Górecki wrote:
> On Fri, Jun 05, 2020 at 11:22:46AM +0200, Jan Beulich wrote:
>> On 05.06.2020 11:09, Jan Beulich wrote:
>>> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
>>>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
>>>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
>>>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
>>>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>
>>> Perhaps in this message could you also log
>>> v->domain->is_shutting_down, v->defer_shutdown, and
>>> v->paused_for_shutdown?
>>
>> And v->domain->is_shut_down please.
>
> Here it is:
>
> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> (XEN) d3v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0
> defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d3v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0
> defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d3v0 handle_pio port 0xb004 write 0x0001 is_shutting_down 0
> defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d3v0 handle_pio port 0xb004 write 0x2001 is_shutting_down 0
> defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> (XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0
> defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 0
> defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d1v0 handle_pio port 0xb004 write 0x0001 is_shutting_down 0
> defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d1v0 handle_pio port 0xb004 write 0x2001 is_shutting_down 0
> defer_shutdown 0 paused_for_shutdown 0 is_shut_down 0
> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> (XEN) grant_table.c:3702:d0v1 Grant release 0x3 ref 0x125 flags 0x2 d6
> (XEN) grant_table.c:3702:d0v1 Grant release 0x4 ref 0x126 flags 0x2 d6
> (XEN) d1v0 handle_pio port 0xb004 read 0x0000 is_shutting_down 1
> defer_shutdown 1 paused_for_shutdown 0 is_shut_down 0

To me this is a clear indication that we did exit to guest context
with ->defer_shutdown set.

What I'm missing from your debugging patch is logging when the default
case of the first switch() in hvmemul_do_io() gets hit. I think I said
yesterday that I consider this a fair candidate of where the
X86EMUL_UNHANDLEABLE is coming from.

On top of that, with what we've sort of learned today, could you log
(or worse) any instances of handle_pio() getting called with
->defer_shutdown set? Afaict this should never happen, but you may hit
this case earlier than for the call out of the VMEXIT handler, which
would then move us closer to the root of the issue. With "(or worse)"
I mean it could also be as heavy as BUG(), ...

> Regarding BUG/WARN - do you think I could get any more info then? I
> really don't mind crashing that system, it's a virtual machine
> currently used only for debugging this issue.

... and the selection here really depends on what overall impact you
expect. I.e. I'm with Andrew that BUG() may be the construct of choice
if otherwise you get overly much output. In other cases it may allow
you to hit the same case again, with perhaps slightly changed other
state, giving further hints on where the issue starts.

One thing that's not clear to me here: In the title you say
handle_pio() loops, but with the domain getting crashed I can't seem
to see that happening. Of course it may be a wrong
understanding/interpretation of mine that it is the guest doing
repeated I/O from/to port 0xb004.

Jan
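
For illustration only, the check requested above (flagging any entry into handle_pio() with ->defer_shutdown already set) might look roughly like the standalone mock below. This is not Xen code: struct mock_domain, struct mock_vcpu and check_pio_entry() are hypothetical stand-ins that carry only the fields named in this thread, and fprintf()/assert() stand in for the hypervisor's own logging and BUG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in types: only the fields named in this thread, not Xen's real layout. */
struct mock_domain {
    int  domain_id;
    bool is_shutting_down;
    bool is_shut_down;
};

struct mock_vcpu {
    int  vcpu_id;
    bool defer_shutdown;
    bool paused_for_shutdown;
    struct mock_domain *domain;
};

/*
 * Hypothetical helper, meant to be called at the top of handle_pio().
 * Returns true when the vCPU is in the state that should never be seen
 * there (->defer_shutdown already set) and logs the related flags.
 * With 'fatal' set it stops right away, standing in for BUG().
 */
static bool check_pio_entry(const struct mock_vcpu *v, unsigned int port,
                            bool fatal)
{
    if ( !v->defer_shutdown )
        return false;

    fprintf(stderr,
            "d%dv%d handle_pio port %#x entered with defer_shutdown set: "
            "is_shutting_down %d paused_for_shutdown %d is_shut_down %d\n",
            v->domain->domain_id, v->vcpu_id, port,
            v->domain->is_shutting_down, v->paused_for_shutdown,
            v->domain->is_shut_down);

    if ( fatal )
        assert(!"handle_pio() entered with defer_shutdown set");

    return true;
}
```

Whether to pass 'fatal' is the log-versus-BUG() trade-off discussed above: a fatal check stops at the first hit and keeps the output short, while a non-fatal one lets the same case be hit again with possibly different surrounding state.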
