
Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom



On 05.06.2020 13:05, Paul Durrant wrote:
> Sorry, only just catching up with this...
> 
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@xxxxxxxx>
>> Sent: 05 June 2020 10:09
>> To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>; xen-devel 
>> <xen-devel@xxxxxxxxxxxxxxxxxxxx>; Paul
>> Durrant <paul@xxxxxxx>
>> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in 
>> stubdom
>>
>> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
>>> On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote:
>>>> On 04.06.2020 13:13, Andrew Cooper wrote:
>>>>> On 04/06/2020 08:08, Jan Beulich wrote:
>>>>>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
>>>>>>> Then, we get the main issue:
>>>>>>>
>>>>>>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>>>>>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
>>>>>>>     (XEN) domain_crash called from io.c:178
>>>>>>>
>>>>>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
>>>>>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
>>>>>> I'd guess an issue with the shutdown deferral logic. Did you / can
>>>>>> you check whether XEN_DMOP_remote_shutdown managed to pause all
>>>>>> CPUs (I assume it didn't, since once they're paused there shouldn't
>>>>>> be any I/O there anymore, and hence no I/O emulation)?
>>>>>
>>>>> The vcpu in question is talking to Qemu, so will have v->defer_shutdown
>>>>> intermittently set, and will skip the pause in domain_shutdown().
>>>>>
>>>>> I presume this lack of pause is to allow the vcpu in question to still
>>>>> be scheduled to consume the IOREQ reply?  (It's fairly opaque logic,
>>>>> with no clarifying details.)
>>>>>
>>>>> What *should* happen is that, after consuming the reply, the vcpu should
>>>>> notice and pause itself, at which point it would yield to the
>>>>> scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().
>>>>>
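>>>>> (For reference, the mechanism as I read it - a rough sketch with
>>>>> names from xen/common/domain.c, bodies abridged, so an illustration
>>>>> rather than the exact source:)
>>>>>
>>>>>     void domain_shutdown(struct domain *d, u8 reason)
>>>>>     {
>>>>>         ...
>>>>>         for_each_vcpu ( d, v )
>>>>>         {
>>>>>             if ( reason == SHUTDOWN_crash )
>>>>>                 v->defer_shutdown = 0;
>>>>>             else if ( v->defer_shutdown )
>>>>>                 continue;   /* skip pause; vcpu pauses itself later */
>>>>>             vcpu_pause_nosync(v);
>>>>>         }
>>>>>         ...
>>>>>     }
>>>>>
>>>>>     void vcpu_end_shutdown_deferral(struct vcpu *v)
>>>>>     {
>>>>>         v->defer_shutdown = 0;
>>>>>         smp_mb();
>>>>>         if ( unlikely(v->domain->is_shutting_down) )
>>>>>             vcpu_check_shutdown(v);   /* pauses the vcpu itself */
>>>>>     }
>>>>>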
>>>>> Evidently, this is not happening.
>>>>
>>>> We can't tell yet, until ...
>>>>
>>>>> Marek: can you add a BUG() after the weird PIO printing?  That should
>>>>> confirm whether we're getting into handle_pio() via the
>>>>> handle_hvm_io_completion() path, or via the vmexit path (in which case,
>>>>> we're fully re-entering the guest).
>>>>
>>>> ... we know this. handle_pio() gets called from handle_hvm_io_completion()
>>>> after having called hvm_wait_for_io() -> hvm_io_assist() ->
>>>> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't call
>>>> handle_pio() (etc) at all anymore in this state. IOW perhaps
>>>> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down"
>>>> instead of plain "true"?
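>>>>
>>>> I.e. something along these lines (a sketch of the idea only, not a
>>>> tested change):
>>>>
>>>>     static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
>>>>     {
>>>>         ...
>>>>         /* Don't resume emulation for a domain being shut down. */
>>>>         return !sv->vcpu->domain->is_shutting_down;
>>>>     }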
>>>>
>>>> Adding Paul to Cc, as being the maintainer here.
>>>
>>> Got it, by sticking BUG() just before that domain_crash() in
>>> handle_pio(). Also, vcpu 0 of both HVM domains does have
>>> v->defer_shutdown set.
>>
>> As per the log they did get it set. I'd be curious about the flag's
>> value (as well as v->paused_for_shutdown's) at the point of the
>> problematic handle_pio() invocation (see below). It may be
>> worthwhile to instrument vcpu_check_shutdown() (inside its if()):
>> before exiting to guest context (in order to then come back and
>> call handle_pio()), the vCPU ought to be getting through there.
>> No indication of it doing so would be a sign that there's a code
>> path bypassing the call to vcpu_end_shutdown_deferral().
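>>
>> Something like this would do (debugging only; field names as in
>> struct vcpu):
>>
>>     /* inside vcpu_check_shutdown()'s if() body */
>>     printk("%pv check_shutdown: shutting_down %d defer %d paused %d\n",
>>            v, v->domain->is_shutting_down, v->defer_shutdown,
>>            v->paused_for_shutdown);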
>>
>>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
>>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
>>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
>>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
>>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
>>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
>>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>>
>> Perhaps in this message you could also log
>> v->domain->is_shutting_down, v->defer_shutdown, and
>> v->paused_for_shutdown? (Would be nice if, after having made
>> changes to your debugging patch, you could point again at the
>> precise version you've used for the log provided.)
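>>
>> E.g. (sketch, to be folded into your existing debugging patch):
>>
>>     printk("%pv handle_pio port %#x: shutting_down %d defer %d paused %d\n",
>>            curr, port, curr->domain->is_shutting_down,
>>            curr->defer_shutdown, curr->paused_for_shutdown);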
>>
>>> (XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff
>>> (XEN) Xen BUG at io.c:178
>>
>> Btw, instead of BUG(), WARN() or dump_execution_state() would
>> likely also do, keeping Xen alive.
>>
> 
> A shutdown deferral problem would result in X86EMUL_RETRY, wouldn't it?

Where would this originate?

> That would mean we wouldn't be seeing the "Unexpected PIO" message. From that
> message this is clearly X86EMUL_UNHANDLEABLE, which suggests a race with ioreq
> server teardown, possibly due to selecting a server but then not finding a
> vcpu match in ioreq_vcpu_list.
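>
> (hvm_send_ioreq() ends up returning X86EMUL_UNHANDLEABLE when its loop
> over ioreq_vcpu_list finds no entry for the current vcpu - roughly,
> and heavily abridged:)
>
>     list_for_each_entry ( sv, &s->ioreq_vcpu_list, list_entry )
>     {
>         if ( sv->vcpu == curr )
>             ... /* send the ioreq and return */
>     }
>
>     return X86EMUL_UNHANDLEABLE;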

I was suspecting as much, but the tearing down of all servers at least
happens only from relinquish-resources, which gets started only after
->is_shut_down has been set (unless the tool stack invoked
XEN_DOMCTL_destroydomain without having observed XEN_DOMINF_shutdown
set for the domain).

For individually unregistered servers - yes, if qemu did so, this
would be a problem. They need to remain registered until all vCPUs
in the domain have been paused.

Jan



 

