
RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom



> -----Original Message-----
> From: Paul Durrant <xadimgnik@xxxxxxxxx>
> Sent: 05 June 2020 14:49
> To: 'Jan Beulich' <jbeulich@xxxxxxxx>
> Cc: 'Marek Marczykowski-Górecki' <marmarek@xxxxxxxxxxxxxxxxxxxxxx>; 'Andrew 
> Cooper'
> <andrew.cooper3@xxxxxxxxxx>; 'xen-devel' <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> Subject: RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in 
> stubdom
> 
> > -----Original Message-----
> > From: Jan Beulich <jbeulich@xxxxxxxx>
> > Sent: 05 June 2020 14:47
> > To: paul@xxxxxxx
> > Cc: 'Marek Marczykowski-Górecki' <marmarek@xxxxxxxxxxxxxxxxxxxxxx>; 'Andrew 
> > Cooper'
> > <andrew.cooper3@xxxxxxxxxx>; 'xen-devel' <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> > Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in 
> > stubdom
> >
> > On 05.06.2020 15:43, Paul Durrant wrote:
> > >> -----Original Message-----
> > >> From: Jan Beulich <jbeulich@xxxxxxxx>
> > >> Sent: 05 June 2020 14:37
> > >> To: paul@xxxxxxx
> > >> Cc: 'Marek Marczykowski-Górecki' <marmarek@xxxxxxxxxxxxxxxxxxxxxx>; 
> > >> 'Andrew Cooper'
> > >> <andrew.cooper3@xxxxxxxxxx>; 'xen-devel' <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> > >> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 
> > >> in stubdom
> > >>
> > >> On 05.06.2020 13:25, Paul Durrant wrote:
> > >>>> -----Original Message-----
> > >>>> From: Paul Durrant <xadimgnik@xxxxxxxxx>
> > >>>> Sent: 05 June 2020 12:06
> > >>>> To: 'Jan Beulich' <jbeulich@xxxxxxxx>; 'Marek Marczykowski-Górecki'
> > >> <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> > >>>> Cc: 'Andrew Cooper' <andrew.cooper3@xxxxxxxxxx>; 'xen-devel' 
> > >>>> <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> > >>>> Subject: RE: handle_pio looping during domain shutdown, with qemu 
> > >>>> 4.2.0 in stubdom
> > >>>>
> > >>>> Sorry, only just catching up with this...
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: Jan Beulich <jbeulich@xxxxxxxx>
> > >>>>> Sent: 05 June 2020 10:09
> > >>>>> To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> > >>>>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>; xen-devel 
> > >>>>> <xen-devel@xxxxxxxxxxxxxxxxxxxx>; Paul
> > >>>>> Durrant <paul@xxxxxxx>
> > >>>>> Subject: Re: handle_pio looping during domain shutdown, with qemu 
> > >>>>> 4.2.0 in stubdom
> > >>>>>
> > >>>>> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
> > >>>>>> On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote:
> > >>>>>>> On 04.06.2020 13:13, Andrew Cooper wrote:
> > >>>>>>>> On 04/06/2020 08:08, Jan Beulich wrote:
> > >>>>>>>>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
> > >>>>>>>>>> Then, we get the main issue:
> > >>>>>>>>>>
> > >>>>>>>>>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >>>>>>>>>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
> > >>>>>>>>>>     (XEN) domain_crash called from io.c:178
> > >>>>>>>>>>
> > >>>>>>>>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its 
> > >>>>>>>>>> stubdom
> > >>>>>>>>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called 
> > >>>>>>>>>> already.
> > >>>>>>>>> I'd guess an issue with the shutdown deferral logic. Did you / can
> > >>>>>>>>> you check whether XEN_DMOP_remote_shutdown managed to pause all
> > >>>>>>>>> CPUs (I assume it didn't, since once they're paused there 
> > >>>>>>>>> shouldn't
> > >>>>>>>>> be any I/O there anymore, and hence no I/O emulation)?
> > >>>>>>>>
> > >>>>>>>> The vcpu in question is talking to Qemu, so will have 
> > >>>>>>>> v->defer_shutdown
> > >>>>>>>> intermittently set, and skip the pause in domain_shutdown()
> > >>>>>>>>
> > >>>>>>>> I presume this lack of pause is to allow the vcpu in question to 
> > >>>>>>>> still
> > >>>>>>>> be scheduled to consume the IOREQ reply?  (It's fairly opaque logic 
> > >>>>>>>> with
> > >>>>>>>> 0 clarifying details).
> > >>>>>>>>
> > >>>>>>>> What *should* happen is that, after consuming the reply, the vcpu 
> > >>>>>>>> should
> > >>>>>>>> notice and pause itself, at which point it would yield to the
> > >>>>>>>> scheduler.  This is the purpose of 
> > >>>>>>>> vcpu_{start,end}_shutdown_deferral().
> > >>>>>>>>
> > >>>>>>>> Evidently, this is not happening.
> > >>>>>>>
> > >>>>>>> We can't tell yet, until ...
> > >>>>>>>
> > >>>>>>>> Marek: can you add a BUG() after the weird PIO printing?  That 
> > >>>>>>>> should
> > >>>>>>>> confirm whether we're getting into handle_pio() via the
> > >>>>>>>> handle_hvm_io_completion() path, or via the vmexit path (in which 
> > >>>>>>>> case,
> > >>>>>>>> we're fully re-entering the guest).
> > >>>>>>>
> > >>>>>>> ... we know this. handle_pio() gets called from 
> > >>>>>>> handle_hvm_io_completion()
> > >>>>>>> after having called hvm_wait_for_io() -> hvm_io_assist() ->
> > >>>>>>> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't 
> > >>>>>>> call
> > >>>>>>> handle_pio() (etc) at all anymore in this state. IOW perhaps
> > >>>>>>> hvm_wait_for_io() should return 
> > >>>>>>> "!sv->vcpu->domain->is_shutting_down"
> > >>>>>>> instead of plain "true"?
> > >>>>>>>
> > >>>>>>> Adding Paul to Cc, as being the maintainer here.
> > >>>>>>
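Concretely, that suggestion would amount to something like the following at the
tail of hvm_wait_for_io() (a sketch only, not a tested change; the surrounding
lines are whatever the current tree has):

    hvm_io_assist(sv, data);

    /*
     * Sketch of the suggestion: stop feeding the completion path once the
     * domain has started shutting down, rather than always returning true.
     */
    return !sv->vcpu->domain->is_shutting_down;
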
> > >>>>>> Got it, by sticking BUG() just before that domain_crash() in
> > >>>>>> handle_pio(). And also vcpu 0 of both HVM domains do have
> > >>>>>> v->defer_shutdown.
> > >>>>>
> > >>>>> As per the log they did get it set. I'd be curious of the flag's
> > >>>>> value (as well as v->paused_for_shutdown's) at the point of the
> > >>>>> problematic handle_pio() invocation (see below). It may be
> > >>>>> worthwhile to instrument vcpu_check_shutdown() (inside its if())
> > >>>>> - before exiting to guest context (in order to then come back
> > >>>>> and call handle_pio()) the vCPU ought to be getting through
> > >>>>> there. No indication of it doing so would be a sign that there's
> > >>>>> a code path bypassing the call to vcpu_end_shutdown_deferral().
> > >>>>>
> > >>>>>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> > >>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >>>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
> > >>>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
> > >>>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> > >>>>>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> > >>>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> > >>>>>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> > >>>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> > >>>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> > >>>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
> > >>>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
> > >>>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> > >>>>>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> > >>>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> > >>>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
> > >>>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
> > >>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >>>>>
> > >>>>> Perhaps in this message could you also log
> > >>>>> v->domain->is_shutting_down, v->defer_shutdown, and
> > >>>>> v->paused_for_shutdown? (Would be nice if, after having made
> > >>>>> changes to your debugging patch, you could point again at the
> > >>>>> precise version you've used for the log provided.)
> > >>>>>
> > >>>>>> (XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff
> > >>>>>> (XEN) Xen BUG at io.c:178
> > >>>>>
> > >>>>> Btw, instead of BUG(), WARN() or dump_execution_state() would
> > >>>>> likely also do, keeping Xen alive.
> > >>>>>
> > >>>>
> > >>>> A shutdown deferral problem would result in X86EMUL_RETRY, wouldn't it?
> > >>>> That would mean we wouldn't be seeing the "Unexpected PIO" message. From
> > >>>> that message this is clearly X86EMUL_UNHANDLEABLE, which suggests a race
> > >>>> with ioreq server teardown, possibly due to selecting a server but then
> > >>>> not finding a vcpu match in ioreq_vcpu_list.
> > >>>
> > >>> Actually looking at remote_shutdown... the test of ( reason ==
> > >>> SHUTDOWN_crash ) and then clearing defer_shutdown looks a bit odd...
> > >>> Just because the domain shutdown code has been set that way doesn't
> > >>> mean that a vcpu is not deferred in emulation; SCHEDOP_shutdown_code
> > >>> could easily be called from one vcpu whilst another has emulation
> > >>> pending.
> > >>
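For reference, the logic being referred to is the per-vCPU loop in
domain_shutdown(); paraphrased here (check xen/common/domain.c for the
authoritative version), it does roughly:

    for_each_vcpu ( d, v )
    {
        if ( reason == SHUTDOWN_crash )
            v->defer_shutdown = 0;
        else if ( v->defer_shutdown )
            continue;
        vcpu_pause_nosync(v);
        v->paused_for_shutdown = 1;
    }

i.e. a crash reason forcibly clears any deferral before pausing, whereas for
other reasons a vCPU with defer_shutdown set simply skips the pause.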
> > >> I'm confused: The deferral is of shutting down the domain, not of
> > >> a specific instance of emulation.
> > >
> > > Now I'm confused... defer_shutdown is per-vcpu.
> >
> > Right - each vCPU can individually defer shutting down of the domain
> > as a whole.
> >
> 
> Ok, I think we're only going to make more progress if we know exactly where 
> the X86EMUL_UNHANDLEABLE
> is coming from.
> 

This (untested) patch might help:

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index c55c4bc4bc..8aa8779ba2 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -109,12 +109,7 @@ static void hvm_io_assist(struct hvm_ioreq_vcpu *sv, uint64_t data)
     ioreq_t *ioreq = &v->arch.hvm.hvm_io.io_req;

     if ( hvm_ioreq_needs_completion(ioreq) )
-    {
-        ioreq->state = STATE_IORESP_READY;
         ioreq->data = data;
-    }
-    else
-        ioreq->state = STATE_IOREQ_NONE;

     msix_write_completion(v);
     vcpu_end_shutdown_deferral(v);
@@ -209,6 +204,9 @@ bool handle_hvm_io_completion(struct vcpu *v)
         }
     }

+    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
+        STATE_IORESP_READY : STATE_IOREQ_NONE;
+
     io_completion = vio->io_completion;
     vio->io_completion = HVMIO_no_completion;

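For clarity, with the above applied the touched part of hvm_io_assist() would
read as follows (reconstructed from the hunk and its context):

    ioreq_t *ioreq = &v->arch.hvm.hvm_io.io_req;

    /*
     * Only latch the data here; the IORESP_READY / IOREQ_NONE transition
     * is now done once, in handle_hvm_io_completion(), after the
     * hvm_wait_for_io() loop and before vio->io_completion is acted on.
     */
    if ( hvm_ioreq_needs_completion(ioreq) )
        ioreq->data = data;

    msix_write_completion(v);
    vcpu_end_shutdown_deferral(v);

The rearrangement ties the ioreq state update to the single place where the
completion is consumed, rather than doing it inside hvm_io_assist().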
