
Re: [Xen-devel] xl shutdown --wait "racy"



On Wed, 2014-04-16 at 17:20 +0200, Sander Eikelenboom wrote:
> Wednesday, April 16, 2014, 5:02:50 PM, you wrote:
> 
> > On Wed, 2014-04-16 at 16:55 +0200, Sander Eikelenboom wrote:
> >> Wednesday, April 16, 2014, 4:33:30 PM, you wrote:
> >> 
> >> > On Wed, 2014-04-16 at 16:26 +0200, Sander Eikelenboom wrote:
> >> >> Wednesday, April 16, 2014, 4:13:59 PM, you wrote:
> >> >> 
> >> >> > On Wed, 2014-04-16 at 16:08 +0200, Sander Eikelenboom wrote:
> >> >> >> Hi Ian (C|J) Konrad,
> >> >> >> 
> >> >> >> I'm currently trying to work around the 
> >> >> >> pci-(detach|assignable-remove) issues I 
> >> >> >> reported earlier. 
> >> >> >> 
> >> >> >> The workaround I thought of was:
> >> >> >> - shutting down the guest
> >> >> >> - starting it without 1 of the original devices passed through
> >> >> >> - using xl pci-assignable-remove and binding the device to the dom0 
> >> >> >> driver.
> >> >> >> 
> >> >> >> But while doing this I noticed that "xl shutdown --wait" does wait,
> >> >> >> but returns:
> >> >> >> - before the domain is removed from, for instance, "xl list"; it is
> >> >> >>   still listed there in "--ps--" state.
> >> >> >> - before pciback has done its restore config space magic.
> >> >> >> 
> >> >> >> So it seems the wait loop is exiting somewhat prematurely, is this 
> >> >> >> expected ? 
> >> >> 
> >> >> > It is waiting for the domain to be shutdown (state 's') not for the
> >> >> > domain to be destroyed. So it's doing what it said it would (I
> >> >> > appreciate you might not find this distinction helpful under the
> >> >> > circumstances...)
> >> >> 
> >> >> It's at least not entirely what i expected ;-)
> >> >> 
> >> >> Is it because there can be different "follow-up actions" due to the 
> >> >> "on_poweroff=" config option ?
> >> 
> >> > Not really, those are somewhat unrelated.
> >> 
> >> > shutdown and destroy are two distinct events. Once a domain has shutdown
> >> > (called the shutdown hypercall etc) it goes into state "shutdown" and an
> >> > event is generated from the hypervisor to the toolstack. The toolstack's
> >> > response to this is to actually destroy the domain, that is to tear down
> >> > the resources it is using etc.
> >> 
> >> > on_* only matter for the destroy phase since they tell the toolstack
> >> > what it should do (restart, preserve, really destroy etc).
> >> 
> >> Hmm ok, it should be called "--wait_until_halfway" then ;-)
> 
> > ;-)
> 
> >> On the more serious side .. would patches be accepted that:
> >> 
> >> a) differentiate when it returns from waiting based on the on_*
> >> 
> >>         preserve: this could probably stay as is .. after the shutdown event
> >>         destroy:
> >>         restart:
> >>         rename-restart:
> >>         coredump-destroy:
> >>         coredump-restart:
> >> 
> >>         for the other ones .. I don't know if there actually are events in
> >>         libxl that could be 'easily' coupled ?
> 
> > Might be tricky, since on_* is processed by the daemonised xl which is
> > monitoring the domain, not the xl shutdown process.
> 
> >> b) make it possible for the xl commandline to overrule the on_* from the 
> >> configfile
> 
> > I guess you mean the xl shutdown command. This will also be tricky, for
> > the same reasons as a.
> 
> >> c) also introduce a -w/--wait for xl destroy
> 
> > Yes.
> 
> > I'll add:
> 
> > d) Make "xl shutdown --wait" actually wait for the domain to be
> > destroyed.
> 
> > Probably, assuming that is possible (I'm concerned about races in the
> > implementation of this...). Might also interact weirdly with on_* I
> > suppose.
> 
> Well, if we could pass down the events that "wait_for_domain_deaths" is
> allowed to return on ... now it seems to return on *any* event .. and only
> prints something different on shutdown and on complete death ...

Any other event would be unexpected, I think, since the corresponding
libxl_evenable_* would never have been called in the xl shutdown path.

> Is there a special event that's triggered on timeout (as defined in 
> /etc/defaults/xendomains: XENDOMAINS_STOP_MAXWAIT=300) ?

I don't see a timeout in the libxl_event_wait prototype so I suppose
not.

> Then the solution seems to be to let the caller of "wait_for_domain_deaths"
> specify the events it should return on.

They already can -- by only enabling those events.
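
To illustrate the idea only (this is a toy model, not libxl code): if events
that were never enabled are never delivered, the wait loop needs no extra
filter argument. EventSource, EventType and wait_for_domain below are
hypothetical names made up for this sketch and do not correspond to the real
libxl functions or to xl's wait_for_domain_deaths.

```python
from collections import deque
from enum import Enum, auto


class EventType(Enum):
    DOMAIN_SHUTDOWN = auto()  # guest has halted, resources are still held
    DOMAIN_DEATH = auto()     # domain has been torn down by the toolstack


class EventSource:
    """Toy stand-in for an event queue (hypothetical, not the libxl API)."""

    def __init__(self):
        self._enabled = set()
        self._queue = deque()

    def enable(self, event_type):
        self._enabled.add(event_type)

    def post(self, event_type, domid):
        # Events that were never enabled are dropped, so a waiter never sees them.
        if event_type in self._enabled:
            self._queue.append((event_type, domid))

    def wait(self):
        # Return the next enabled event, or None when nothing is pending.
        return self._queue.popleft() if self._queue else None


def wait_for_domain(source, domid):
    """Return the first enabled event type seen for domid, or None if none is queued."""
    while True:
        event = source.wait()
        if event is None:
            return None
        if event[1] == domid:
            return event[0]
```

With only DOMAIN_DEATH enabled, a posted DOMAIN_SHUTDOWN for the same domid is
never queued, so wait_for_domain returns on the death event alone.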

>  (always return on timeout ... return on any unless specified ... only
> return on the specified ones when specific events are specified)
> 
> Could you elaborate on how you think this would get "racy" ?

Oh, it looks like libxl already solved it and has an event, never mind.
(I was concerned that a new domain with the same domid might appear
right after the domain was gone)
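
For the record, the concern can be sketched like this. It is a toy model under
my own assumptions, not how libxl handles it: DomainTable and both is_gone_*
helpers are hypothetical, and comparing a per-domain UUID is just one way to
tell two domains with the same domid apart.

```python
import uuid


class DomainTable:
    """Toy model of a host's domain list (hypothetical, not a Xen interface)."""

    def __init__(self):
        self._domains = {}  # domid -> UUID of the domain currently using it

    def create(self, domid):
        self._domains[domid] = uuid.uuid4()
        return self._domains[domid]

    def destroy(self, domid):
        del self._domains[domid]

    def lookup(self, domid):
        return self._domains.get(domid)


def is_gone_naive(table, domid):
    # Racy: a new domain that reuses the domid makes the old one look alive.
    return table.lookup(domid) is None


def is_gone_checked(table, domid, original_uuid):
    # Compares identity as well, so a reused domid is not mistaken for the old domain.
    return table.lookup(domid) != original_uuid
```

If a domain is destroyed and another one is created with the same domid before
the poller looks again, is_gone_naive reports the old domain as still present,
while is_gone_checked reports it as gone.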

Ian.




 

