Re: [Xen-devel] question about migration
On 04/01/16 15:31, Ian Jackson wrote:
> Andrew Cooper writes ("Re: [Xen-devel] question about migration"):
>> On 25/12/2015 03:06, Wen Congyang wrote:
>>> Another problem:
>>> If migration fails after the guest is suspended, we will resume it in the
>>> source.
>>> In this case, we cannot shut it down, because no process handles the shutdown
>>> event.
>>> The log in /var/log/xen/xl-hvm_nopv.log:
>>> Waiting for domain hvm_nopv (domid 1) to die [pid 5508]
>>> Domain 1 has shut down, reason code 2 0x2
>>> Domain has suspended.
>>> Done. Exiting now
>>>
>>> The xl has exited...
> ...
>> Hmm yes. This is a libxl bug in libxl_evenable_domain_death(). CC'ing
>> the toolstack maintainers.
> AIUI this is a response to Wen's comments above.
>
>> It waits for the @releaseDomain watch, but doesn't interpret the results
>> correctly. In particular, if it can still make successful hypercalls
>> with the provided domid, that domain was not the subject of
>> @releaseDomain. (I also observe that domain_death_xswatch_callback() is
>> very inefficient. It only needs to make a single hypercall, not query
>> the entire state of all domains.)
> I don't understand precisely what you allege this bug to be, but:
>
> * libxl_evenable_domain_death may generate two events, a
> DOMAIN_SHUTDOWN and a DOMAIN_DEATH, or only one, a DOMAIN_DEATH.
> This is documented in libxl.h (although it refers to DESTROY rather
> than DEATH - see patch below to fix the doc).
>
> * @releaseDomain usually triggers twice for each domain: once when it
> goes to SHUTDOWN and once when it is actually destroyed. (This is
> obviously necessary to implement the above.)
So it does. I clearly had an accident with `git grep` when I came to the
opposite conclusion. Apologies for the noise generated from this.
>
> * @releaseDomain does not have a specific domain which is the "subject
> of @releaseDomain". Arguably this is unhelpful, but it is not
> libxl's fault. It arises from the VIRQ generated by Xen. Note that
> xenstored needs to search its own list of active domains to see what
> has happened; it generates the @releaseDomain event and throws away
> the domid.
The semantics of @releaseDomain are quite mad, but this is how it has
always been.
The current semantics are a scalability limitation which someone in
XenServer will likely get around to addressing in due course (we support
1000 VMs per host).
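To illustrate the scalability point: because the event carries no domid,
a watcher has to compare the state of every domain it tracks each time
the watch fires. A minimal sketch of that comparison, assuming a
hypothetical per-domain state enum and snapshot arrays (none of these
names come from libxl or xenstored):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-domain states; not taken from libxl or xenstored. */
enum dom_state { DOM_RUNNING, DOM_SHUTDOWN, DOM_GONE };

/*
 * Compare two snapshots of n domains and record the index of every
 * domain whose state differs. Returns the number of changed domains.
 * The loop always visits all n entries, which is the cost paid on
 * each firing of a watch that does not say which domain changed.
 */
size_t find_changed(const enum dom_state *before,
                    const enum dom_state *after,
                    size_t n, size_t *changed)
{
    size_t count = 0;

    assert(before != NULL && after != NULL && changed != NULL);
    for (size_t i = 0; i < n; i++) {
        if (before[i] != after[i])
            changed[count++] = i;
    }
    return count;
}
```

With 1000 domains, each firing costs 1000 comparisons here (or 1000
state queries in a real toolstack), however few domains actually changed.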
> * It is not possible to resume the domain in the source after it has
> suspended.
This functionality exists and is already used in several circumstances,
both by libxl, and other toolstacks.
xl has an added split-brain problem here that plain daemonic toolstacks
don't have; specifically that there are two completely independent
processes playing with the domain state at the same time.
The daemonic xl needs to ignore DOMAIN_SHUTDOWN and tidy up only after
DOMAIN_DEATH. Under these circumstances, a failed migrate which resumes
the domain won't result in qemu being cleaned up.
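A rough sketch of that policy, purely for illustration. The event enum
and monitor struct below are hypothetical stand-ins, not libxl types,
and the sketch deliberately leaves out whatever else xl does on
shutdown:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event kinds, modelled on the DOMAIN_SHUTDOWN and
 * DOMAIN_DEATH events discussed above; these are not libxl types. */
enum dom_event { EV_DOMAIN_SHUTDOWN, EV_DOMAIN_DEATH };

/* Hypothetical state kept by the process monitoring one domain. */
struct monitor {
    bool watching;       /* still waiting for events on this domain */
    bool cleaned_up;     /* device model and friends torn down */
    int  shutdowns_seen; /* SHUTDOWN events observed so far */
};

/* Handle one event. Returns true if the monitor should keep running. */
bool monitor_handle_event(struct monitor *m, enum dom_event ev)
{
    assert(m != NULL);

    switch (ev) {
    case EV_DOMAIN_SHUTDOWN:
        /* The domain may yet be resumed (e.g. after a failed
         * migration), so note the event but neither clean up nor exit. */
        m->shutdowns_seen++;
        return m->watching;
    case EV_DOMAIN_DEATH:
        /* The domain is really gone: tidy up and stop watching. */
        m->cleaned_up = true;
        m->watching = false;
        return false;
    }
    return m->watching;
}
```

The point is only that SHUTDOWN leaves the monitor alive, so a later
shutdown of the resumed domain still has a process to handle it.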
~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel