
Re: [Xen-devel] [PATCH v2 1/2] xen: fix a (latent) cpupool-related race during domain destroy



On Fri, 2016-07-15 at 11:38 +0200, Juergen Gross wrote:
> Hmm, are you aware of commit bac6334b51d9bcfe57ecf4a4cb5288348fcf044a
> which explicitly moved cpupool_rm_domain() at the place where you are
> removing it again? Please verify that the scenario mentioned in the
> description of that commit is still working with your patch.
> 
Sorry, but I only partly see the problem.

In particular, I'm probably not fully understanding, from that commit's
changelog, what set of operations/commands I should run to check
whether or not I have reintroduced the issue.

What I did so far is as follows:

root@Zhaman:~# xl cpupool-list 
Name               CPUs   Sched     Active   Domain count
Pool-0              12    credit       y          1
Pool-credit          4    credit       y          1
root@Zhaman:~# xl list -c
Name                                        ID   Mem VCPUs      State   Time(s)         Cpupool
Domain-0                                     0  1019    16     r-----      34.5          Pool-0
vm1                                          1  4096     4     -b----       9.7     Pool-credit
root@Zhaman:~# xl cpupool-cpu-remove Pool-credit all
libxl: error: libxl.c:6998:libxl_cpupool_cpuremove: Error removing cpu 9 from 
cpupool: Device or resource busy
Some cpus may have not or only partially been removed from 'Pool-credit'.
If a cpu can't be added to another cpupool, add it to 'Pool-credit' again and 
retry.
root@Zhaman:~# xl cpupool-list -c
Name               CPU list
Pool-0             0,1,2,3,4,5,10,11,12,13,14,15
Pool-credit        9
root@Zhaman:~# xl shutdown vm1
Shutting down domain 1
root@Zhaman:~# xl cpupool-cpu-remove Pool-credit all
root@Zhaman:~# xl cpupool-list -c
Name               CPU list
Pool-0             0,1,2,3,4,5,10,11,12,13,14,15
Pool-credit

If (with vm1 still in Pool-credit), I do this, it indeed fails:

root@Zhaman:~# xl shutdown vm1 & xl cpupool-cpu-remove Pool-credit all
[1] 3275
Shutting down domain 2
libxl: error: libxl.c:6998:libxl_cpupool_cpuremove: Error removing cpu 9 from 
cpupool: Device or resource busy
Some cpus may have not or only partially been removed from 'Pool-credit'.
If a cpu can't be added to another cpupool, add it to 'Pool-credit' again and 
retry.
[1]+  Done                    xl shutdown vm1
root@Zhaman:~# xl cpupool-list -c
Name               CPU list
Pool-0             0,1,2,3,4,5,10,11,12,13,14,15
Pool-credit        9

But that does not look too strange to me, as it's entirely possible
that the domain has not been moved yet when we try to remove the last
cpu. And in fact, after the domain has properly shut down:

root@Zhaman:~# xl cpupool-cpu-remove Pool-credit all
root@Zhaman:~# xl cpupool-list 
Name               CPUs   Sched     Active   Domain count
Pool-0              12    credit       y          1
Pool-credit          0    credit       y          0
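
FWIW, my understanding of why we get EBUSY in the racy case boils down
to the toy model below. It is only an illustration: the structure and
the names in it are mine, not the actual ones from the hypervisor's
cpupool code, but the check should be morally the same (refuse to
remove the last cpu of a pool that still has domains assigned to it):

#include <errno.h>
#include <stdio.h>

struct pool_model {
    unsigned int n_dom;   /* domains still assigned to the pool */
    unsigned int n_cpus;  /* cpus currently in the pool         */
};

static int remove_one_cpu(struct pool_model *p)
{
    /*
     * Refuse to take away the last cpu while the pool still hosts
     * domains: a domain that is shutting down, but has not been moved
     * out of the pool yet, still counts, hence the transient failure
     * shown above.
     */
    if ( p->n_cpus == 1 && p->n_dom > 0 )
        return -EBUSY;

    p->n_cpus--;
    return 0;
}

int main(void)
{
    struct pool_model pool = { .n_dom = 1, .n_cpus = 1 };

    /* Same situation as in the transcript: vm1 not gone yet. */
    printf("remove while domain present: %d\n", remove_one_cpu(&pool));

    /* Once the domain has left the pool, removing the cpu works. */
    pool.n_dom = 0;
    printf("remove after domain is gone: %d\n", remove_one_cpu(&pool));

    return 0;
}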

And in fact, looking at the code introduced by that commit, the
important part, to me, seems to be moving the domain to cpupool0,
which is indeed the right thing to do. OTOH, what I am seeing and
fixing happens (well, could happen) all the time, even when the domain
being shut down is already in cpupool0, and (as you say yourself in
your changelog) there is no such issue as removing the last cpu of
cpupool0.
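
Or, to put what I mean in code: the toy sequence below is how I read
the ordering that commit establishes. Everything in it is mine and
purely for illustration (the real move-to-cpupool0 and
cpupool_rm_domain() obviously do a lot more than this); it is just
meant to show why the exact place where the domain is finally removed
from its pool should not matter for the "remove the last cpu"
scenario, as long as the move to cpupool0 happens early enough:

#include <stdio.h>

struct pool { unsigned int n_dom; };
struct dom  { struct pool *pool; };

static struct pool pool0, pool_credit;

static void move_to_pool0(struct dom *d)
{
    /* Done early in the destroy path: once the dying domain is
     * accounted to cpupool0, the last cpu of its old pool can go. */
    d->pool->n_dom--;
    pool0.n_dom++;
    d->pool = &pool0;
}

static void rm_from_pool(struct dom *d)
{
    /* Can happen later: at this point it only ever drops cpupool0's
     * count, and there is no such thing as removing the last cpu of
     * cpupool0 anyway. */
    d->pool->n_dom--;
    d->pool = NULL;
}

int main(void)
{
    struct dom vm1 = { .pool = &pool_credit };

    pool_credit.n_dom = 1;

    move_to_pool0(&vm1);
    printf("Pool-credit domains after the move: %u\n", pool_credit.n_dom);

    rm_from_pool(&vm1);
    printf("cpupool0 domains after final rm:    %u\n", pool0.n_dom);

    return 0;
}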

What am I missing?

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



 

