
Re: [Xen-devel] xen/arm: Domain not fully destroyed when using credit2



Hi,

On 24/01/17 11:02, Jan Beulich wrote:
On 24.01.17 at 11:50, <julien.grall@xxxxxxx> wrote:
On 24/01/2017 08:20, Jan Beulich wrote:
On 23.01.17 at 20:42, <julien.grall@xxxxxxx> wrote:
Whilst testing other patches today, I have noticed that some of the
resources allocated to a guest were not released during its destruction.

The configuration of the test is:
        - ARM platform with 6 cores
        - staging Xen with credit2 enabled by default
        - DOM0 using 2 pinned vCPUs

The test creates a guest and then destroys it. After the test, some
resources are not released (or are only released a long time
afterwards).

Looking at the code, domain resources are released in 2 phases:
        - domain_destroy: called when there are no more references on the
domain (see put_domain)
        - complete_domain_destroy: called once the RCU grace period has
elapsed, i.e. all CPUs have gone through a quiescent state
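
For reference, a simplified sketch of the two phases (the real code in
xen/common/domain.c does quite a bit more; the bodies here are trimmed to
placeholders):

static void complete_domain_destroy(struct rcu_head *head)
{
    struct domain *d = container_of(head, struct domain, rcu);

    /* Phase 2: runs only after the RCU grace period has elapsed, i.e.
     * every CPU has passed through a quiescent state. This is where the
     * remaining per-domain resources and the struct domain go away. */
}

void domain_destroy(struct domain *d)
{
    /* Phase 1: runs when the last reference on the domain is dropped
     * (see put_domain). Defer the final freeing to the RCU callback. */
    call_rcu(&d->rcu, complete_domain_destroy);
}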

The function domain_destroy will set up the RCU callback
(complete_domain_destroy) by calling call_rcu. call_rcu will add the
callback to the RCU list and may then send an IPI (see
force_quiescent_state) if a threshold is reached. This IPI is there to
make sure all CPUs are quiescent before the callbacks (e.g.
complete_domain_destroy) are invoked. In my case, the threshold has not
been reached and therefore no IPI is sent.

But wait - isn't it the nature of RCU that it may take arbitrary time
until the actual call(s) happen(s)?

Today this arbitrary time can be infinite if an idle pCPU never receives
an interrupt, so some of the domain's resources will never be freed.

If I power-cycle a domain in a loop, after some time the toolstack will
fail to allocate memory because resources are exhausted: the previous
instance of the domain has not yet been fully destroyed (i.e.
complete_domain_destroy has not been called).

If an upper limit is required by
a user of RCU, I think it would need to be that entity that arranges
for early expiry.

This is happening for every user of RCU, not only for domain
destruction. Looking at the code, there are already some upper limits:
        - call_rcu will call force_quiescent_state if the number of
elements in the RCU queue is > 10000
        - the RCU core has a grace period (not sure how long), but no
timer to ensure the callbacks will eventually be called
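
For concreteness, this is roughly where those limits live (again from my
reading of xen/common/rcupdate.c, so take the exact names and layout
with a grain of salt):

/* The only hard limit: above this many queued callbacks on a CPU,
 * call_rcu() starts forcing quiescent states via IPIs. */
static int qhimark = 10000;

/* Grace-period tracking: a batch completes only once every CPU in
 * cpumask has reported a quiescent state. Nothing here arms a timer,
 * so a pCPU sitting in wfi with no interrupt pending never reports. */
static struct rcu_ctrlblk {
    long cur;            /* Current batch number. */
    long completed;      /* Number of the last completed batch. */
    int next_pending;    /* Is the next batch already waiting? */
    cpumask_t cpumask;   /* CPUs that still need to pass through a
                            quiescent state for the current batch. */
} rcu_ctrlblk;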

This remark in parentheses is quite relevant here, I think: there
simply is no upper bound, aiui. This is a conceptual aspect. But
I'm in no way an RCU expert, so I may well be entirely off.

I would be surprised if it were normal behaviour for an idle pCPU
(idling in wfi, or the equivalent instruction on x86) to block RCU
forever, as is the case today.


Reducing the threshold in call_rcu (see qhimark) will not help, as you
may still face memory exhaustion (see above). So I think the best
solution is to properly implement the grace period.

Well, with the above in mind - what does "properly" mean here?

By properly, I meant that either the idle pCPU should not be taken into
account in the grace period, or we need a timer (or something similar) on
the idle pCPU to check whether there is some work to do (see rcu_pending).
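
A rough sketch of the second option, purely for illustration:
rcu_pending() is the existing check in xen/common/rcupdate.c, the loop
shape is simplified from the ARM idle path, and rcu_idle_timer_start()
is a hypothetical helper used only to show the idea:

static void idle_loop(void)
{
    for ( ; ; )
    {
        if ( cpu_is_offline(smp_processor_id()) )
            stop_cpu();

        if ( rcu_pending(smp_processor_id()) )
        {
            /* RCU still needs this CPU: process the pending work now... */
            raise_softirq(RCU_SOFTIRQ);
            /* ...and/or arm a short one-shot timer so the grace period
             * cannot stall forever while this pCPU stays idle. */
            /* rcu_idle_timer_start();  (hypothetical helper) */
        }
        else
        {
            /* Nothing pending for RCU: safe to sleep until the next
             * interrupt (wfi here, hlt/mwait on x86). */
            wfi();
        }

        do_softirq();
    }
}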

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel