[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 5/6] tools/xenstored: partially handle domains without a shared ring



Hi Roger,

On 23/09/2021 12:23, Roger Pau Monné wrote:
On Wed, Sep 22, 2021 at 06:46:25PM +0500, Julien Grall wrote:
I thought a bit more and looked at the code (I don't have access to a test
machine at the moment). I think there is indeed a problem.

Some watchers of @releaseDomain (such as xenconsoled) will only remove a
domain from their internal state when the domain is actually dead.

This is based on dominfo.dying which is only set when all the resources are
relinquished and waiting for the other domains to release any resources for
that domain.

The problem is Xenstore may fail to map the interface or the event channel
long before the domain is actually dead. With the current check, we would
free the allocated structure and therefore send @releaseDomain too early. So
daemon like xenconsoled, would never cleanup for the domain and leave a
zombie domain. Well... until the next @releaseDomain (or @introduceDomain
for Xenconsoled) AFAICT.

The revised patch is meant to solve it by just ignoring the connection. With
that approach, we would correctly notify watches when the domain is dead.

I think the patch provided by Julien is indeed better than the current
code, or else you will miss @releaseDomain events in corner cases for
dominfo.dying.

I think however the patch is missing a change in the order of the
checks in conn_can_{read,write}, so that the is_ignored check is
performed before calling can_{read,write} which will try to poke at
the interface and trigger a fault if not mapped.

Ah good point. I originally moved the is_ignored check after the callback because the socket connection can in theory be closed from can_{read, write}.

However, in pratice, is_ignored is only set for socket from ignore_connection() that will also close the socket.

The new use will only set is_ignored for the domain connection. So I am guessing moving the check early on ought to be fine.

The alternative would be to call ignore_connection() but this feels a bit weird because most of it would be a NOP as we are introducing the domain.

Cheers,


--
Julien Grall



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.