RE: [Xen-devel] xend leaks/bugs/etc

On Mon, 2005-04-18 at 01:00 -0500, Allen Short wrote:
> On Sun, 2005-04-17 at 16:42 +0100, Ian Pratt wrote: 
> 
> > Allen, I think we've come to the conclusion that Twisted was rather
> > overkill for our needs, and led to some rather confusing code that has
> > proved hard to maintain.
> 
> With all due respect, I work on 5 projects that use Twisted and,
> overall, they're the easiest codebases to extend that I've dealt with.
> xend's code is by far some of the worst Python code I've worked on.

Working on xend has been my first experience of using Python.  Glad to
hear it's atypical :-)

> 
> >  I've no doubt that someone more experienced
> > with using Twisted could have done a better job, but do you really think
> > it's the best route forward?  Xend is a 'control plane' daemon and
> > doesn't have to handle a high rate of invocations.
> 
> This is a point in favor of Twisted, I'd think; if you needed very high
> performance in that area, Python might not be appropriate.
> 
> > It needs some ability to handle asynchronous or out-of-order events, but 
> > this could be handled by simple language-level threads (we don't need 
> > concurrency). 
> 
> Given the current architecture (a daemon that accepts connections from a
> commandline tool or from a web interface), it would seem that you do
> need concurrency; personally, I'd find it inconvenient if this was
> handled differently. Plus, the languages that I'm familiar with that
> provide language-level threads require at least as much
> infrastructure/resource usage as Python.

I've done a lot of similar code in C using a facility similar to task
queues to handle chains of asynchronous events, and I think the language
is not a significant factor.  Whilst I was rewriting parts of the xend
USB code I found I could use deferreds to structure the code in the way
I was used to from my C experience.  The Twisted framework has a
well-defined, well-documented API, and I estimate the overhead for
learning enough Python and Twisted to pick up those aspects of the xend
interface at about 3 days for a competent programmer.
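
To illustrate the style I mean, here is a minimal sketch of a chain of
asynchronous steps with a separate error path.  ToyDeferred is a toy
stand-in written for this mail, not Twisted's Deferred or its API, and
the step names (claim_port, attach_device, report_error) are made up
rather than taken from the xend USB code.

```python
class ToyDeferred:
    """Toy stand-in for a deferred result.  NOT Twisted's API."""

    def __init__(self):
        self._handlers = []      # (callback, errback) pairs, run in order
        self._fired = False
        self.result = None

    def add_callbacks(self, callback, errback=None):
        self._handlers.append((callback, errback))
        if self._fired:          # result already known: run immediately
            self._run()
        return self

    def fire(self, result):
        self._fired = True
        self.result = result
        self._run()

    def _run(self):
        while self._handlers:
            callback, errback = self._handlers.pop(0)
            failed = isinstance(self.result, Exception)
            handler = errback if failed else callback
            if handler is None:
                continue         # nothing registered for this outcome
            try:
                self.result = handler(self.result)
            except Exception as exc:
                self.result = exc    # switch to the error path


def run_chain(port):
    """Hypothetical two-step device setup; all step names are made up."""
    log = []

    def claim_port(p):
        log.append("claimed %d" % p)
        return p

    def attach_device(p):
        if p > 3:
            raise ValueError("no such port")
        log.append("attached %d" % p)
        return "ok"

    def report_error(exc):
        log.append("error: %s" % exc)
        return "failed"

    d = ToyDeferred()
    d.add_callbacks(claim_port)
    d.add_callbacks(attach_device)
    d.add_callbacks(lambda result: result, report_error)
    d.fire(port)                 # in real code an event would trigger this
    return d.result, log
```

Each step only sees the result of the previous one, and a failure
anywhere skips ahead to the next error handler, which is much the same
shape as a task-queue chain in C.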

I think the initial confusion in xend, at least from the point of view
of extending it for new device types, lies in the use of inheritance in
the controller object model and the fact that all of the objects seem to
be called "controller".

Once you've understood the controller object model and inheritance
hierarchy, you then hit the fact that setting up the inter-domain
communication channels between front-end and back-end drivers is
overly-complex and must be reimplemented for each new device type.

The problem here is that the inter-domain communication primitives are
very low level and separated into a notification channel, a message
channel and a facility for bulk-data transfer, which are all provided
independently.  The client of these interfaces must use them together to
make a communications channel but, because they are provided separately
with no constraints on correct relative sequencing of the three
interfaces, the complexity from the client's perspective is cubed.
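
As a rough sketch of the alternative, a single facade could own all
three primitives and enforce one legal ordering, so that each device
type no longer has to work the sequencing out for itself.  Everything
here is hypothetical: the Channel class, its states and the particular
ordering are assumptions for illustration, and the trace strings merely
stand in for calls to the real primitives.

```python
class ChannelError(Exception):
    """Raised when an operation is attempted in the wrong state."""


class Channel:
    """Hypothetical facade over the three independent primitives."""

    def __init__(self):
        self.state = "closed"
        self.trace = []          # stands in for calls to the primitives

    def _require(self, state, action):
        if self.state != state:
            raise ChannelError(
                "%s not allowed while %s" % (action, self.state))

    def connect(self):
        self._require("closed", "connect")
        # Assumed ordering: bulk data first, notification last.
        self.trace += ["map bulk buffer",
                       "open message channel",
                       "bind notification"]
        self.state = "connected"

    def send(self, payload):
        self._require("connected", "send")
        self.trace += ["copy %d bytes to bulk buffer" % len(payload),
                       "queue message",
                       "notify peer"]

    def disconnect(self):
        self._require("connected", "disconnect")
        # Tear down in the reverse order of connect().
        self.trace += ["unbind notification",
                       "close message channel",
                       "unmap bulk buffer"]
        self.state = "closed"
```

With something like this a client can only call connect(), send() and
disconnect() in a valid order; a send() on a closed channel raises
ChannelError instead of silently misusing one of the primitives.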

After getting over the hurdle of the inter-domain communication
mechanism, you come to the facts that the requirements for coping with
the domain lifecycle are unspecified, that the existing code doesn't
allow for loadable modules, and that the controller model creates
back-end controller instances on demand during front-end creation, which
makes it impossible to track the state of a loadable back-end driver
module correctly.

If you can guess what the domain lifecycle intention was, fix the bugs
in controller.py that prevent correct shutdown of driver domains (I'll
submit a patch) and work around the above issues by constraining the
sequence of allowed driver module loads/unloads, then you hit the final
hurdle: the requirements for error handling are again unspecified and
appear to be largely unmet by the existing code.

Finally, the xend code seems to trust input it receives from domains,
which is incompatible with the architectural goal of VM isolation.

Even after dealing with the above issues, you'd still be left with the
problem that xend is very much a single-node system, when the
architectural direction is for the tools to be used to control Xen
clusters, which would need to be highly available for serious use in
enterprise environments.

So, to address the issues, I think the following steps are required:

1) Define a cluster architecture.  If the tools are going to be cluster
aware, we need to know what the definition of a cluster is and what the
cluster programming model is.  If HA is a requirement then the cluster
architecture should be HA from the start, or the mechanism for making
the transition to HA should be precisely defined up-front.  HA is such a
defining characteristic of a system's architecture that, if you actually
want to get it right, it is easier to start again than to retrofit.

2) Define a high-level inter-domain communication API.  This should be
consistent with the cluster model, should define the domain lifecycle
and contain sufficient guarantees for general purpose use. In particular
the API should deal with domain connection/disconnection notification
and elimination of stale communications. The inter-domain communication
API must be compatible with a MAC (mandatory access control) security
implementation.

3) Define a dynamic resource discovery mechanism for use, for example,
by FE and BE driver domains.  This mechanism probably ought to be a
service accessible over the inter-domain communication API.

4) Define a configuration mechanism framework.  The last tools document
I read coupled the configuration aspects to the resource discovery
aspects.  I think they are distinct: the resource discovery mechanism
deals with dynamic changes which are not necessarily under user control
(loss of availability for example) whereas the configuration mechanism
is used by the user or higher level management tools to specify the
desired system configuration.
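
To make the distinction in point 4 concrete, here is a minimal sketch in
which configuration is the desired state, discovery is the observed
state, and a separate step compares the two.  The plan() function and
the device and back-end names are purely illustrative assumptions, not
part of any existing tools interface.

```python
def plan(desired, available):
    """Compare desired configuration against discovered resources.

    desired   -- dict mapping a device name to the back-end it should
                 use (set by the user or a management tool)
    available -- set of back-ends currently reported by discovery
                 (may change without any user action)

    Returns (attach, waiting): devices that can be attached now and
    devices whose back-end is currently unavailable.
    """
    attach, waiting = [], []
    for device in sorted(desired):
        backend = desired[device]
        if backend in available:
            attach.append(device)
        else:
            waiting.append(device)
    return attach, waiting


# Illustrative names only.
desired = {"vbd0": "blkback", "vif0": "netback"}
attach, waiting = plan(desired, available={"blkback"})
```

If a back-end disappears, only the discovery input changes; the
configuration stays as the user specified it and the plan is simply
recomputed.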

So, the language issues are insignificant compared to the architectural,
design and implementation issues of the current code.

Having said this, if you are going to get the architecture, design and
implementation right, it would be nice to also end up with minimalist
code with a small footprint and a minimal learning curve for people
joining the project.

Not sure whether the way to do that is to use C, so as to have a single
language prerequisite for the whole of Xen and get static type checking,
or to use Python for the tools to take advantage of its compact,
expressive qualities, or to provide bindings for the core interfaces in
a number of languages so that people can extend the system however they
choose.

Anyway, the main points I'm trying to make are that 1) there is a big
discussion that needs to happen on the list to define the architecture
for the tools; 2) reimplementing xend better won't address the core
architectural issues; and 3) choosing a language to implement the tools
in is a second-order concern.

Harry


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel