
Re: [Xen-devel] Proposal for init/kexec/hotplug format for Xen

On Sun, 2005-02-27 at 10:42 -0600, Anthony Liguori wrote:

> We could begin work today on libxen-hcall and libxen-idc while we work
> out what the store is going to look like and how the OF structure is
> going to work.  Thoughts?

The hardest part of the inter-domain communication API to get right, if
it is to stay forwards compatible with a fault-tolerant implementation,
is that a system with different levels of fault tolerance will have some
domains that come and go whilst others persist across failures.

So, basically, the system model has to include the concept of different
domains coming and going and the API must be sufficient for surviving
domains to be able to implement a correct recovery.

In turn, one of the difficult aspects of recovery is the problem of
stale messages in the system.

For example, if you have a driver domain providing a fault-tolerant
domain with access to shared storage then, if the FT domain temporarily
loses connectivity with the driver domain and then reconnects, it faces
the problem that there may still be some of its old requests outstanding
in the driver domain which could interfere with its subsequent
operation.
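
As a rough illustration of the per-protocol way of handling this, the
sketch below tags each request with a connection generation so the
driver domain can drop anything issued before the latest reconnect. The
names (DriverDomain, connect, submit) are made up for the example and
are not part of any existing Xen interface. It only covers requests
that have not yet been started; it does nothing for requests already in
progress to the storage.

```python
class DriverDomain:
    """Hypothetical back end serving one client over a reconnectable channel."""

    def __init__(self):
        self.generation = 0   # bumped on every (re)connect
        self.completed = []   # payloads actually applied to storage

    def connect(self):
        # A new connection invalidates everything issued on the previous one.
        self.generation += 1
        return self.generation

    def submit(self, generation, payload):
        # Drop requests that were issued before the latest reconnect.
        if generation != self.generation:
            return False
        self.completed.append(payload)
        return True


driver = DriverDomain()
old = driver.connect()
driver.submit(old, "write A")

# Connectivity is lost and re-established: the client gets a new generation.
new = driver.connect()
stale_accepted = driver.submit(old, "write B (stale)")
driver.submit(new, "write C")
```

Every protocol would need its own version of this check, which is the
cost of the per-protocol approach.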

This kind of thing is a general problem that applies to all protocols in
fault-tolerant systems and there is a choice as to whether to deal with
stale messages on a per-protocol basis or come up with a global solution
that works for all protocols.

In the past, I've had some success with small clusters with a global
approach that basically quiesces the whole system when something
changes: the domain topology is determined; communication is established
between all domains; clients in all domains are told the communication
network is connected; clients make use of it; something goes wrong; the
domain topology is redetermined; all the clients are told the
communication network is disconnected and they quiesce all stale
operations; once all clients are quiesced they are reconnected to a new
epoch of the communications network; in the new epoch, all clients are
guaranteed there is no stale activity in progress from the previous
epoch.
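
The sequence above might be sketched roughly as follows. This is a toy,
single-process model under my own assumptions: Coordinator, Client and
topology_changed are hypothetical names, and a real implementation
would need the quiesce step to be an acknowledged, asynchronous
exchange rather than a direct method call.

```python
class Client:
    def __init__(self, name):
        self.name = name
        self.epoch = None       # None while the network is disconnected
        self.in_flight = set()  # operations started in the current epoch

    def start(self, op):
        if self.epoch is None:
            raise RuntimeError("network is disconnected")
        self.in_flight.add((self.epoch, op))

    def disconnect(self):
        # Quiesce: drain or abandon everything from the old epoch.
        self.in_flight.clear()
        self.epoch = None

    def connect(self, epoch):
        if self.in_flight:
            raise RuntimeError("must be quiesced before reconnecting")
        self.epoch = epoch


class Coordinator:
    def __init__(self, clients):
        self.clients = list(clients)
        self.epoch = 0

    def topology_changed(self, surviving):
        # 1. Tell every client the network is down; each one quiesces.
        for client in self.clients:
            client.disconnect()
        # 2. Only once all are quiesced, reconnect survivors to a new epoch.
        self.clients = list(surviving)
        self.epoch += 1
        for client in self.clients:
            client.connect(self.epoch)
        return self.epoch


a, b, c = Client("a"), Client("b"), Client("c")
coord = Coordinator([a, b, c])
coord.topology_changed([a, b, c])  # initial connection: epoch 1
a.start("read block 7")
coord.topology_changed([a, b])     # c has failed: epoch 2
```

After the second call, a and b are in epoch 2 with nothing in flight
from epoch 1, and c is left disconnected.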

This deals with the problem of restarting protocols amongst the domains
that recover connectivity after a failure but, on its own, isn't quite
sufficient because of the problem of disconnected domains. Consider the
following example:

An FT domain is served access to shared storage by two independent driver
domains. To start with, the FT domain sends all I/O down a path through
one of the driver domains. There is a problem and that domain becomes
disconnected from the FT domain.  The FT domain starts to send its I/O
down through the other driver domain but stale requests outstanding in
the disconnected domain are still in progress to the storage and
interfere with its operation.

One possible solution to the problem of disconnected domains is to
maintain a lease such that when a domain is disconnected it is
sufficient to wait for the lease to expire to guarantee that the
disconnected domain will have stopped and will not interfere with
subsequent operation.  Another possible solution is to have some kind of
fencing scheme which can prevent the disconnected domain from being able
to access the shared resource after it is disconnected.
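
A minimal sketch of the lease idea, assuming both sides can read a
common clock to within some known margin. The Lease class and its
method names are hypothetical. Time is passed in explicitly as `now`
(in seconds) to keep the example deterministic. It also assumes the
holder really does stop issuing I/O at expiry, which is exactly the
fail-stop assumption.

```python
class Lease:
    """Lease held by a driver domain on access to a shared resource."""

    def __init__(self, duration, now):
        self.duration = duration
        self.expires_at = now + duration

    def renew(self, now):
        # Called by the holder, and only succeeds while the lease is live.
        if now < self.expires_at:
            self.expires_at = now + self.duration
            return True
        return False

    def holder_may_issue_io(self, now):
        # The holder must stop touching the shared resource at expiry.
        return now < self.expires_at

    def safe_to_fail_over(self, now, margin=0.0):
        # The survivor waits out the lease, plus a margin for clock skew.
        return now >= self.expires_at + margin


lease = Lease(duration=10.0, now=0.0)
lease.renew(now=5.0)  # lease now runs until t=15

# The driver domain is disconnected at t=6 and can no longer renew.
too_early = lease.safe_to_fail_over(now=12.0)
on_expiry = lease.safe_to_fail_over(now=15.0)
with_margin = lease.safe_to_fail_over(now=15.0, margin=1.0)
```

The lease duration sets the trade-off: a short lease means faster
failover but more renewal traffic.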

The global quiesce and lease schemes are OK for fail-stop fault-tolerant
systems with relatively infrequent failures but are not appropriate for
Byzantine fault-tolerant systems.

For Byzantine fault tolerance you need to contain the effect of a
failure to the minimum scope, and you can't rely on a domain stopping
when its lease expires, so you need some kind of fencing scheme for
shared resources.
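
To make the distinction concrete, here is a rough sketch of fencing
enforced at the shared resource itself, so that it works even if the
fenced domain keeps running. FencedStorage and its methods are
hypothetical names for the example. It assumes the domain identity on
each request is supplied by something trusted (for instance the
hypervisor), not by the domain making the request.

```python
class FencedStorage:
    """Shared resource that checks an access list on every request."""

    def __init__(self):
        self.allowed = set()
        self.blocks = {}

    def grant(self, domain_id):
        self.allowed.add(domain_id)

    def fence(self, domain_id):
        # Revocation is enforced here, not by the fenced domain.
        self.allowed.discard(domain_id)

    def write(self, domain_id, block, data):
        if domain_id not in self.allowed:
            raise PermissionError(f"domain {domain_id} is fenced")
        self.blocks[block] = data


storage = FencedStorage()
storage.grant("driver-1")
storage.grant("driver-2")
storage.write("driver-1", 0, b"old")

# driver-1 becomes disconnected; fence it before failing over.
storage.fence("driver-1")
storage.write("driver-2", 0, b"new")

# A stale request still in flight from driver-1 is refused.
try:
    storage.write("driver-1", 0, b"stale")
    rejected = False
except PermissionError:
    rejected = True
```

Unlike the lease, this does not depend on the disconnected domain
behaving correctly, only on the resource and on whatever vouches for
the domain identity.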

Trying to think too far ahead is possibly dangerous but you might at
least like to evaluate any proposed IDC API against the above scenarios
to see how well it might serve you in the future.

Harry Butterworth <harry@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
