
Re: [Xen-devel] Re: Interdomain comms


  • To: Harry Butterworth <harry@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
  • From: Eric Van Hensbergen <ericvh@xxxxxxxxx>
  • Date: Sun, 8 May 2005 11:18:03 -0500
  • Cc: Mike Wray <mike.wray@xxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, "Ronald G. Minnich" <rminnich@xxxxxxxx>, Eric Van Hensbergen <ericvh@xxxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Sun, 08 May 2005 16:17:43 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On 5/8/05, Harry Butterworth <harry@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > In our world, this would result in you holding a Fid pointing to the
> > open object.  The Fid is a pointer to meta-data and is considered
> > state on both the FE and the BE. (this has downsides in terms of
> > reliability and the ability to recover sessions or fail over to
> > different BE's -- one of our summer students will be addressing the
> > reliability problem this summer).
> 
> OK, so this is an area of concern for me.  I used the last version of
> the sketchy API I outlined to create an HA cluster infrastructure. So I
> had to solve these kind of protocol issues and, whilst it was actually
> pretty easy starting from scratch, retrofitting a solution to an
> existing protocol might be challenging, even for a summer student.
>

There are three previous attempts at providing this sort of facility
in 9P that the student is going to be basing his work on.  All three
worked with varying degrees of effectiveness - but there's no magic
bullet here, and clients and file servers need to be written
defensively to cope with such disruptions gracefully.  It's quite
likely there will be different semantics for failure recovery
depending on the resource.
 
> >
> > The FE performs a read operation passing it the necessary bits:
> >   ret = read( fd, *buf, count );
> 
> Here the API is coupling the client to the memory management
> implementation by assuming that the buffer is mapped into the client's
> virtual address space.
> 
> This is probably likely to be true most of the time so an API at this
> level will be useful but I'd also like to be able to write I/O
> applications that manage the data in buffers that are never mapped into
> the application address space.
>

Well, this was the context of the example (the FE was registering a
buffer from its own address space).  The existing Plan 9 API doesn't
have a good example of how to handle the more abstract buffer handles
you describe, but I don't think there's anything in the protocol which
would prevent such a use.  I need to think about this scenario a bit
more; could you give an example of how you would use this feature?
 
> Also, I'd like to be able to write applications that have clients which
> use different types of buffers without having to code for each case in
> my application.
> 

The attempt at portability is admirable, but it just seems to add
complexity -- if I want to use the reference, I'll have to make
another function call to resolve the buffer.  I guess I'm being too
narrow-minded, but I just don't have a clear idea of the utility of
hidden buffers.  I never know who I am supposed to be hiding
information from. ;)
 
> 
> So, my application can deal with buffers described like that without
> having to worry about the flavour of memory management backing them.
> 

This is important.  In my example I was working on the premise that
the client initiating the read was consuming the data in some way.
When that's not the case, the interface is quite different, more like
that of our file servers.  In those cases, I can easily see passing
the data by some more opaque reference (before, I had figured
scatter/gather buffers would be sufficient -- but perhaps your more
abstract representation buys extra flexibility).  I still hate the
idea of having to resolve the abstract_buffer to get at the data, but
perhaps that's the cost of efficiency -- I'll have to think about it
some more.

> Also, I can change the memory management without changing all the calls
> to the API, I only have to change where I get buffers from.

Again - I agree that this is an important aspect.  Perhaps this sort
of functionality is best called out separately, with its own
interfaces to provide and resolve buffer handles; it seems worth
breaking it out on its own.  It seems like there would be three types
of operations on your proposed struct:
   abstract_ref = get_ref( *real_data, flags ); /* constructor */
   real_data = resolve_ref( *abstract_ref, flags );
   forget_ref( abstract_ref ); /* destructor */
Lots of details under the hood there (as it should be).  flags could
help specify things like read-only, copy-on-write, etc.  Is such an
interface sufficient?  If I'm being naive here, just tell me to shut
up and I won't talk about it until I've had the time to look a little
deeper into things.
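
To make the shape of that interface a bit more concrete, here is a
minimal sketch in C.  Everything in it is hypothetical -- the names
(struct abstract_ref, get_ref, resolve_ref, forget_ref), the flag
values, the extra length argument, and the trivial malloc-backed body,
which only stands in for whatever grant-table or page-flipping
machinery a real memory-management backend would hide behind the
handle:

    /* All names and flags here are made up for illustration; nothing
     * below is part of 9P or of any existing Plan 9 or Xen API. */
    #include <stddef.h>
    #include <stdlib.h>

    #define REF_RDONLY  0x1   /* consumer may not modify the data       */
    #define REF_COW     0x2   /* copy the page(s) on first write        */

    struct abstract_ref {
        void   *opaque;       /* owned by the memory-management backend */
        size_t  len;          /* length of the region being described   */
        int     flags;
    };

    /* constructor: wrap real memory in an opaque, transportable handle */
    struct abstract_ref *get_ref(void *real_data, size_t len, int flags)
    {
        struct abstract_ref *r = malloc(sizeof(*r));
        if (r == NULL)
            return NULL;
        r->opaque = real_data;   /* a real backend would pin/grant here */
        r->len    = len;
        r->flags  = flags;
        return r;
    }

    /* resolve: make the data addressable to the caller on demand */
    void *resolve_ref(struct abstract_ref *ref, int flags)
    {
        (void)flags;             /* e.g. ask for a read-only mapping    */
        return ref->opaque;      /* trivial backend: already mapped     */
    }

    /* destructor: drop the handle and undo any pin/grant/mapping */
    void forget_ref(struct abstract_ref *ref)
    {
        free(ref);
    }

The idea would be that an FE or BE passes the abstract_ref across the
channel and only calls resolve_ref at the point where it actually needs
to touch the bytes, so code that never touches them never pays for a
mapping.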

> 
> BTW, this specific abstraction I learnt about from an embedded OS
> architected by Nik Shalor. He might have got it from somewhere else.
> 

Any specific paper references we should be looking at?  Or is it
obvious from a Google search?

> > The above looks complicated, but to a FE writer would be as simple as:
> >  channel = dial("net!BE"); /* establish connection */
> > /* in my current code, channel is passed as an argument to the FE as a
> > boot arg */
> >   root = fsmount(channel, NULL); /* this does the t_version, auth, & attach 
> > */
> >   fd = open(root, "/some/path/file", OREAD);
> >   ret = read(fd, *buf, sizeof(buf));
> >   close(fd);
> >  close(root);
> >  close(channel);
> 
> So, this is obviously a blocking API.  My API was non-blocking because
> the network latency means that you need a lot of concurrency for high
> throughput and you don't necessarily want so many threads. Like AIO.
> Having a blocking API as well is convenient though.
> 

Yeah, I am betrayed by the simplicity of the existing API.  However, I
just wanted to point out that there is nothing specifically
synchronous in the protocol.  I tend to like the simplicity of using
threads to deal with asynchronous behaviors, but efficient threads are
hard to come by.  Async APIs just seem to complicate driver writers'
lives, but if that is the preferred methodology, such an API could be
used with the 9P protocol.
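
For what it's worth, the thing that makes an async API possible
without touching the protocol is that every 9P T-message carries a
tag: a client can keep many Treads outstanding and match the Rreads by
tag as they arrive.  A rough sketch of what that could look like --
the tag/fid/offset/count fields come from 9P itself, but all of the
plumbing (struct pending, tread_submit, rread_arrived, the transport
hooks) is made up:

    #include <stdint.h>

    #define MAXTAG 64             /* sketch assumes tags stay below this */

    /* hypothetical hooks into a tag allocator and the 9P transport */
    extern uint16_t alloc_tag(void);
    extern int      send_tread(uint32_t fid, uint16_t tag,
                               uint64_t offset, uint32_t count);

    struct pending {
        uint16_t   tag;           /* 9P tag of the in-flight Tread */
        void     (*done)(void *arg, char *data, uint32_t count);
        void      *arg;
    };

    static struct pending inflight[MAXTAG];

    /* queue a Tread and return immediately; the reply is matched by tag */
    int tread_submit(uint32_t fid, uint64_t offset, uint32_t count,
                     void (*done)(void *, char *, uint32_t), void *arg)
    {
        uint16_t tag = alloc_tag();
        inflight[tag].tag  = tag;
        inflight[tag].done = done;
        inflight[tag].arg  = arg;
        return send_tread(fid, tag, offset, count);
    }

    /* called by the transport whenever an Rread arrives, in any order */
    void rread_arrived(uint16_t tag, char *data, uint32_t count)
    {
        struct pending *p = &inflight[tag];
        p->done(p->arg, data, count);   /* hand the data to the waiter */
    }

A blocking read() like the one in the example above is then just a
submit followed by a wait for the completion callback, so both styles
can sit on the same protocol.  The same tag matching is also what lets
the in-order requirement be relaxed when the transport preserves
message boundaries.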

> 
> One of the thoughts that did occur to me was that a reliance on in-order
> message delivery (which 9p has) turns out to be quite painful to satisfy
> 

There are certainly issues to be resolved here, but in environments
that preserve frame boundaries on messages (such as VMM transports),
the in-order 9P requirements can be relaxed a great deal.

> 
> Yes, definitely worthwhile.  I'd like to see more discussion like this
> on the xen-devel list.  On the one hand, it's kind of embarrassing to
> discuss vaporware and half finished ideas but on the other, the
> opportunity for public comment at an early stage in the process is
> probably going to save a lot of effort in the long run.
> 

I'll try (perhaps with Ron's help) to put together some sort of white
paper on our vision.  It'd be quite easy to pull together an
organizational demonstration of what we are talking about, but working
out the performance/reliability/security details will likely take some
time.

I do like the general idea of building on top of many of the
underlying bits you describe.   I'm not quite sure we'd use all the
features (your endpoint definition seems a bit over-engineered for our
paradigm), but there are certainly lots of good things to take
advantage of.

          -eric

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

