Xen project Mailing List

Re: [Xen-devel] Re: Interdomain comms

To: Harry Butterworth <harry@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx>

From: Eric Van Hensbergen <ericvh@xxxxxxxxx>

Date: Sun, 8 May 2005 11:18:03 -0500

Cc: Mike Wray <mike.wray@xxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxx, "Ronald G. Minnich" <rminnich@xxxxxxxx>, Eric Van Hensbergen <ericvh@xxxxxxxxxxxxxxxxxxxxx>

Delivery-date: Sun, 08 May 2005 16:17:43 +0000

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On 5/8/05, Harry Butterworth <harry@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > In our world, this would result in you holding a Fid pointing to the
> > open object. The Fid is a pointer to meta-data and is considered
> > state on both the FE and the BE. (this has downsides in terms of
> > reliability and the ability to recover sessions or fail over to
> > different BE's -- one of our summer students will be addressing the
> > reliability problem this summer).
>
> OK, so this is an area of concern for me. I used the last version of
> the sketchy API I outlined to create an HA cluster infrastructure. So I
> had to solve these kind of protocol issues and, whilst it was actually
> pretty easy starting from scratch, retrofitting a solution to an
> existing protocol might be challenging, even for a summer student.
>

There are three previous attempts at providing this sort of facility in
9P that the student is going to be basing his work on. All three
worked to varying degrees of effectiveness, but there's no magic bullet
here: clients and file servers need to be written defensively so that
they cope gracefully with such disruptions. It's quite likely there
will be different semantics for failure recovery depending on the
resource.

> >
> > The FE performs a read operation passing it the necessary bits:
> >     ret = read( fd, *buf, count );
>
> Here the API is coupling the client to the memory management
> implementation by assuming that the buffer is mapped into the client's
> virtual address space.
>
> This is probably likely to be true most of the time so an API at this
> level will be useful but I'd also like to be able to write I/O
> applications that manage the data in buffers that are never mapped into
> the application address space.
>

Well, this was the context of the example (the FE was registering a
buffer from its own address space).
The existing Plan 9 API doesn't have a good example of how to handle
the more abstract buffer handles you describe, but I don't think
there's anything in the protocol which would prevent such a use. I
need to think about this scenario a bit more; could you give an
example of how you would use this feature?

> Also, I'd like to be able to write applications that have clients which
> use different types of buffers without having to code for each case in
> my application.
>

The attempt at portability is admirable, but it just seems to add
complexity -- if I want to use the reference, I'll have to make another
function call to resolve the buffer. I guess I'm being too narrow
minded, but I just don't have a clear idea of the utility of hidden
buffers. I never know who I am supposed to be hiding information
from. ;)

>
> So, my application can deal with buffers described like that without
> having to worry about the flavour of memory management backing them.
>

This is important. In my example I was working on the premise that the
client initiating the read was consuming the data in some way. When
that's not the case, the interface is quite different, more like that
of our file servers. In those cases, I can easily see passing the data
by some more opaque reference (before, I had figured scatter/gather
buffers would be sufficient -- but perhaps your more abstract
representation buys extra flexibility). I still hate the idea of
having to resolve the abstract_buffer to get at the data, but perhaps
that's the cost of efficiency -- I'll have to think about it some more.

> Also, I can change the memory management without changing all the calls
> to the API, I only have to change where I get buffers from.

Again - I agree that this is an important aspect. Perhaps this sort of
functionality is best called out separately with its own interfaces to
provide and resolve buffer handles. It seems like perhaps this might be
worth breaking out into its own.
It seems like there would be three types of operations on your proposed
struct:

    abstract_ref = get_ref( *real_data, flags );      /* constructor */
    real_data = resolve_ref( *abstract_ref, flags );
    forget_ref( abstract_ref );                       /* destructor */

Lots of details under the hood there (as it should be). flags could
help specify things like read-only, cow, etc. Is such an interface
sufficient? If I'm being naive here just tell me to shut up and I
won't talk about it until I've had the time to look a little deeper
into things.

>
> BTW, this specific abstraction I learnt about from an embedded OS
> architected by Nik Shalor. He might have got it from somewhere else.
>

Any specific paper references we should be looking at? Or is it obvious
from a google?

> > The above looks complicated, but to a FE writer would be as simple as:
> >     channel = dial("net!BE");      /* establish connection */
> >     /* in my current code, channel is passed as an argument
> >        to the FE as a boot arg */
> >     root = fsmount(channel, NULL); /* this does the t_version, auth,
> >                                       & attach */
> >     fd = open(root, "/some/path/file", OREAD);
> >     ret = read(fd, *buf, sizeof(buf));
> >     close(fd);
> >     close(root);
> >     close(channel);
>
> So, this is obviously a blocking API. My API was non-blocking because
> the network latency means that you need a lot of concurrency for high
> throughput and you don't necessarily want so many threads. Like AIO.
> Having a blocking API as well is convenient though.
>

Yeah, I am betrayed by the simplicity of the existing API. However, I
just wanted to point out that there is nothing specifically synchronous
in the protocol. I tend to like the simplicity of using threads to deal
with asynchronous behaviors, but efficient threads are hard to come by.
Async APIs just seem to complicate driver writers' lives, but if this
is the preferred methodology such an API could be used with the 9P
protocol.
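
To make the "nothing specifically synchronous" point a bit more
concrete: every 9P request carries a 16-bit tag which the reply echoes,
so a client can keep several requests in flight and match replies as
they arrive. Below is a minimal sketch in C of that bookkeeping. It is
not taken from any existing 9P client; submit(), complete(),
struct client, struct result, MAX_PENDING and the callback type are all
made-up names for illustration, and the actual message marshalling and
transport are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PENDING 8                /* hypothetical cap on in-flight requests */
#define NOTAG ((uint16_t)0xFFFF)     /* tag value 9P reserves; used here as "none" */

/* Hypothetical completion callback, run when the reply for a tag arrives. */
typedef void (*done_fn)(void *ctx, const char *data, size_t len);

struct pending {
    int     in_use;
    done_fn done;
    void   *ctx;
};

struct client {
    struct pending slots[MAX_PENDING];   /* slot index doubles as the tag */
};

/* Record a request and return the tag to put in the T-message.
 * Returns NOTAG if too many requests are already outstanding. */
uint16_t submit(struct client *c, done_fn done, void *ctx)
{
    for (uint16_t tag = 0; tag < MAX_PENDING; tag++) {
        if (!c->slots[tag].in_use) {
            c->slots[tag] = (struct pending){ 1, done, ctx };
            return tag;
        }
    }
    return NOTAG;
}

/* Dispatch a reply by the tag it echoes; arrival order does not matter.
 * Returns 0 on success, -1 for an unknown or already-completed tag. */
int complete(struct client *c, uint16_t tag, const char *data, size_t len)
{
    if (tag >= MAX_PENDING || !c->slots[tag].in_use)
        return -1;
    struct pending p = c->slots[tag];
    c->slots[tag].in_use = 0;            /* free the tag before calling out */
    p.done(p.ctx, data, len);
    return 0;
}

/* Example callback: copy the reply payload into a caller-supplied buffer,
 * truncating to the buffer size. */
struct result {
    char   buf[16];
    size_t len;
};

void store_result(void *ctx, const char *data, size_t len)
{
    struct result *r = ctx;
    r->len = len < sizeof r->buf ? len : sizeof r->buf;
    memcpy(r->buf, data, r->len);
}
```

A caller would submit() two reads, send them off, and hand each reply to
complete() whenever it shows up; if the second reply arrives first, each
callback still fires for its own request. A blocking read() can be
layered on top by having the callback wake a sleeping thread.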
>
> One of the thoughts that did occur to me was that a reliance on in-order
> message delivery (which 9p has) turns out to be quite painful to satisfy
>

There are certainly issues to be resolved here, but in environments
that preserve frame boundaries on messages (such as VMM transports),
the in-order 9P requirements can be relaxed a great deal.

>
> Yes, definitely worthwhile. I'd like to see more discussion like this
> on the xen-devel list. On the one hand, it's kind of embarrassing to
> discuss vaporware and half finished ideas but on the other, the
> opportunity for public comment at an early stage in the process is
> probably going to save a lot of effort in the long run.
>

I'll try (perhaps with Ron's help) to put together some sort of white
paper on our vision. It'd be quite easy to pull together an
organizational demonstration of what we are talking about, but working
out the performance/reliability/security details will likely take some
time. I do like the general idea of building on top of many of the
underlying bits you describe. I'm not quite sure we'd use all the
features (your endpoint definition seems a bit over-engineered for our
paradigm), but there are certainly lots of good things to take
advantage of.

        -eric

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
