[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [DOC v1] Xen transport for 9pfs



On Fri, 2 Dec 2016, Dario Faggioli wrote:
> On Thu, 2016-12-01 at 15:14 -0800, Stefano Stabellini wrote:
> > On Thu, 1 Dec 2016, Dario Faggioli wrote:
> > > 
> > > On Tue, 2016-11-29 at 15:34 -0800, Stefano Stabellini wrote:
> > > > 
> > > >     ring-ref-<num> (ring-ref-0, ring-ref-1, etc)
> > > > 
> > > blkif uses ring-ref%u, rather than ring-ref-%u (i.e., no dash
> > > before
> > > the index). Not a big deal, I guess, but I thought it could be nice
> > > to
> > > be a bit more uniform.
> > 
> > Sure, but in this case each ring-ref-%u is used to map a different
> > ring.
> >
> Yeah, right. So it may even be a good thing to differentiate, indeed...
> 
> > That said, I can make the change.
> > 
> I don't know. I, FWIW, thought it would be good, now I'm not so sure
> any longer. Yours and maintainers' call, I guess. :-)
> 
> > > If it is, what's the typical envisioned use of these multiple
> > > rings, if
> > > I can ask?
> > 
> > They are used to handle multiple read/write requests in parallel.
> > Let's
> > assume that we configure the rings to be 8K each. Given that the data
> > is
> > transmitted over the ring, each ring can hold only one outstanding 4K
> > write request (there is an header for each write request).
> > 
> Ok.
> 
> > With two 8K rings, we can have two outstanding 4K write requests,
> > each
> > of them processed in parallel on a different vcpu.
> > 
> > The system is completely configurable in terms of number and size of
> > rings, so a user can configure it to only export one 4K ring for
> > example or go as far as several 2MB rings.
> > 
> Right. So, it is indeed similar to blkif multiqueueing, with which it
> also shares the idea/objective of exploiting parallelism at the (v)CPU
> level, but without (quite obviously, in this case) any direct link to
> hardware queues in disk controllers, and without the protocol itself
> giving any direction or indication of how to actually use all this.
> 
> Got it. Nice.
> 
> FWIW, I think a few words --just a shorter version of what you just
> said-- may be useful if present in this document.

I'll do.


> > > >     /* not actually C compliant (ring_order changes from socket
> > > > to
> > > > socket) */
> > > >     struct ring_data {
> > > >         char in[((1 << ring_order) << PAGE_SHIFT) / 2];
> > > >         char out[((1 << ring_order) << PAGE_SHIFT) / 2];
> > > >     };
> > > > 
> > > Sorry, what does "ring_order changes from socket to socket" mean?
> > 
> > Woops, copy/paste error from PVCalls. I meant "ring_order changes
> > from
> > ring to ring".
> > 
> Ah, yes, now it makes sense. :-)
> 
> BTW, what's the reason for putting ring_order inside xen_9pfs_intf,
> instead of having a ring-page-order (well, actually, a
> ring-%u-page-order) xenstore key?

So that each ring can have a different size.


> > > > The binary layout of `struct xen_9pfs_intf` follows:
> > > > 
> > > >     0         4         8         12        16        20
> > > >     +---------+---------+---------+---------+---------+
> > > >     | in_cons | in_prod |out_cons |out_prod |ring_orde|
> > > >     +---------+---------+---------+---------+---------+
> > > > 
> > > >     20        24        26      4092      4096
> > > >     +---------+---------+----//---+---------+
> > > >     |  ref[0] |  ref[1] |         |  ref[N] |
> > > >     +---------+---------+----//---+---------+
> > > > 
> > > > **N.B** For one page, N is maximum 1019 ((4096-20)/4), but given
> > > > that
> > > > N
> > > > needs to be a power of two, actually max N is 512.
> > > > 
> > > It may again be me being still too naive, but I'd quickly add at
> > > least
> > > another example, with the value of N computed for a multiple pages
> > > ring. Something like: "For 4 pages (i.e., ring_orfer=2), N is..."
> > 
> > For 4 pages, N is 4. N is the number of pages that make up the ring.
> > 
> > Maybe there is a misunderstanding, let me try to explain it again:
> > each
> > page shared via xenstore contains information to handle one new ring,
> > including grant references for the pages making up the multipage ring
> > itself. I'll repeat: pages shared via xenstore are not used as a
> > ring,
> > they are used to setup the rings, each page has the info to setup a
> > new
> > ring.
> > 
> Right, I got this. And indeed I expressed myself very badly above.
> 
> So, the descriptor of 1 ring is one page. Such page contains, in signle
> page rings, the reference to another page, which is the actual ring. If
> the ring is multi-page, the descriptor page contains an array of page
> references which, together, are the actual ring.

Right


> Such array --of which N is, in the diagram above, the last index-- can
> be, as you say, up to 1019 elements big (the available space in a ring
> descriptor page). Therefore, the math I was asking about is really the
> relationship between N and max-ring-page-order. That is, a ring can
> have at most 2^max-ring-page-order pages, and N can be at most 1019
> (well, I think it's 1018 if, as in diagram above, you count from 0, but
> that does not matter much); so:
> 
>  2^max-ring-page-order <= N
>  lb(2^max-ring-page-order) <= lb(N)  //lb(): base 2 logarithm
>  max-ring-page-order <= lb(N)
> 
> and, considering that max-ring-page-order must be a natural number:
> 
>  max-ring-page-order <= floor(lb(N))
>  max-ring-page-order <= floor(lb(1018))
>  max-ring-page-order <= floor(9.9915)
>  max-ring-page-order <= 9
> 
> so a ring can be at most 2^9 pages big, which indeed matches with your
> own calculations, and bring us to the fact that the maximum size of a
> ring is 512*4Kb=2Mb
> 
> So, to recap (sorry for being so long!), I think that saying:
> 
> "**N.B** For one page, N is maximum 1019 ((4096-20)/4), but given that
> N needs to be a power of two, actually max N is 512."
> 
> is indeed correct, and probably makes it enough clear that the maximum
> ring size is 2MB. It's not equally easy, IMO, to map this back to the
> fact that this also mean max-ring-page-order must be at most 9, and
> that is not spelled out anywhere else, AFAICT.
> 
> Therefore, an example of how things look with a couple of different
> values of ring_order, or some shorter and less boring version of this
> reasoning and calculations may help with that. That's what I'm trying
> to say. :-)

I'll expand the N.B. to cover this.


> > The structure of these "setup pages" is `struct xen_9pfs_intf`. Each
> > page is completely separate and independent from the others. Given
> > that
> > one page is just 4096 bytes, it can contain max 512 grant refs (see
> > calculation above). So the max size of one multipage ring is 512
> > pages =
> > 2MB.
> > 
> > Does it make sense?
> >
> It does, and this is probably me being a mix of, not too used to this,
> and too picky... If that's the case, sorry for the noise. :-D

No worries, thanks for the review!
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.