
Re: [Xen-devel] myrinet dma



On Wed, 25 Aug 2004 21:27:25 +0100
Ian Pratt <Ian.Pratt@xxxxxxxxxxxx> wrote:

> > There are some folks here interested in using Myrinet with Xen.
> > Applications using Myrinet link against a library that registers memory
> > and DMAs directly to it. The library loads the memory addresses onto the
> > card which is setting off some alarms and we're attempting to think
> > about exactly how XenoLinux guests will handle this situation.
> 
> Presumably it's not just a library, and there's some trusted OS
> component that pins the physical pages and registers their bus
> addresses with the Myrinet NIC?
> 
> I guess the library sets up a mmap of some of the NICs control
> registers, then provides a set of library functions to do
> open/close, send/receive/RDMA etc and provide block-until-receive
> functionality.

Yes, as far as I understand.  There is an OS component that pins the
pages and from then on the API can interact directly with the NIC. There
is also a processor and firmware on the NIC itself, the library on the
host is pretty lightweight in my understanding.  I think the library
does share a specific not-for-data memory range with the NIC and that is
how it can control it directly.

I wonder what effect the control library getting less processor time
(because it is running under a VMM) will have.  It is userspace, so I
assume it is coded to expect unpredictable scheduling.

>  
> > What memory issues might arise using this in a guest domain with
> > privileged drivers, in non privileged domains (with one privileged
> > domain actually using the libraries), and with multiple privileged
> > domains simultaneously?  Is it possible?
> 
> Interesting.
> 
> Assuming I'm right about there being a trusted OS component that
> deals with the creation/deletion of memory apertures, you'd want
> exactly one of these, running in e.g. domain 0. You'd then need
> to create a way of virtualising this functionality to other
> domains, so their OS component doesn't talk to the card directly
> but talks to the controlling domain that will then interact with
> Xen's mmu to check the pages actually belong to the domain, and
> then pin them and register them with the NIC.
> 
> This is going to require a little coding, but shouldn't be too
> hard. It's quite an interesting problem, so I'd be happy to help
> with the design. It's something we'll have to do for InfiniBand
> anyhow. 

It sounds like only the OS component will need to be modified to pass
the negotiation on to domain 0, which pins the memory.  Once that is
done, I presume the library can interact with this shared memory
unmodified (i.e., no virtualized address issues, right?).  And the NIC
will also be allowed to DMA to the memory at any time.
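
To make sure I have the order of operations right, here is a rough
sketch of that registration path, in Python only for brevity.  Every
name in it (RegistrationBroker, page_owner, nic_table) is hypothetical;
none of this is Xen's or Myrinet's real interface.  It only shows the
sequence described above: the controlling domain checks that each page
belongs to the requesting domain, pins it, and only then registers it
with the NIC.

```python
class RegistrationError(Exception):
    """Raised when a domain asks to register a page it does not own."""


class RegistrationBroker:
    """Toy model of a domain-0 component brokering DMA registration.

    Hypothetical throughout: page_owner stands in for the ownership
    check Xen's MMU would perform, and nic_table stands in for the
    bus addresses loaded onto the card.
    """

    def __init__(self, page_owner):
        # page_owner maps a frame number to the id of the owning domain.
        self.page_owner = dict(page_owner)
        self.pinned = {}      # frame -> domain id that pinned it
        self.nic_table = []   # frames currently registered with the NIC

    def register(self, domain_id, frames):
        frames = list(frames)
        # Validate every frame before pinning any, so a bad request
        # never leaves a partial registration behind.
        for frame in frames:
            if self.page_owner.get(frame) != domain_id:
                raise RegistrationError(
                    f"frame {frame} is not owned by domain {domain_id}")
        for frame in frames:
            if frame not in self.pinned:
                self.pinned[frame] = domain_id
                self.nic_table.append(frame)
        return frames

    def unregister(self, domain_id, frames):
        # Only the domain that pinned a frame may release it.
        for frame in frames:
            if self.pinned.get(frame) == domain_id:
                del self.pinned[frame]
                self.nic_table.remove(frame)
```

In this picture the guest's OS component would call register() through
whatever channel it has to domain 0, and the library would carry on
using the memory as before.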

So the coding involved would be to create an idealized interface for
guest domains specifically for Myrinet (or would it be best to create
one for a certain 'class' of devices, as the I/O paper discusses)?

Modifying only the OS interface allows for just six guests, I think.
Is that what you mean by saying the six ports won't provide much
flexibility?  (To answer Mark's question: yes.)  I believe each
library thread expects to totally control its channel via shared
memory.

Multiplexing this without modifying the card or library code sounds
complicated to me, especially because we want zero-copy straight to the
application (not through the guest OS's buffers) to attain the desired
speeds. How would that even work? Would control messages be intercepted,
interpreted, and passed on to the channel's real control registers,
while for the data, specific subregions of the channel's memory
allocation are mapped to separate domains?  Yikes.

Do your plans for infiniband allow 100s of guests to each have high
speed networking?  How much might the performance degrade?

 
> > I would think in the case that the one privileged domain bridging to
> > other, non-privileged domains that full speed transfers would be
> > impossible if there is any copying necessary.
> 
> I presume it's also possible to use the Myrinet card as a plain
> ethernet/IP interface with a suitable kernel driver (yes, I know
> this sucks)?
> 
> You could use this to enable the privileged domain to at least
> provide network connectivity to other domains using the normal
> netback/netfront drivers. (the privileged domain could also use
> the normal library directly).  This would be zero-copy into the
> domain, but the normal OS stack would usually end up copying
> things into the application socket buffer.

Well, this 'sucks' but it is still cool to do it this way because the
domains would probably still have better than plain ethernet performance
and weird OSs could take advantage of highER performance networking.

If I'm thinking about this correctly, it sounds like all of these
domains' traffic could be put onto one Myrinet channel and five special
domains could truly take advantage of Myrinet?  Is that feasible?  This
would rule out every domain being able to use the special Myrinet
message passing protocols but there could be some interesting mixed
latency MPI simulations.  I think they might be interested in this
anyhow.
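
As a sketch of the split I have in mind (Python again, all names
hypothetical): one port carries the bridged IP traffic for every
ordinary domain, and the remaining ports are handed out one per
"special" domain.  The figure of six ports is only my belief from
above, not something I have confirmed about the card.

```python
NUM_PORTS = 6  # assumed port count; not a confirmed hardware figure


def plan_ports(direct_domains, bridged_domains, num_ports=NUM_PORTS):
    """Assign one shared port to bridged domains, one port each to the rest.

    Returns a dict mapping domain name -> (mode, port), where mode is
    "direct" (exclusive port, native Myrinet messaging) or "bridged"
    (IP traffic multiplexed over the shared port by the privileged domain).
    """
    if num_ports < 1:
        raise ValueError("need at least one port")
    shared_port = 0
    dedicated = num_ports - 1
    if len(direct_domains) > dedicated:
        raise ValueError(
            f"only {dedicated} dedicated ports available, "
            f"{len(direct_domains)} requested")
    # Ports 1..num_ports-1 go to the direct domains, in order.
    plan = {dom: ("direct", port)
            for port, dom in enumerate(direct_domains, start=1)}
    for dom in bridged_domains:
        if dom in plan:
            raise ValueError(f"{dom} cannot be both direct and bridged")
        plan[dom] = ("bridged", shared_port)
    return plan
```

With six ports this gives at most five direct domains, and any number
of bridged ones sharing port 0.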

This is all a very nascent interest; I think we're just trying to grasp
the issues.  I'm sorry I don't know more about the guts of how Myrinet
works.

I appreciate everyone's responses, thank you!


> 
> > But would several privileged guests using the libraries be able to
> > coexist?  Would there be swapping and memory pinning issues?  The card
> > returns a port, one of several (I believe six).  Would it be possible
> > for six guests to each have access to one of these channels?
> 
> Only six ports? That's a bit lame. I'd like to see the memory
> mapped communication extended right down into user-space
> applications in multiple domains, but six doesn't give a whole
> lot of flexibility...

> 
> Ian


-- 


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/xen-devel


 

