
Re: [Xen-devel] Odd blkdev throughput results



> > The big thing is that on network RX it is currently dom0 that does the
> > copy. In the CMP case this leaves the data in the shared cache ready to
> > be accessed by the guest. In the SMP case it doesn't help at all. In
> > netchannel2 we're moving the copy to the guest CPU, and trying to
> > eliminate it with smart hardware.
> >
> > Block IO doesn't require a copy at all.
>
> Well, not in blkback by itself, but certainly from the in-memory disk
> image. Unless I misunderstood Keir's recent post, page flipping is
> basically dead code, so I thought the numbers should at least point in
> roughly the same direction.

Blkback has always DMAed directly into guest memory when reading data from
the disk drive (the normal use case), in which case there's no copy - I think
that was Ian's point. In contrast, the netback driver has to do a copy in the
normal case.
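
To make the contrast concrete, here's a rough sketch against the Xen public
grant-table ABI (purely illustrative, not the actual blkback/netback code;
error handling, batching and unmapping omitted). Blkback maps the guest's
grant and lets the controller DMA straight into the guest's own page, while
netback's RX path asks Xen to copy the received frame into the guest buffer,
and that copy runs on the CPU doing the RX work, i.e. dom0's:

#include <xen/interface/grant_table.h>   /* gnttab_map_grant_ref, gnttab_copy */
#include <asm/xen/hypercall.h>           /* HYPERVISOR_grant_table_op() */

/* blkback-style read: map the guest page, then let the disk DMA into it. */
static int blkif_map_and_dma(domid_t guest, grant_ref_t gref, unsigned long vaddr)
{
        struct gnttab_map_grant_ref map = {
                .host_addr = vaddr,             /* where dom0 maps the guest page */
                .flags     = GNTMAP_host_map,
                .ref       = gref,              /* grant from the blkif request */
                .dom       = guest,
        };

        if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &map, 1) || map.status)
                return -1;

        /* The controller now DMAs sector data directly into the mapped page,
         * which *is* the guest's page: no CPU copy, no cache pollution. */
        return 0;
}

/* netback-style RX: have the hypervisor copy the frame into the guest buffer. */
static int netif_copy_rx(domid_t guest, grant_ref_t rx_gref,
                         xen_pfn_t src_gmfn, uint16_t len)
{
        struct gnttab_copy op = {
                .source.u.gmfn = src_gmfn,      /* dom0 page holding the frame */
                .source.domid  = DOMID_SELF,
                .dest.u.ref    = rx_gref,       /* guest RX buffer grant */
                .dest.domid    = guest,
                .len           = len,
                .flags         = GNTCOPY_dest_gref,
        };

        HYPERVISOR_grant_table_op(GNTTABOP_copy, &op, 1);
        return op.status;                       /* copy ran on the RX (dom0) CPU */
}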

If you're using a ramdisk then there must be a copy somewhere, although I'm 
not sure exactly where it happens!
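
FWIW, the copy for a ramdisk-backed test has to happen wherever the "device"
services the read: the data only exists as pages in dom0 memory, so completing
the request means a memcpy into the (grant-mapped) destination page, on the
dom0 CPU handling it. A toy model of that read path (not the actual Linux brd
code):

#include <string.h>

#define SECTOR_SIZE 512

struct ramdisk {
        unsigned char *backing;         /* contiguous backing store, for simplicity */
        unsigned long  nr_sectors;
};

static int ramdisk_read(const struct ramdisk *rd, unsigned long sector,
                        unsigned long nr, void *dst)
{
        if (sector + nr > rd->nr_sectors)
                return -1;
        /* This memcpy is the copy that a real disk would have done by DMA;
         * with blkback, dst is the grant-mapped guest page. */
        memcpy(dst, rd->backing + sector * SECTOR_SIZE, nr * SECTOR_SIZE);
        return 0;
}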

Cheers,
Mark

> > > This is not my question. What strikes me is that for the blkdev
> > > interface, the CMP setup is 13% *slower* than SMP, at 661.99 MB/s.
> > >
> > > Now, any ideas? I'm mildly familiar with both netback and blkback, and
> > > I'd never expected something like that. Any hint appreciated.
> >
> > How stable are your results with hdparm? I've never really trusted it as
> > a benchmarking tool.
>
> So far, all the experiments I've done look fairly reasonable. Standard
> deviation is low, and since I've been tracing netback reads I'm fairly
> confident that the volume hasn't been left in domU memory somewhere.
>
> I'm not so much interested in bio or physical disk performance as in
> relative performance: how much can be squeezed through the buffer ring
> before and after applying some changes. It's hardly a physical disk
> benchmark, but it's simple, and for the purpose given it seems okay.
>
> > The ramdisk isn't going to be able to DMA data into the domU's buffer on
> >  a read, so it will have to copy it.
>
> Right...
>
> > The hdparm running in domU probably
> >  doesn't actually look at any of the data it requests, so it stays local
> >  to the dom0 CPU's cache (unlike a real app).
>
> hdparm performs sequential 2 MB read()s over a 3 s period. It's not
> calling the block layer directly or anything like that. Surely that will
> hit the domU caches?
>
> > Doing all that copying
> >  in dom0 is going to beat up the domU in the shared cache in the CMP
> >  case, but won't affect it as much in the SMP case.
>
> Well, I could live with blaming L2 footprint; I just wanted to hear
> whether anyone has a different explanation. But then I would expect
> similar results on net RX, though I may be mistaken.
>
> Furthermore, I need to apologize: I failed to use netperf correctly and
> managed to report the TX path in my original post :P. The real numbers
> are 885.43 (SMP) vs. 1295.46 (CMP), but the contrast with the blk read
> results stays the same.
>
> regards,
> daniel
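
As a footnote on the hdparm question above: hdparm -t boils down to timed
sequential reads, so for relative before/after comparisons something like the
sketch below measures much the same thing. The device path is just a
placeholder, and unlike hdparm it doesn't flush the buffer cache first:

/* Build: cc -O2 -o seqread seqread.c   Usage: ./seqread /dev/<blkdev> */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define CHUNK   (2UL * 1024 * 1024)     /* 2 MB per read(), like hdparm -t */
#define SECONDS 3.0                     /* measurement window */

int main(int argc, char **argv)
{
        const char *dev = argc > 1 ? argv[1] : "/dev/xvda";  /* placeholder */
        int fd = open(dev, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        void *buf;
        if (posix_memalign(&buf, 4096, CHUNK)) { perror("posix_memalign"); return 1; }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);

        double elapsed = 0.0;
        unsigned long long total = 0;
        while (elapsed < SECONDS) {
                ssize_t n = read(fd, buf, CHUNK);
                if (n <= 0)             /* EOF or error: stop timing */
                        break;
                total += n;
                clock_gettime(CLOCK_MONOTONIC, &t1);
                elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        }

        if (total && elapsed > 0.0)
                printf("%llu bytes in %.2f s = %.2f MB/s\n",
                       total, elapsed, total / elapsed / (1024.0 * 1024.0));
        close(fd);
        free(buf);
        return 0;
}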



-- 
Push Me Pull You - Distributed SCM tool (http://www.cl.cam.ac.uk/~maw48/pmpu/)



 

