RE: [Xen-devel] MPI benchmark performance gap between native linux anddomU

To: "Nivedita Singhvi" <niv@xxxxxxxxxx>, "Bin Ren" <bin.ren@xxxxxxxxx>, "Andrew Theurer" <habanero@xxxxxxxxxx>
Subject: RE: [Xen-devel] MPI benchmark performance gap between native linux anddomU
From: "Santos, Jose Renato G (Jose Renato Santos)" <joserenato.santos@xxxxxx>
Date: Tue, 5 Apr 2005 17:17:51 -0700
Cc: "Turner, Yoshio" <yoshio_turner@xxxxxx>, Aravind Menon <aravind.menon@xxxxxxx>, Xen-devel@xxxxxxxxxxxxxxxxxxx, G John Janakiraman <john@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Wed, 06 Apr 2005 00:17:55 +0000
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcU6LfZxUXcb0dHaQO2H6qEYW9ST+gADZVtA
Thread-topic: [Xen-devel] MPI benchmark performance gap between native linux anddomU
  Nivedita, Bin, Andrew, and all interested in Xenoprof,

  We should be posting the Xenoprof patches in a few days.
  We are doing some final cleanup of the code, so please be a little
more patient.

  Thanks

  Renato 

>> -----Original Message-----
>> From: Nivedita Singhvi [mailto:niv@xxxxxxxxxx] 
>> Sent: Tuesday, April 05, 2005 3:23 PM
>> To: Santos, Jose Renato G (Jose Renato Santos)
>> Cc: xuehai zhang; Xen-devel@xxxxxxxxxxxxxxxxxxx; Turner, 
>> Yoshio; Aravind Menon; G John Janakiraman
>> Subject: Re: [Xen-devel] MPI benchmark performance gap 
>> between native linux anddomU
>> 
>> 
>> Santos, Jose Renato G (Jose Renato Santos) wrote:
>> 
>> >   Hi,
>> > 
>> >   We had a similar network problem in the past. We were using a
>> > TCP benchmark instead of MPI, but I believe your problem is
>> > probably the same as the one we encountered.
>> >   It took us a while to get to the bottom of this, and we only
>> > identified the reason for this behavior after we ported oprofile
>> > to Xen and did some performance profiling experiments.
>> 
>> Hello! Was this on the 2.6 kernel? Would you be able to
>> share the oprofile port? It would be very handy indeed
>> right now. (I was told by a few people that someone was
>> porting oprofile, and I believe a status update went by
>> on the list, but I haven't seen it yet...)
>> 
>> >   Here is a brief explanation of the problem we found and the
>> > solution that worked for us.
>> >   Xenolinux allocates a full page (4KB) to store socket buffers
>> > instead of using just MTU bytes as in traditional Linux. This is
>> > necessary to enable page exchanges between the guest and the I/O
>> > domains. The side effect is that memory space is not used very
>> > efficiently for socket buffers. Even if packets have the maximum
>> > MTU size (typically 1500 bytes for Ethernet), the total buffer
>> > utilization is very low (at most just slightly higher than 35%).
>> > If packets arrive faster than they are processed at the receiver
>> > side, they will exhaust the receive buffer
>> 
>> Most small connections (say, up to 3-4KB) involve only 3 to 5
>> segments, and so the TCP window never really opens fully.
>> On longer-lived connections, it does help very much to have
>> a large buffer.
>> 
>> > before the TCP advertised window is reached. (By default, Linux
>> > uses a TCP advertised window equal to 75% of the receive buffer
>> > size. In standard Linux, this is typically sufficient to stop
>> > packet transmission at the sender before running out of receive
>> > buffers. The same is not true in Xen, due to the inefficient use
>> > of socket buffers.) When a packet arrives and there is no receive
>> > buffer available, TCP tries to free socket buffer space by
>> > eliminating socket buffer fragmentation (i.e., eliminating wasted
>> > buffer space). This is done at the cost of an extra copy of all
>> > receive buffers into new, compacted socket buffers. This
>> > introduces overhead and reduces throughput when the CPU is the
>> > bottleneck, which seems to be your case.
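
To make the arithmetic above concrete, here is a rough back-of-the-envelope
sketch (my numbers, assuming a 1500-byte MTU and a 4096-byte page, before
any skb metadata overhead):

    # best-case data-to-buffer ratio for one full-size Ethernet frame
    echo "scale=2; 1500/4096" | bc     # ~= .36, i.e. roughly the 35% above
    # Each packet is charged a full page against the receive buffer, so
    # the buffer can be exhausted while the 75% advertised window is
    # still open -- which is when the pruning/copying described above
    # kicks in.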
>> 
>> /proc/net/netstat will show a counter of just how many times
>> this happens (RcvPruned). It would be interesting to see
>> whether that count is significant.
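
Something along these lines should pull those counters out of
/proc/net/netstat; field positions differ between kernel versions, so
match on the names rather than on fixed columns:

    awk '/^TcpExt:/ {
             if (!seen) { split($0, name); seen = 1; next }
             for (i = 2; i <= NF; i++)
                 if (name[i] == "PruneCalled" || name[i] == "RcvPruned")
                     print name[i], $i
         }' /proc/net/netstat
    # "netstat -s" reports the same information in prose, e.g. the
    # "packets pruned from receive queue" line.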
>> 
>> > This problem is not very frequent, because modern CPUs are fast
>> > enough to receive packets at Gigabit speeds and the receive
>> > buffer does not fill up. However, the problem may arise when
>> > using slower machines and/or when the workload consumes a lot of
>> > CPU cycles, as with scientific MPI applications. In your case,
>> > you have both factors against you.
>> 
>> 
>> > The solution to this problem is trivial. You just have to change
>> > the TCP advertised window of your guest to a lower value. In our
>> > case, we used 25% of the receive buffer size, and that was
>> > sufficient to eliminate the problem. This can be done using the
>> > following command:
>> > 
>> >   echo -2 > /proc/sys/net/ipv4/tcp_adv_win_scale
>> 
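
For anyone trying this, the same knob can be set through sysctl so that
it survives a reboot (standard sysctl mechanics, nothing Xen-specific):

    sysctl net.ipv4.tcp_adv_win_scale         # show the current value
    sysctl -w net.ipv4.tcp_adv_win_scale=-2   # set it for the running kernel
    # or add "net.ipv4.tcp_adv_win_scale = -2" to /etc/sysctl.conf and
    # run "sysctl -p" to make it persistent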
>> How much did this improve your results by? And wouldn't
>> making the default and max socket buffers larger by, say,
>> 5 times be more effective (except for applications that
>> already use setsockopt() to set their buffers to some
>> size, but not a large enough one)?
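
A sketch of that alternative, with purely illustrative values (not tuned
recommendations):

    cat /proc/sys/net/ipv4/tcp_rmem            # "min default max" in bytes
    echo "4096 262144 1048576" > /proc/sys/net/ipv4/tcp_rmem
    # net.core.rmem_max caps what applications can request via
    # setsockopt(SO_RCVBUF), so raise it as well if needed:
    echo 1048576 > /proc/sys/net/core/rmem_max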
>> 
>> > (The default of 2 corresponds to 75% of the receive buffer, and
>> > -2 corresponds to 25%.)
>> > 
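
As I read the kernel's window calculation (paraphrased from memory, so
please double-check against your kernel source), the mapping is:

    #   scale > 0 :  window = space - space / 2^scale    (  2 -> 3/4 = 75% )
    #   scale <= 0:  window = space / 2^(-scale)         ( -2 -> 1/4 = 25% )
    # quick sanity check with bc, using a 256KB receive buffer:
    echo "x = 262144; x - x/2^2; x/2^2" | bc   # prints 196608 and 65536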
>> > Please let me know if this improves your results. You should
>> > still see a degradation in throughput when comparing Xen to
>> > traditional Linux, but hopefully you will see better throughput.
>> > You should also try running your experiments in domain 0. This
>> > will give better throughput, although still lower than
>> > traditional Linux. I am curious to know whether this has any
>> > effect on your experiments. Please post the new results if it
>> > does.
>> 
>> Yep, me too.
>> 
>> thanks,
>> Nivedita
>> 
>> 
>> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
