
RE: [Xen-devel] Re: domU using linux-2.6.37-xen-next pvops kernel with CONFIG_PARAVIRT_SPINLOCKS disabled results in 150% performance improvement (updated)


  • To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>, Dante Cinco <dantecinco@xxxxxxxxx>
  • From: "Lin, Ray" <Ray.Lin@xxxxxxx>
  • Date: Tue, 21 Dec 2010 12:03:02 -0700
  • Accept-language: en-US
  • Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
  • Delivery-date: Tue, 21 Dec 2010 11:04:05 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: AcuhNi1EgT0pri1hSBaOMgzkAYlNVAACRNMw
  • Thread-topic: [Xen-devel] Re: domU using linux-2.6.37-xen-next pvops kernel with CONFIG_PARAVIRT_SPINLOCKS disabled results in 150% performance improvement (updated)

 

-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx 
[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Jeremy Fitzhardinge
Sent: Tuesday, December 21, 2010 9:39 AM
To: Dante Cinco
Cc: Xen-devel; Konrad Rzeszutek Wilk
Subject: [Xen-devel] Re: domU using linux-2.6.37-xen-next pvops kernel with 
CONFIG_PARAVIRT_SPINLOCKS disabled results in 150% performance improvement 
(updated)

On 12/20/2010 05:03 PM, Dante Cinco wrote:
> (Sorry, I accidentally sent the previous post before finishing the 
> summary table)
>
> For a couple of months now, we've been trying to track down the slow 
> I/O performance in pvops domU. Our system has 16 Fibre Channel 
> devices, all PCI-passthrough to domU. We were previously using a
> 2.6.32 (Ubuntu version) HVM kernel and were getting 511k IOPS. We 
> switched to pvops with Konrad's xen-pcifront-0.8.2 kernel and were 
> disappointed to see the performance degrade to 11k IOPS. After 
> disabling some kernel debug options including KMEMLEAK, the 
> performance jumped to 186k IOPS but still well below what we were 
> getting with the HVM kernel. We tried disabling spinlock debugging in 
> the kernel but it actually resulted in a drop in performance to 70k IOPS.
>
> Last week we switched to linux-2.6.37-xen-next and with the same 
> kernel debug options disabled, the I/O performance was slightly better 
> at 211k IOPS. We tried disabling spinlock debugging again and saw a 
> similar drop in performance to 58k IOPS. We searched around for any 
> performance-related posts regarding pvops and found two references to 
> CONFIG_PARAVIRT_SPINLOCKS (one from Jeremy and one from Konrad):
> http://lists.xensource.com/archives/html/xen-devel/2009-05/msg00660.html
> http://lists.xensource.com/archives/html/xen-devel/2010-11/msg01111.html
>
> Both posts recommended (Konrad strongly) enabling PARAVIRT_SPINLOCKS 
> when running under Xen. Since it's enabled by default, we decided to 
> see what would happen if we disabled CONFIG_PARAVIRT_SPINLOCKS. With 
> the spinlock debugging enabled, we were getting 205k IOPS but with 
> spinlock debugging disabled, the performance leaped to 522k IOPS !!!
>
> I'm assuming that this behavior is unexpected.

Yeah, that would be one way to put it.

>
> Here's a summary of the kernels, config changes and performance (in
> IOPS):
>
>                                   pcifront 0.8.2    linux-2.6.37-xen-next
>                                   (pvops)           (pvops)
>
> Spinlock debugging enabled,
> PARAVIRT_SPINLOCKS=y                 186k              205k
>
> Spinlock debugging disabled,
> PARAVIRT_SPINLOCKS=y                  70k               58k
>
> Spinlock debugging disabled,
> PARAVIRT_SPINLOCKS=n                 247k              522k

Fascinating.

Spinlock debugging ends up bypassing all the paths that PARAVIRT_SPINLOCKS 
affects, so that's consistent with the problem being the paravirt locking code.
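
To make the indirection concrete, here is a minimal userspace toy of the idea; the 
names are simplified stand-ins rather than the kernel's (the real hook table is 
pv_lock_ops, which xen_init_spinlocks() in arch/x86/xen/spinlock.c repoints at the 
Xen byte-lock routines):

#include <stdio.h>

struct toy_spinlock { unsigned char slock; };	/* 0 = free, 1 = held */

/* stand-in for the native ticket-lock path used when PARAVIRT_SPINLOCKS=n */
static void native_spin_lock(struct toy_spinlock *l)
{
	while (__sync_lock_test_and_set(&l->slock, 1))
		;	/* spin until the previous value was 0 */
}

/* stand-in for the Xen byte lock, which spins a while and then blocks
 * in the hypervisor (see the slow-path sketch further down) */
static void xen_spin_lock(struct toy_spinlock *l)
{
	while (__sync_lock_test_and_set(&l->slock, 1))
		;
}

/* With PARAVIRT_SPINLOCKS=y every spin_lock() in the kernel becomes an
 * indirect call through a table like this; with =n the native code is
 * used directly and the Xen-aware path is never reachable. */
static void (*pv_spin_lock)(struct toy_spinlock *) = native_spin_lock;

int main(void)
{
	struct toy_spinlock l = { 0 };

	pv_spin_lock = xen_spin_lock;	/* roughly what xen_init_spinlocks() does */
	pv_spin_lock(&l);		/* indirect call into the Xen-aware code */
	printf("lock byte is now %u\n", l.slock);
	return 0;
}

With PARAVIRT_SPINLOCKS=n that indirect call, and the Xen-specific slow path behind 
it, simply doesn't exist, which is what reason 1 below refers to.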

Basically, there are three reasons paravirt spinlocks could slow things down (a rough 
sketch of the slow path in question follows the list):

   1. the overhead of calling into the pv lock code is costing a lot
      (very hard to imagine how it would cause this degree of slowdown)
   2. you're hitting the spinlock slowpath very often, and end up making
      lots of hypercalls
   3. your system and/or workload gets a very strong benefit from the
      ticket lock's FIFO properties
   4. (something else entirely)
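
To make reasons 2 and 3 concrete, here is a rough userspace caricature of the 
byte-lock slow path; the spin-then-block structure follows xen_spin_lock() / 
xen_spin_lock_slow() in arch/x86/xen/spinlock.c, but everything else (names, the 
single-threaded main) is simplified:

#include <stdio.h>
#include <sched.h>

#define SPIN_TIMEOUT (1 << 10)		/* stand-in for the kernel's TIMEOUT constant */

static unsigned char lock_byte;		/* 0 = free, 1 = held */
static unsigned long blocking_calls;	/* how often we gave up and "blocked" */

static int try_lock(void)
{
	return __sync_lock_test_and_set(&lock_byte, 1) == 0;
}

/* stand-in for blocking in the hypervisor on a per-cpu event channel
 * (in the real code this is a poll hypercall that the holder later kicks) */
static void block_in_hypervisor(void)
{
	blocking_calls++;
	sched_yield();
}

static void byte_lock(void)
{
	for (;;) {
		int spins;

		/* fast path: spin for a bounded number of iterations */
		for (spins = 0; spins < SPIN_TIMEOUT; spins++)
			if (try_lock())
				return;

		/* slow path: contended too long, block until kicked */
		block_in_hypervisor();
	}
}

int main(void)
{
	byte_lock();	/* uncontended here, so the slow path never runs */
	printf("acquired after %lu blocking calls\n", blocking_calls);
	return 0;
}

Every trip through the blocking branch costs a hypercall, and because the lock word 
is a plain byte rather than a ticket, whichever waiter notices the release first 
wins -- there is no FIFO ordering, which is where reason 3 comes from.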

When you're running with PARAVIRT_SPINLOCKS=y, are you getting a lot of counts 
on the per-cpu spinlock irqs?

What happens if you raise the "timeout" threshold?  If you have XEN_DEBUG_FS 
enabled, you can do that on the fly by writing it to 
/sys/kernel/debug/xen/spinlocks/timeout, or adjust TIMEOUT in 
arch/x86/xen/spinlock.c.  In theory, if you set it very large it should have 
the same effect as just disabling PARAVIRT_SPINLOCKS (except still using byte 
locks).  That should help isolate which of the possibilities above are coming 
into play.
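
For reference, a trivial userspace helper that bumps the timeout via debugfs (assumes 
XEN_DEBUG_FS=y and debugfs mounted at /sys/kernel/debug; the value written is 
arbitrary, just much larger than the compiled-in default -- echo from a shell works 
just as well):

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/kernel/debug/xen/spinlocks/timeout", "w");

	if (!f) {
		perror("open spinlocks/timeout");
		return 1;
	}
	/* pick something huge so the blocking slow path is effectively never taken */
	fprintf(f, "%u\n", 1u << 24);
	fclose(f);
	return 0;
}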

The other data in /sys/kernel/debug/xen/spinlocks could be helpful in working 
out what's going on as well.

Do you know if there are specific spinlocks being particularly pounded on by 
your workload?  I'm guessing some specific to your hardware.

Based on the lock statistics we gathered, the top 5 lock contentions are in our software 
stack, with contention counts of 351375/259924/246715/160125/156188 respectively 
(pretty much consistent with the top 5 lock contentions we saw when running the HVM 
kernel). We also note that there is a kernel option to enable/disable the Big Kernel 
Lock in the .37 pv-ops kernel. How does the BKL interact with PARAVIRT_SPINLOCKS? 
Does disabling the BKL affect performance?

Thanks,
    J

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
