
Re: [Xen-devel] Poor HVM performance with 8 vcpus



At 09:16 +0100 on 14 Oct (1255511785), Juergen Gross wrote:
> As the performance of BS2000 seems to be hurt by the OOS optimization, I'm
> thinking of writing a patch to disable this feature via a domain parameter.
> 
> Is there a way to do this without changing every place where the #if
> statements appear?
> I think there should be some central routines where adding an "if" would
> be enough (setting oos_active to 0 does not seem to be enough, I fear).
> 
> Do you have any hints?

The simplest way is to make sh_unsync() return 0 immediately.  That
won't be quite as fast as #defining it all away, but it will avoid the
expensive paths that cause lock contention.  You can add your flag to
the big if statement that's already there to avoid unsafe cases.
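
A minimal model of this suggestion, written in Python rather than the
hypervisor's C purely for illustration. The names Domain, Page and
oos_disabled, and the exact set of "unsafe" conditions, are hypothetical
stand-ins for whatever the real guard in sh_unsync() checks. The point is
the shape: one extra term in the existing guard makes the function return
0 ("keep the page in sync") before any of the expensive, lock-heavy work.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical stand-in for the per-domain state the guard consults."""
    is_hvm: bool = True
    oos_active: bool = True
    # Hypothetical per-domain parameter proposed in this thread.
    oos_disabled: bool = False


@dataclass
class Page:
    """Hypothetical stand-in for a guest page-table page."""
    has_multiple_shadows: bool = False
    already_out_of_sync: bool = False


def sh_unsync(domain: Domain, page: Page, log: list) -> int:
    """Model of the early-return guard: 0 means 'do not unsync this page'."""
    if (page.already_out_of_sync
            or page.has_multiple_shadows
            or not domain.is_hvm
            or not domain.oos_active
            or domain.oos_disabled):  # the one extra term for the new flag
        return 0
    # Stand-in for the expensive path (OOS bookkeeping under the shadow
    # lock) that the early return avoids.
    log.append("unsynced")
    return 1
```

With oos_disabled=True every call returns 0 and the slow path is never
reached, while the compile-time #if scaffolding elsewhere stays in place;
that is why this is expected to be slightly slower than compiling OOS out.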

Incidentally, although your benchmark does poorly on 8 VCPUs, it might be
worth trying a less aggressively targeted benchmark: on Windows VMs we
found that more realistic tests (e.g. Sysmark) still showed a slight
improvement from the OOS optimization at 8 VCPUs.

Cheers,

Tim.

> Juergen Gross wrote:
> > Hi,
> > 
> > Gianluca Guida wrote:
> >> Hi,
> >>
> >> On Wed, Oct 7, 2009 at 8:55 AM, Juergen Gross
> >> <juergen.gross@xxxxxxxxxxxxxx> wrote:
> >>> we've got massive performance problems running an 8-vcpu HVM guest
> >>> (BS2000) under Xen (Xen 3.3.1).
> >>>
> >>> With a specific benchmark producing a rather high load on memory
> >>> management operations (lots of process creation/deletion and memory
> >>> allocation), the 8-vcpu performance was worse than the 4-vcpu
> >>> performance. On other platforms (/390, MIPS, SPARC) this benchmark
> >>> scaled rather well with the number of cpus.
> >>>
> >>> The software performance counters in Xen seemed to point to the
> >>> shadow lock as the cause. I modified the hypervisor to gather some
> >>> lock statistics (patch will be sent soon) and found that the shadow
> >>> lock really is the bottleneck: on average, 4 vcpus are waiting to get
> >>> the lock!
> >>>
> >>> Is this a known issue?
> >> Actually, I think so. The OOS optimization is widely known not to
> >> scale well to 8 vcpus in its current state, since its weak point is
> >> that CR3 switching time increases linearly with the number of cpus. If
> >> you have lots of process switches together with lots of PTE writes
> >> (as seems to be the case for your benchmark), then that's probably
> >> the cause.
> >>
> >> Could you try removing the OOS optimization from the
> >> SHADOW_OPTIMIZATIONS definition?
> > 
> > Great!
> > First performance data looks okay!
> > We will have to run different benchmarks in different configurations, but I
> > think you gave an excellent hint. :-)
> 
> 
> -- 
> Juergen Gross                 Principal Developer Operating Systems
> TSP ES&S SWE OS6                       Telephone: +49 (0) 89 636 47950
> Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
> Otto-Hahn-Ring 6                        Internet: ts.fujitsu.com
> D-81739 Muenchen                 Company details: ts.fujitsu.com/imprint.html
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
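
Gianluca's suggestion in the quoted message is a compile-time change:
SHADOW_OPTIMIZATIONS is a bitmask of SHOPT_* flags, and each optimized
path is wrapped in an #if that tests one bit. The Python sketch below
mirrors that bit arithmetic. The flag names follow Xen's SHOPT_* style,
but the specific values and the 0x1ff "everything on" default are
assumptions, so check the shadow code's private header in your own tree
before copying anything.

```python
# Illustrative bit values modelled on Xen's SHOPT_* flags (assumptions).
SHOPT_WRITABLE_HEURISTIC = 0x01
SHOPT_EARLY_UNSHADOW = 0x02
SHOPT_FAST_FAULT_PATH = 0x04
SHOPT_PREFETCH = 0x08
SHOPT_LINUX_L3_TOPLEVEL = 0x10
SHOPT_SKIP_VERIFY = 0x20
SHOPT_VIRTUAL_TLB = 0x40
SHOPT_FAST_EMULATION = 0x80
SHOPT_OUT_OF_SYNC = 0x100

# Assumed default: every optimization bit above is set.
ALL_SHADOW_OPTIMIZATIONS = 0x1FF

# The suggested experiment: clear only the OOS bit and keep the rest.
# Hypothetical C equivalent:
#   #define SHADOW_OPTIMIZATIONS (0x1ff & ~SHOPT_OUT_OF_SYNC)
SHADOW_OPTIMIZATIONS = ALL_SHADOW_OPTIMIZATIONS & ~SHOPT_OUT_OF_SYNC


def shadow_opt_enabled(mask: int, opt: int) -> bool:
    """Python analogue of a C '#if SHADOW_OPTIMIZATIONS & SHOPT_...' test."""
    return bool(mask & opt)
```

Because every OOS code path is guarded by the same bit, clearing it once
removes them all at build time, which is the "#defining it all away"
option Tim compares against a runtime flag.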

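Juergen's quoted report mentions modifying the hypervisor to gather lock
statistics, finding on average 4 vcpus waiting for the shadow lock; his
patch is not part of this message. One generic way to obtain such a
figure is to sample how many other contenders are already queued each
time someone tries to take the lock, then average the samples. The sketch
below shows that bookkeeping around Python's threading.Lock; the class
and its counters are hypothetical, not Xen's lock-profiling code.

```python
import threading


class ContendedLockStats:
    """Hypothetical lock wrapper that records how many callers are queued."""

    def __init__(self) -> None:
        self._lock = threading.Lock()   # the lock being measured
        self._meta = threading.Lock()   # protects the counters below
        self._waiting = 0               # callers currently inside acquire()
        self.acquisitions = 0           # completed acquisitions
        self.contended = 0              # attempts that found the lock held
        self.waiter_samples = 0         # sum of queued callers seen per attempt

    def acquire(self) -> None:
        with self._meta:
            # Sample the callers already trying to get the lock, then join them.
            self.waiter_samples += self._waiting
            self._waiting += 1
        if not self._lock.acquire(blocking=False):
            with self._meta:
                self.contended += 1
            self._lock.acquire()        # block until the holder releases
        with self._meta:
            self._waiting -= 1
            self.acquisitions += 1

    def release(self) -> None:
        self._lock.release()

    def average_waiters(self) -> float:
        """Mean number of other callers queued when an attempt started."""
        with self._meta:
            if not self.acquisitions:
                return 0.0
            return self.waiter_samples / self.acquisitions

    def __enter__(self) -> "ContendedLockStats":
        self.acquire()
        return self

    def __exit__(self, *exc) -> None:
        self.release()
```

A Python model cannot reproduce hypervisor spinlock timing, so treat this
only as an illustration of the counters; an average near the number of
vcpus minus one is the signature of a fully serialized lock.
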
-- 
Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]
