
Re: [Xen-devel] [PATCH RFC 00/49] xen: add core scheduling support



On Fri, 2019-03-29 at 19:16 +0100, Dario Faggioli wrote:
> On Fri, 2019-03-29 at 16:08 +0100, Juergen Gross wrote:
> > I have done some very basic performance testing: on a 4 cpu system
> > (2 cores with 2 threads each) I did a "make -j 4" for building the
> > Xen hypervisor. This test has been run in dom0, once with no other
> > guest active and once with another guest with 4 vcpus running the
> > same test.
> Just as a heads up for people (as Juergen knows this already :-D),
> I'm planning to run some performance evaluation of this patch
> series.
> 
> I've got an 8 CPU system (4 cores, 2 threads each, non-NUMA) and a
> 16 CPU system (2 sockets/NUMA nodes, 4 cores each, 2 threads each)
> on which I should be able to get a benchmark suite running
> relatively easily and (hopefully) quickly.
> 
> I'm planning to evaluate:
> - vanilla (i.e., without this series), SMT enabled in BIOS
> - vanilla (i.e., without this series), SMT disabled in BIOS
> - patched (i.e., with this series), granularity=thread
> - patched (i.e., with this series), granularity=core
> 
> I'll start with no overcommitment, and then move to 2x
> overcommitment (as you did above).
> 
I've got the first set of results. It's less than I wanted/expected to
have at this point in time, but still...

Also, it's Phoronix again. I don't especially love it, but I'm still
working on convincing our own internal automated benchmarking tool
(which I like a lot more :-) ) to be a good friend of Xen. :-P
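
In case it's useful, this is roughly how such a run can be kicked off
with the Phoronix Test Suite (just a sketch; the exact test profiles
and options may not match what I actually used):

  # non-interactive run of a few of the profiles mentioned below,
  # with results saved so they can be uploaded to openbenchmarking.org
  phoronix-test-suite batch-setup
  phoronix-test-suite batch-benchmark pts/stream pts/compress-zstd \
                                      pts/stress-ng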

It's a fairly small set of tests, run under the following conditions
(a sketch of the corresponding settings follows the list):
- hardware: Intel Xeon E5620; 2 NUMA nodes, 4 cores and 2 threads each
- slow disk (old rotational HDD)
- benchmarks run in dom0
- CPU, memory and some disk IO benchmarks
- all Spectre & Meltdown mitigations disabled, both at the Xen and the
  dom0 kernel level
- cpufreq governor = performance, max_cstate = C1
- *non* debug hypervisor
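
To make the last few points concrete, they roughly map to knobs like
the following (a sketch, not the literal command lines from this box):

  # Xen boot parameters: speculative mitigations off in the hypervisor
  spec-ctrl=no

  # dom0 kernel parameters: mitigations off there as well (on older
  # kernels this means the individual nopti/nospectre_* options
  # instead of the single switch below)
  mitigations=off

  # runtime tuning, from dom0
  xenpm set-scaling-governor performance   # cpufreq governor
  xenpm set-max-cstate 1                   # don't go deeper than C1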

In just one sentence, what I'd say is "So far so good" :-D

https://openbenchmarking.org/result/1904105-SP-1904100DA38

1) 'Xen dom0, SMT On, vanilla' is staging *without* this series even 
    applied
2) 'Xen dom0, SMT on, patched, sched_granularity=thread' is with this 
    series applied, but with the scheduler behaving as it does right now
3) 'Xen dom0, SMT on, patched, sched_granularity=core' is with this 
    series applied, and core-scheduling enabled
4) 'Xen dom0, SMT Off, vanilla' is staging *without* this series 
    applied, and SMT turned off in BIOS (i.e., we only have 8 CPUs)
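
In terms of the Xen boot command line (e.g., GRUB_CMDLINE_XEN), the
four configurations map, roughly, and using the option name as it
appears in this series, to something like:

  1) staging,        SMT on in BIOS,  no extra option
  2) staging+series, SMT on in BIOS,  sched_granularity=thread
  3) staging+series, SMT on in BIOS,  sched_granularity=core
  4) staging,        SMT off in BIOS, no extra option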

So, comparing 1 and 4, we see, for each specific benchmark, what the
cost of disabling SMT is (or, vice versa, the gain from using SMT).

Comparing 1 and 2, we see the overhead introduced by this series, when
it is not used to achieve core-scheduling.

Comparing 1 and 3, we see the difference between what we have right
now and what we'll have with core-scheduling enabled, as it is
implemented in this series.

Some of the things we can see from the results:
- disabling SMT (i.e., 1 vs 4) is not always a loss, but it is a loss 
  overall, i.e., if you look at how many tests run faster and how many 
  run slower with SMT off (and also at by how much). Of course, this 
  only holds for these specific benchmarks, on this specific hardware 
  and with this configuration
- the overhead introduced by this series is, overall, pretty small, 
  apart from a couple of exceptions (e.g., Stream Triad or zstd 
  compression). OTOH, there seem to be cases where this series 
  improves performance (e.g., Stress-NG Socket Activity)
- the performance we achieve with core-scheduling is more than 
  acceptable
- between core-scheduling and disabling SMT, core-scheduling wins and
  I wouldn't even call it a match :-P

Of course, other thoughts, comments, alternative analysis are welcome.

As said above, this is less than what I wanted to have, and in fact
I'm running more stuff.

I have a much more comprehensive set of benchmarks running in these
days. It being "much more comprehensive", however, also means it takes
more time.

I have a newer and faster (both CPU- and disk-wise) machine, but I
still need to repurpose it for benchmarking.

At least now that the old Xeon NUMA box is done with this first round,
I can use it for:
- running the tests inside a "regular" PV domain (a minimal guest 
  config sketch follows this list)
- running the tests inside more than one PV domain, i.e. with some 
  degree of overcommitment
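
For the guest runs, something like the following minimal PV domain
config should do (just a sketch; names, paths and sizes are
placeholders, not necessarily what I'll actually use):

  # benchvm1.cfg -- minimal PV guest for running the benchmarks
  name    = "benchvm1"
  type    = "pv"
  kernel  = "/boot/vmlinuz-guest"                # placeholder path
  ramdisk = "/boot/initrd-guest"                 # placeholder path
  extra   = "root=/dev/xvda ro"
  memory  = 4096
  vcpus   = 16   # == nr. of host CPUs; a second identical guest
                 # gives the 2x overcommitment case
  disk    = [ "phy:/dev/vg0/benchvm1,xvda,w" ]   # placeholder volume

Started, as usual, with `xl create benchvm1.cfg', and then the
benchmarks get run from inside the guest(s).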

I'll push out results as soon as I have them.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



 

