
Re: [Xen-devel] [Question] PARSEC benchmark has smaller execution time in VM than in native?



On Tue, Mar 1, 2016 at 4:51 PM, Sander Eikelenboom <linux@xxxxxxxxxxxxxx> wrote:
>
> Tuesday, March 1, 2016, 9:39:25 PM, you wrote:
>
>> On Tue, Mar 01, 2016 at 02:52:14PM -0500, Meng Xu wrote:
>>> Hi Elena,
>>>
>>> Thank you very much for sharing this! :-)
>>>
>>> On Tue, Mar 1, 2016 at 1:20 PM, Elena Ufimtseva
>>> <elena.ufimtseva@xxxxxxxxxx> wrote:
>>> >
>>> > On Tue, Mar 01, 2016 at 08:48:30AM -0500, Meng Xu wrote:
>>> > > On Mon, Feb 29, 2016 at 12:59 PM, Konrad Rzeszutek Wilk
>>> > > <konrad.wilk@xxxxxxxxxx> wrote:
>>> > > >> > Hey!
>>> > > >> >
>>> > > >> > CC-ing Elena.
>>> > > >>
>>> > > >> I think you forgot to cc her..
>>> > > >> Anyway, let's cc her now... :-)
>>> > > >>
>>> > > >> >
>>> > > >> >> We are comparing the execution time in a native machine environment
>>> > > >> >> and in a Xen virtualization environment using the PARSEC benchmark [1].
>>> > > >> >>
>>> > > >> >> In the virtualization environment, we run a domU with three VCPUs,
>>> > > >> >> each pinned to a core; we pin dom0 to another core that is not
>>> > > >> >> used by the domU.
>>> > > >> >>
>>> > > >> >> Inside Linux, both in the domU (virtualization environment) and in
>>> > > >> >> the native environment, we used cpusets to isolate a core (or VCPU)
>>> > > >> >> for the system processes and another core for the benchmark
>>> > > >> >> processes. We also configured the Linux boot command line with the
>>> > > >> >> isolcpus= option to shield the benchmark core from other,
>>> > > >> >> unnecessary processes.
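
For reference, a minimal userspace sketch of that kind of setup (pinning the
benchmark process to the isolated core and taking a wall-clock reading) might
look like the following; the core index and the benchmark command are
placeholders, not the actual configuration used in these experiments:

#!/usr/bin/env python3
# Minimal sketch (not the original harness): pin the benchmark to one core
# and take a wall-clock reading. The core index and command below are
# placeholders, not the actual configuration used in these experiments.
import os
import subprocess
import time

ISOLATED_CORE = 2                 # assumption: the core reserved via isolcpus=
CMD = ["./run_benchmark.sh"]      # placeholder for the PARSEC invocation

# Restrict this process (and the child it spawns) to the isolated core.
os.sched_setaffinity(0, {ISOLATED_CORE})

start = time.monotonic()
subprocess.run(CMD, check=True)
elapsed = time.monotonic() - start
print(f"wall-clock time on core {ISOLATED_CORE}: {elapsed:.3f}s")
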
>>> > > >> >
>>> > > >> > You may want to just offline them and also boot the machine with 
>>> > > >> > NUMA
>>> > > >> > disabled.
>>> > > >>
>>> > > >> Right, the machine is booted up with NUMA disabled.
>>> > > >> We will offline the unnecessary cores then.
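
(As a rough sketch, offlining unused cores can be scripted through the standard
sysfs interface; the CPU list below is a made-up example, not the set used here:)

#!/usr/bin/env python3
# Sketch only: take a set of CPUs offline through sysfs before the run.
# Needs root; the CPU list is an assumption for illustration.

UNUSED_CPUS = [4, 5, 6, 7]   # assumption: cores not needed by dom0/domU or the benchmark

for cpu in UNUSED_CPUS:
    with open(f"/sys/devices/system/cpu/cpu{cpu}/online", "w") as f:
        f.write("0")         # "0" = offline, "1" = online
    print(f"cpu{cpu} taken offline")
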
>>> > > >>
>>> > > >> >
>>> > > >> >>
>>> > > >> >> We expected the execution time of the benchmarks in the Xen
>>> > > >> >> virtualization environment to be larger than in the native machine
>>> > > >> >> environment. However, the evaluation gave us the opposite result.
>>> > > >> >>
>>> > > >> >> Below is the evaluation data for the canneal and streamcluster 
>>> > > >> >> benchmarks:
>>> > > >> >>
>>> > > >> >> Benchmark: canneal, input=simlarge, conf=gcc-serial
>>> > > >> >> Native: 6.387s
>>> > > >> >> Virtualization: 5.890s
>>> > > >> >>
>>> > > >> >> Benchmark: streamcluster, input=simlarge, conf=gcc-serial
>>> > > >> >> Native: 5.276s
>>> > > >> >> Virtualization: 5.240s
>>> > > >> >>
>>> > > >> >> Is there anything wrong with our evaluation that led to these
>>> > > >> >> abnormal performance results?
>>> > > >> >
>>> > > >> > Nothing is wrong. Virtualization is naturally faster than 
>>> > > >> > baremetal!
>>> > > >> >
>>> > > >> > :-)
>>> > > >> >
>>> > > >> > No clue sadly.
>>> > > >>
>>> > > >> Ah-ha. This is really surprising to me.... Why would adding one more
>>> > > >> layer speed up the system? Unless virtualization disables some services
>>> > > >> that run in the native case and interfere with the benchmark.
>>> > > >>
>>> > > >> If virtualization were naturally faster than baremetal, why do some
>>> > > >> experiments show that virtualization introduces overhead?
>>> > > >
>>> > > > Elena told me that there was a weird regression in Linux 4.1 - where
>>> > > > CPU-burning workloads were _slower_ on baremetal than as guests.
>>> > >
>>> > > Hi Elena,
>>> > > Would you mind sharing some of your experience of how you found the
>>> > > real reason? Did you use some tool or methodology to pin down why the
>>> > > CPU-burning workloads were _slower_ on baremetal than as guests?
>>> > >
>>> >
>>> > Hi Meng
>>> >
>>> > Yes, sure!
>>> >
>>> > While working on performance tests for the smt-exposing patches from Joao,
>>> > I ran a CPU-bound workload in an HVM guest and, using the same kernel,
>>> > ran the same test on baremetal.
>>> > While testing the cpu-bound workload on baremetal Linux (4.1.0-rc2),
>>> > I found that the time to complete the test is a few times longer than
>>> > it takes under the HVM guest.
>>> > I have tried tests with the kernel threads pinned to cores and without
>>> > pinning.
>>> > The execution times are usually about twice as long, and sometimes four
>>> > times longer, than in the HVM case.
>>> >
>>> > What is interesting is not only that it sometimes takes 3-4 times longer
>>> > than in the HVM guest, but also that the test with threads bound to cores
>>> > takes almost 3 times longer to execute than the same cpu-bound test under
>>> > HVM (in all configurations).
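
A rough userspace analogue of that comparison (the original test used kernel
threads, so this is only an approximation) could look like the sketch below:
spawn N CPU-bound workers, optionally pin them to cores, and time the run.
Worker count, loop length and core list are arbitrary illustrations:

#!/usr/bin/env python3
# Rough userspace analogue of the comparison described above (the original
# test used kernel threads): spawn N CPU-bound workers, optionally pin them
# to cores, and time the run. Worker count, iteration count and core list
# are assumptions for illustration only.
import os
import time
from multiprocessing import Process

N_WORKERS = 8
ITERATIONS = 50_000_000                  # arbitrary busy-loop length
PIN_CORES = [0, 1, 2, 3, 4, 5, 6, 7]     # used only when pinning is enabled

def burn(core=None):
    if core is not None:
        os.sched_setaffinity(0, {core})  # bind this worker to one core
    x = 0
    for i in range(ITERATIONS):
        x += i * i                       # pure CPU work, no I/O
    return x

def run(pinned):
    workers = [
        Process(target=burn, args=(PIN_CORES[i] if pinned else None,))
        for i in range(N_WORKERS)
    ]
    start = time.monotonic()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.monotonic() - start

if __name__ == "__main__":
    print(f"pinned:   {run(pinned=True):.1f}s")
    print(f"unpinned: {run(pinned=False):.1f}s")
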
>>>
>>>
>>> wow~ I didn't expect the native performance could be so "bad".... ;-)
>
>> Yes, quite a surprise :)
>>>
>>> >
>>> >
>>> > I run each test 5 times and here are the execution times (seconds):
>>> >
>>> > -------------------------------------------------
>>> >         baremetal           |
>>> > thread_bind | thread unbind | HVM pinned to cores
>>> > ----------- |---------------|---------------------
>>> >      74     |     83        |        28
>>> >      74     |     88        |        28
>>> >      74     |     38        |        28
>>> >      74     |     73        |        28
>>> >      74     |     87        |        28
>>> >
>>> > Sometimes the unbound tests had better times, but not often enough
>>> > to present them here. Some results are much worse and reach up to 120
>>> > seconds.
>>> >
>>> > Each test has 8 kernel threads. In the baremetal case I tried the following:
>>> > - numa off,on;
>>> > - all cpus are on;
>>> > - isolate cpus from first node;
>>> > - set intel_idle.max_cstate=1;
>>> > - disable intel_pstate;
>>> >
>>> > I don't think I have exhausted all the options here, but it looked like
>>> > the last two changes did improve performance, though it was still not
>>> > comparable to the HVM case.
>>> > I am trying to find where the regression happened. Performance on a newer
>>> > kernel (I tried 4.5.0-rc4+) was close to or better than HVM.
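
One way to sanity-check from userspace that those boot-time settings actually
took effect is to read the usual sysfs locations; a small sketch, using cpu0
as a representative core:

#!/usr/bin/env python3
# Sketch: double-check from userspace that the cpufreq/cpuidle settings
# mentioned above actually took effect. Standard sysfs paths; cpu0 is
# used as a representative core.
import glob

def read(path):
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return "<not available>"

# Kernel command line used for this boot (isolcpus=, intel_idle.max_cstate=, ...).
print("cmdline:       ", read("/proc/cmdline"))

# If intel_pstate was disabled, scaling_driver should report acpi-cpufreq (or similar).
print("scaling_driver:", read("/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver"))

# Idle states still exposed after intel_idle.max_cstate=1.
for state in sorted(glob.glob("/sys/devices/system/cpu/cpu0/cpuidle/state*")):
    print(state.rsplit("/", 1)[-1], read(state + "/name"))
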
>
> Just a perhaps silly thought... but could there be something in the
> time measurement that differs between the two and explains the slightly
> surprising results?

Thanks Sander! Actually, I thought about this possibility too, as Elena
did. If it were the time measurement, the difference in execution time
should not vary across different types of workloads/programs, but it
does. That's why I think time measurement is not the reason here (at
least not the main reason). :-)
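
One simple cross-check for the clock hypothesis is to time the same fixed
CPU-bound loop with several clock sources in each environment; if the clocks
themselves were the problem, the readings should disagree with one another.
A minimal sketch (loop size is arbitrary):

#!/usr/bin/env python3
# Sketch of one way to rule out clock problems: time the same fixed
# CPU-bound loop with several clock sources and compare. If a clock
# source were skewed, the readings would disagree with each other.
import time

def burn(n=30_000_000):
    x = 0
    for i in range(n):
        x += i * i          # pure CPU work, no I/O
    return x

t_wall = time.monotonic()
t_cpu = time.process_time()
t_rtc = time.time()

burn()

print(f"monotonic   : {time.monotonic() - t_wall:.3f}s")
print(f"process_time: {time.process_time() - t_cpu:.3f}s")
print(f"time-of-day : {time.time() - t_rtc:.3f}s")
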

Best,

Meng

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

