RE: [Xen-devel] fair scheduling

 

> -----Original Message-----
> From: rahul gundecha [mailto:rahoolgundecha@xxxxxxxxxxx] 
> Sent: 10 May 2007 15:39
> To: Petersson, Mats; harry.smith272@xxxxxxxxx; xen devel
> Subject: RE: [Xen-devel] fair scheduling
> 
> Thanks a lot, Mats, for the detailed clarification; it really helped 
> me as well.
> 
> Coming to the issue of hyperthreaded CPUs: 
> 1) If we can't expect them to deliver 200% of the 
> performance, then what is the benefit of using them? (I don't 
> know what the prime use of hyperthreading is, if this is the real 
> scenario...)

Because it gives more than 100%. Hyperthreading is worthwhile if you can
improve the overall throughput of the processor above the 100% that a
single core gives by adding a relatively small amount of extra logic
(particularly if the relative cost of the extra logic is less than the
relative performance gain).

There are several things that can give hyperthreading an advantage over
non-hyperthreaded processors, but it's a complex subject, and it's got a
lot to do with pipeline length, branch predictability and other
not-so-trivial subjects.

But if you do some research on hyperthreading (particularly Intel's
implementation), you'll find that on SOME workloads, the performance
actually decreases. I can't remember if it was Oracle or IBM that
recommended that Intel processors with Hyperthreading should be
used with Hyperthreading turned off, because that gives better
performance.

> 2) However, given some processors, how do I go about designing 
> workload management? That is, how can I at least estimate 
> resource usage? I need to know how much load my 
> infrastructure can support, considering that the load can 
> be of mixed type. As you said, "but it does depend A LOT on 
> exactly what the two guests are doing", so how do I do 
> such an analysis? It seems that processor architecture will 
> be a major player. What's the current approach regarding this?

I don't have a concrete answer here, mostly because it is a VERY complex
subject. But there are tools that can identify, at the system level, what
either a particular process or the entire system is doing, for
example "oprofile". The number of TLB misses, L1/L2 cache misses and
"cycles waiting for memory access" are examples of things that
oprofile can tell the user.

An understanding of what type of operations each application (or set
of applications intended for a VM) is performing, combined with some
understanding of the whole system architecture and its capabilities
(such as memory bandwidth, I/O bandwidth, etc.), can help
give an indication of how a set of VMs will perform. This isn't a precise
science (unless you can predict/simulate the workload interaction very
precisely), as some things get much worse if you have the
right/wrong type of interaction. For example, if you have a shared cache
and one application does a lot of memory activity at sequential
locations, it will "destroy" the cache for the other processor(s)
quicker than an application that doesn't fill the cache so
"effectively".

Of course, if we ignore the problem with Hyperthreading, a rough
estimation of "cpu-load" will give you an answer pretty close to the
"correct" one.

The problem with hyperthreading is that some of the CPU resources are
shared (in particular the execution units), which means that two
competing threads will have to "wait their turn", a bit like a busy road
that has two lanes of traffic merging into one. It is hard to predict
whether you get 80, 100, 120 or 150% of the expected single-processor
performance, since it depends specifically on how "successful" the
processor is at scheduling the two queues of instructions. Highly
optimized code with few branches and few memory load-stalls will gain
less from hyperthreading than code that has lots of (non-predictable)
branches or memory loads that the processor has to wait for
(cache misses), because the stalls in one thread leave execution units
free for the other.
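
To put rough numbers on that, here is a small Python sketch. It is only
an illustration: the function name is made up for this example, and the
80-150% figures are simply the range mentioned above, not measurements.
It assumes the 2-core, 4-hyperthread machine from Harry's post.

```python
# Rough capacity model for a hyperthreaded machine (illustrative only).
# Assumption: one physical core running a single thread delivers 100%,
# and its two hyperthreads together deliver somewhere between 80% and
# 150% of that, depending on the workload mix.

def capacity_percent(physical_cores, ht_pair_throughput):
    """Total throughput, in percent of one core, with every hyperthread busy."""
    return physical_cores * ht_pair_throughput

# Harry's machine: 2 physical cores, 4 hyperthreads.
for ht_pair_throughput in (80, 100, 120, 150):
    total = capacity_percent(2, ht_pair_throughput)
    print(f"pair throughput {ht_pair_throughput}% -> machine total {total}%")
```

Even in the most optimistic case the total stays well short of the 400%
that four "CPUs" might suggest.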

For "proper" multi-core processors, the per-core gain isn't quite 100%
(so two cores will not necessarily give 200% of a single core's
performance), but it's much closer than the hyperthreaded example
(excluding pathological cases such as cache-thrashing[1]). 

Also consider that in many cases, the idea behind using virtualization,
for example in server consolidation, is to move servers that are severely
under-utilized onto a single system that has similar performance to the
original servers. Say, for example, we have three servers running at an
average of 15%, peak 25%. Running those three in one machine would give
loads of around 45-75% on the guest side, which leaves at least 25% for
"overhead", which should be sufficient in most cases.
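
The consolidation arithmetic can be written out as a short Python
sketch. The helper name is hypothetical, and it assumes the host has
the same performance as each original server and that all peaks
coincide (the worst case).

```python
# Back-of-the-envelope consolidation estimate (illustrative only).
# Assumptions: the host is as fast as each original server, loads are
# given in percent of one such server, and all peaks happen together.

def consolidated_load(avg_loads, peak_loads):
    """Return (summed average, summed peak, headroom left at summed peak)."""
    total_avg = sum(avg_loads)
    total_peak = sum(peak_loads)
    headroom = 100 - total_peak
    return total_avg, total_peak, headroom

# Three servers, each averaging 15% with peaks of 25%.
avg, peak, headroom = consolidated_load([15, 15, 15], [25, 25, 25])
print(avg, peak, headroom)  # 45 75 25
```

A negative headroom would mean the summed peaks do not fit on one host
of that size.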

[1] Cache-thrashing is where one processor has some data in cache, and
the second processor tries to write to the cached location. This will
(in a traditional multi-CPU scenario) force the data to be written to
memory and then read into the cache of the second processor. Under the
"right" circumstances, this can significantly lower the performance of
multiprocessor architectures. [I remember one case where I was
analyzing a benchmark in which the 2P case got about 75% of the 1P
performance instead of the theoretical 200%. By re-arranging the data
used by two threads of the same application so that the two data
structures were cache-line aligned, the overall performance went to 198%,
close to the theoretical 200%, which was expected].
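
A small Python sketch of why cache-line alignment helps in a case like
this. It only does the address arithmetic and is not a benchmark. The
64-byte line size and the 40-byte structure size are assumptions made
for illustration, and the helper names are made up.

```python
# Which cache lines do two per-thread data structures touch?
# Assumptions: 64-byte cache lines and a 40-byte structure per thread.

CACHE_LINE = 64  # bytes; check the actual value for your processor

def cache_lines(offset, size, line=CACHE_LINE):
    """Set of cache-line indices covered by bytes [offset, offset + size)."""
    first = offset // line
    last = (offset + size - 1) // line
    return set(range(first, last + 1))

def align_up(offset, line=CACHE_LINE):
    """Round offset up to the next cache-line boundary."""
    return (offset + line - 1) // line * line

struct_size = 40  # hypothetical per-thread structure size in bytes

# Packed back to back: thread A's data at offset 0, thread B's at 40.
lines_a = cache_lines(0, struct_size)
packed_b = cache_lines(struct_size, struct_size)
print(lines_a & packed_b)   # {0}: both threads write to line 0

# Thread B's data moved to the next cache-line boundary (offset 64).
aligned_b = cache_lines(align_up(struct_size), struct_size)
print(lines_a & aligned_b)  # set(): no line is shared any more
```

When the intersection is non-empty, a write by either thread forces that
line to move between the two processors' caches, which is the effect
described in the footnote.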

--
Mats

> 
> I read that data centres are increasingly using 
> virtualization; in that case, how is the above process carried out? 
> 
> "Petersson, Mats" <Mats.Petersson@xxxxxxx> wrote:
> 
> 
> 
>       > -----Original Message-----
>       > From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx 
>       > [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of 
>       > rahul gundecha
>       > Sent: 10 May 2007 13:00
>       > To: harry.smith272@xxxxxxxxx; xen devel
>       > Subject: Re: [Xen-devel] fair scheduling
>       > 
>       > hi everyone,
>       > 
>       > in given case, why performance of one vm affects other vm ? 
>       > if VMs are said to be independent, then why one VM's load 
>       > affects the performance of other VM?
>       > 
>       > as shown by Harry, in both cases CPU consumption by VM is 
>       > same then why webserver performance degrades ?? Does xen 
>       > doesn't provide performance isolation ?
>       
>       Well, it does have caps and weights to allow the scheduler to
>       distribute the CPU performance evenly, but it is, as I explained in
>       a different post just a few seconds ago, not easy to determine such
>       things as the effect of memory, cache, TLB and I/O operations. If
>       one CPU is more busy because it takes longer to get memory, how does
>       the scheduler know this (and more importantly, what should the
>       scheduler DO about it)?
>       
>       
>       If VMs are purely CPU-bound, such as some very simple calculation
>       (small enough that both the code and data fit in cache nicely for
>       all VMs at the same time), things should work out nicely fair. When
>       memory, cache and I/O operations get mixed into the equation, it
>       becomes much more complex.
>       
>       In Harry's case, the matter is further complicated by the fact that
>       half of the "CPU's" are virtual CPU's (hyperthreads), which adds
>       some benefit, but certainly not as much as a real core, so 400% CPU
>       performance out of 4 hyperthreads is far above what you can
>       expect.
>       
>       Finally, [I should have thought this through earlier and added it to
>       the previous post to Harry] part of the missing percentage in
>       Harry's case is probably due to the fact that with one active VM and
>       Dom0 able to run on two different cores, the load on Dom0 can wholly
>       fit in Core1 without affecting Core0's execution at all, whereas
>       when vm2 is using 100% CPU load, both Core0 and Core1 are fully
>       loaded, which means that the 25% load on Dom0 will have to be shared
>       out across Core0 and Core1. This is probably the biggest factor in
>       this case - not memory or IO load.
>       
>       --
>       Mats
>       > 
>       > regards,
>       > -Rahul
>       > 
>       > 
>       > 
>       > Hi Atsushi & Pradeep,
>       > 
>       > thanks for replying back.
>       > I have 4 VCPUs for each of VM. But the point I wanted 
>       > to stress upon is - 
>       > "This happened even in the case where CPU usage by both 
>       > of vm1,vm2 is restricted to 100% each. "
>       > I had pinned all 4 VCPUs of each VM to a single phys. 
>       > CPU. & I have 4 phys. CPUs
>       > means my vm1 was using cpu1, vm2 using cpu2 & domain-0 
>       > using cpu0,cpu3 
>       > 
>       > Problem is when there is no load on vm2, webserver 
>       > performance of vm1 is better. But when vm2 has some 
>       > compute-intense load then vm1 webserver performance goes down.
>       > Please note that CPU consumption of vm1 shown by xentop 
>       > in both cases is 100%, still webserver performance goes down 
>       > by around 15-20%.
>       > Even after trying to isolate two VMs, existence of load 
>       > on one VM is affecting other. 
>       > 
>       > so is it expected behavior ?
>       > 
>       > thanks,
>       > Harry
>       > 
>       > 
>       > 
>       > 
>       > On 5/10/07, pradeep singh rautela wrote:
>       > 
>       > 
>       > 
>       > On 5/10/07, Atsushi SAKAI wrote:
>       > 
>       > One vcpu can use one pcpu at one time.
>       > It means 100% is maxium for one vcpu domain.
>       > If you want to use cpu resources, you 
>       > should set more vcpu.
>       > 
>       > 
>       > Ok, this explains a lot of things. 
>       > As i understand this , more VCPUs means more 
>       > freedom to hypervisor to migrate them among physical CPUs, 
>       > depending on the free PCPUs available. 
>       > 
>       > 
>       > In general 
>       > 
>       > domU1 
>       > / | \
>       > vcpu1 vcpu2 vcpu3 
>       > 
>       > pcpu1 pcpu2 pcpu3 pcpu4 pcpu5 pcpu6 
>       > 
>       > I mean ,domU1 can run on any vcpu , right? now 
>       > vcpu1, vcpu2, vcpu3 share a one to many reationship between 
>       > pcpus[1....6]. That is a vcpu can run on any of the pcus 
>       > available to the Xen hypervisor(unless i explicitly 
> pin it to ). 
>       > 
>       > Is my naive understanding of what you explained 
>       > is correct? 
>       > 
>       > Thank you 
>       > ~psr
>       > 
>       > 
>       > 
>       > Thanks
>       > Atsushi SAKAI
>       > 
>       > 
>       > "pradeep singh rautela" 
>       > wrote:
>       > 
>       > > Hi Atsushi,
>       > >
>       > > On 5/10/07, Atsushi SAKAI < 
>       > sakaia@xxxxxxxxxxxxxx > wrote:
>       > > >
>       > > >
>       > > > You should show detail configuration.
>       > > > Your information is too short.
>       > > >
>       > > > Anyway I guess each domain has one vcpu. 
>       > > > If so, this is normal behavior.
>       > > > Because one vcpu cannot allocate 
>       > two or more pcpu at once.
>       > >
>       > >
>       > > Right, but shouldn't Xen hypervisor 
>       > be capable of migrating the VCPU among 
>       > > the available PCPUs on a 
>       > multiprocessor system, like in this case? And
>       > > criteria should be the load on the 
>       > PCPU or the idle PCPUs.
>       > > yes/no?
>       > >
>       > > Am i missing something here?
>       > >
>       > > Thanks 
>       > > ~psr
>       > >
>       > > Thanks
>       > > > Atsushi SAKAI
>       > > >
>       > > > "Harry Smith" < 
>       > harry.smith272@xxxxxxxxx > wrote:
>       > > >
>       > > > > hi all, 
>       > > > >
>       > > > > I am using xen3.0.3 on dual core 
>       > hyperthreaded processor (in all 4
>       > > > cores).
>       > > > > There are 2 VMs vm1,vm2 among 
>       > which vm1 has a webserver running on it.
>       > > > > 
>       > > > > While testing the performance of 
>       > webserver, when I introduce some load
>       > > > on
>       > > > > vm2 which involves some 
>       > computations the webserver performance goes
>       > > > down.
>       > > > > This happened even in the case 
>       > where CPU usage by both of vm1,vm2 is 
>       > > > > restricted to 100% each.
>       > > > >
>       > > > > Is it expected behavior ? if yes 
>       > then how does one can control addition
>       > > > of
>       > > > > new virtual machines as adding 
>       > every new VM will result in lowering 
>       > > > > performance of other VMs. 
>       > Through scheduling parameters we can just
>       > > > specify
>       > > > > amount of CPU to be used in 
>       > relative sense (weight) & upper limit (cap).
>       > > > But 
>       > > > > how to tackle this point.
>       > > > >
>       > > > > I am new in this area & wanna set 
>       > up a lab using virtualization, so want
>       > > > to
>       > > > > find solution for this.
>       > > > > 
>       > > > > thanks,
>       > > > > Harry
>       > > > >
>       > > > > we always have a choice...
>       > > >
>       > > >
>       > > >
>       > > > 
>       > _______________________________________________
>       > > > Xen-devel mailing list 
>       > > > Xen-devel@xxxxxxxxxxxxxxxxxxx
>       > > > 
>       > http://lists.xensource.com/xen-devel 
>       > 
>       > > >
>       > >
>       > > 
>       > >
>       > > --
>       > > ---
>       > > pradeep singh rautela
>       > >
>       > > "Genius is 1% inspiration, and 99% 
>       > perspiration" - not me :)
>       > 
>       > 
>       > 
>       > 
>       > 
>       > 
>       > 
>       > -- 
>       > ---
>       > pradeep singh rautela 
>       > 
>       > "Genius is 1% inspiration, and 99% 
>       > perspiration" - not me :) 
>       > 
>       > 
>       > 
>       > 
>       > 
>       > 
>       
>       
>       
> 
> 
> 
> 
>  
> -Rahooooooooooooooool...
> exceptions are most common things to happen.....  
> 
> 
> 



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel