
Re: [Xen-devel] VCPU migration overhead compensation ?

Hi, Dario and George:

Thanks for your reply. I am doing the measurement within Xen. I agree that a more precise description would be "the time spent performing scheduling and context switches", and it does NOT include cache effects. (I think the cache effect should be measured at the application level, not in the hypervisor.)

Basically, I added code in trace.h, schedule.c, and domain.c (for context_switch). I measure three times here:
1. The time spent making the scheduling decision, defined as scheduling latency. It includes grabbing the spinlock, calling the scheduler-dependent do_schedule() function, and releasing the spinlock;
2. If the scheduler decides to perform a context switch (prev != next), it calls context_switch() in domain.c. I measure the time spent there, up to right before context_saved(prev);
3. The time spent in context_saved(prev), which inserts the current VCPU back into the runq. If a global queue protected by a spinlock is used, this includes the time to grab that lock, and so on.

I ran the experiments on an i7 6-core machine, with hyper-threading and SpeedStep disabled. It has one socket; each core has dedicated L1 and L2 caches, and all cores share the L3 cache. The host is CentOS 6.3 with Linux 3.4.35; Xen is 4.1.4.
Domain 0 boots with one VCPU pinned to core 0. One guest VM boots with 5 VCPUs, with the cpumask for those VCPUs set to the remaining 5 cores. I am using a gedf scheduler which I am working on; it is essentially a sedf scheduler sharing one global queue.
In the guest VM, I just ran 5 cpu busy loops to make all the VCPUs busy.

The data is collected via xentrace with the -M option, and I measured for 10 seconds.

The numbers are scheduler dependent, but the time spent in context_switch() itself should be scheduler independent. In my measurements, the worst case happens when we context switch from an idle VCPU to a busy VCPU (which was previously scheduled on a different core) and perform the TLB flush. It is around 2 microseconds on my machine.

I am attaching the cdf plots of these three measurements.

Any feedback is welcome.



On Wed, Apr 10, 2013 at 3:25 AM, Dario Faggioli <dario.faggioli@xxxxxxxxxx> wrote:
On mer, 2013-04-10 at 09:21 +0100, George Dunlap wrote:
> On 10/04/13 06:44, Sisu Xi wrote:
> > I am also performing some overhead measurement for the scheduler. If a
> > VCPU is migrated from one core to another, the overhead is around 2
> > microseconds on my machine, which is much less than what is set in
> > Credit2 (50 microseconds).
> When you say "overhead", I assume you mean that's how long the whole
> migration takes?
> The point of the compensation isn't so much for the actual migration
> itself, but for the lower performance the vcpu will get after the
> migration due to having cold caches.
Which is, BTW, right the effect that we were trying to measure
(although, again, it was Linux, not Xen at that time), with the
experiments I was describing in my e-mail...

Might have been obvious, but I think it's worth making it even more so
(thanks George :-) ), since I agree with George that _this_ is what we
should be concerned about when it comes to migration.


<<This happens because I choose it to happen!>> (Raistlin Majere)
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Sisu Xi, PhD Candidate

Department of Computer Science and Engineering
Campus Box 1045
Washington University in St. Louis
One Brookings Drive
St. Louis, MO 63130

Attachment: gedf_one_nopin_context_saved.eps
Description: PostScript document

Attachment: gedf_one_nopin_context_switch.eps
Description: PostScript document

Attachment: gedf_one_nopin_sched_latency.eps
Description: PostScript document
