
Re: [Xen-devel] Design and Question: Eliminate Xen (RTDS) scheduler overhead on dedicated CPU



On Tue, Mar 24, 2015 at 3:50 AM, Meng Xu <xumengpanda@xxxxxxxxx> wrote:
> Hi Dario and George,
>
> I'm exploring the design choice of eliminating the Xen scheduler overhead on
> the dedicated CPU. A dedicated CPU is a PCPU that has a full-capacity VCPU
> pinned onto it, and no other VCPUs will run on that PCPU.

Hey Meng!  This sounds awesome, thanks for looking into it.


> [Problems]
> The issue I'm encountering is as follows:
> After I implemented the dedicated cpu feature, I compared the latency of a
> cpu-intensive task in domU on a dedicated CPU (denoted as R_dedcpu) and the
> latency on a non-dedicated CPU (denoted as R_nodedcpu). The expected result
> should be R_dedcpu < R_nodedcpu since we avoid the scheduler overhead.
> However, the actual result is R_dedcpu > R_nodedcpu, with R_dedcpu -
> R_nodedcpu ~= 1000 cycles.
>
> After adding some trace to every function that may raise the
> SCHEDULE_SOFTIRQ, I found:
> When a cpu is not marked as a dedicated cpu and the scheduler on it is not
> disabled, vcpu_block() is triggered 2,896 times during 58,280,322,928ns
> (i.e., once every 20,124,421ns on average) on the dedicated cpu.
> However,
> When I disable the scheduler on a dedicated cpu, the function
> vcpu_block(void) @schedule.c is triggered very frequently; the
> vcpu_block(void) is triggered 644,824 times during 8,918,636,761ns (i.e.,
> once every 13,831ns on average) on the dedicated cpu.
>
> To sum up the problem I'm facing: vcpu_block(void) is triggered much
> faster and more frequently when the scheduler is disabled on a cpu than
> when the scheduler is enabled.
>
> [My question]
> I'm very confused about why vcpu_block(void) is triggered so
> frequently when the scheduler is disabled.  vcpu_block(void) is called
> by the SCHEDOP_block hypercall, but why would this hypercall be triggered
> so frequently?
>
> It will be great if you know the answer directly. (This is just a pure hope
> and I cannot really expect it. :-) )
> But I would really appreciate it if you could give me some direction on how
> I should figure it out. I grepped for vcpu_block(void) and SCHEDOP_block in
> the xen code base, but didn't find many calls to them.
>
> What confuses me most is that the dedicated VCPU should be blocked less
> frequently, not more frequently, when the scheduler is disabled on the
> dedicated CPU, because the dedicated VCPU is now always running on that CPU
> without the hypervisor scheduler's interference.

So if I had to guess, I would guess that you're not actually blocking
when the guest tries to block.  Normally if the guest blocks, it
blocks in a loop like this:

do {
  enable_irqs();
  hlt();
  disable_irqs();
} while (!interrupt_pending);

For a PV guest, the hlt() would be replaced with a PV block() hypercall.
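
For reference, the guest-side call is roughly the following (just a
sketch assuming a Linux-style PV guest -- the wrapper name is made up,
but HYPERVISOR_sched_op/SCHEDOP_block is the actual interface):

/* PV "halt": block this vcpu until an event arrives.  SCHEDOP_block
 * atomically re-enables event delivery if it was masked, so a pending
 * event wakes the vcpu straight back up. */
static void pv_block(void)
{
    HYPERVISOR_sched_op(SCHEDOP_block, NULL);
}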

Normally, when a guest calls block(), then it's taken off the
runqueue; and if there's nothing on the runqueue, then the scheduler
will run the idle domain; it's the idle domain that actually does the
blocking.

If you've hardwired it always to return the vcpu in question rather
than the idle domain, then it will never block -- it will busy-wait,
calling block millions of times.

The simplest way to get your prototype working, in that case, would be
to return the idle vcpu for that pcpu if the guest is blocked.
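
Something along these lines in your do_schedule hook, for instance
(just a sketch -- ded_vcpu_of() is a made-up helper standing in for
however your prototype tracks the pinned vcpu):

static struct task_slice ded_schedule(const struct scheduler *ops,
                                      s_time_t now, bool_t tasklet_work)
{
    struct task_slice ret;
    struct vcpu *v = ded_vcpu_of(smp_processor_id()); /* the pinned vcpu */

    /* If the dedicated vcpu has blocked, run this pcpu's idle vcpu
     * instead of handing the blocked vcpu straight back; otherwise the
     * guest never really halts and just spins on SCHEDOP_block. */
    if ( !vcpu_runnable(v) )
        v = idle_vcpu[smp_processor_id()];

    ret.task = v;
    ret.time = -1;      /* no slice timer needed on a dedicated cpu */
    ret.migrated = 0;
    return ret;
}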

But a brief comment on your design:

Looking at your design at the moment, you will get rid of the overhead
of the scheduler-related interrupts, and of any pluggable-scheduler
accounting that needs to happen (e.g., calculating credits burned, &c).  And
that's certainly not nothing.  But it's not really accurate to say
that you're avoiding the scheduler entirely.  At the moment, as far as
I can tell, you're still going through all the normal schedule.c
machinery between wake-up and actually running the vm; and the normal
machinery for interrupt delivery.
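
To make that concrete, the wake-to-run path is still roughly
(paraphrased from schedule.c, not verbatim):

vcpu_wake(v)
  -> SCHED_OP(ops, wake, v)            /* per-scheduler wake hook      */
     -> raise SCHEDULE_SOFTIRQ on the target pcpu
schedule()                             /* softirq handler              */
  -> ops->do_schedule(...)             /* pick the next vcpu to run    */
  -> context_switch(prev, next)        /* actually enter the guest     */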

I'm wondering -- are people really going to want to just pin a single
vcpu from a domain like this?  Or are they going to want to pin all
vcpus from a given domain?

For the first to be useful, the guest OS would need to understand
somehow that this cpu has better properties than the other vcpus on
its system.  Which I suppose could be handled manually (e.g., by the
guest admin pinning processes to that cpu or something).

The reason I'm asking is because another option that would avoid the
need for special per-cpu flags would be to make a "sched_place" scheduler
(sched_partition?), which would essentially do what you've done here
-- when you add a vcpu to the scheduler, it simply chooses one of its
free cpus and dedicates it to that vcpu.  If no such cpus are
available, it returns an error.  In that case, you could use the
normal cpupool machinery to assign cpus to that scheduler, without
needing to introduce these extra flags, and without making each of the
pluggable schedulers deal with the complexity of implementing
"dedicated" scheduling.

The only downside is that at the moment you can't have a domain cross
cpupools; so either all vcpus of a domain would have to be dedicated,
or none.

Thoughts?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

