
Re: [Xen-devel] [RFC 0/6] XEN scheduling hardening



On 26.07.19 13:56, Dario Faggioli wrote:
[Adding George plus others x86, ARM and core-Xen people]

Hi Andrii,

First of all, thanks a lot for this series!

The problem you mention is a long standing one, and I'm glad we're
eventually starting to properly look into it.

I already have one comment: I think I can see where this comes from,
but I don't think 'XEN scheduling hardening' is what we're doing in
this series... I'd go for something like "xen: sched: improve idle
and vcpu time accounting precision", or something like that.

On Fri, 2019-07-26 at 13:37 +0300, Andrii Anisov wrote:
One of the scheduling problems is a misleading CPU idle time concept.
Currently, the idle vcpu's run time is taken as the CPU idle time. But
the idle vcpu's run time includes IRQ processing, softirq processing,
tasklet processing, etc. Those tasks are not actually idle, and
accounting them as idle may mislead the CPU frequency governors that
rely on the CPU idle time.

Indeed! And I agree this is quite bad.

The other problem is that the execution time of pure hypervisor tasks
is charged against the guest vcpu's budget.

Yep, equally bad.

For example, IRQ and softirq processing time is charged against the
current vcpu's budget, and the current vcpu is likely a guest vcpu.
This is quite unfair and may break scheduling reliability.
It is proposed to charge guest vcpus for the guest's actual run time
plus the time spent serving the guest's hypercalls and its accesses to
emulated iomem. All the rest is accounted as hypervisor run time
(IRQ and softirq processing, branch prediction hardening, etc.).

Right.
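
To make the proposed split concrete, here is a minimal sketch of
per-CPU accounting with three buckets. It is only an illustration of
the idea discussed above, not code from the series: the type and
function names (time_class, cpu_acct, acct_init, acct_switch) are
hypothetical and do not exist in Xen, and timestamps are plain
nanosecond counters supplied by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accounting classes; names are illustrative, not Xen's. */
enum time_class {
    TC_GUEST,      /* guest code, plus hypercalls and emulated iomem
                      accesses served on the guest's behalf */
    TC_HYPERVISOR, /* IRQs, softirqs, tasklets, hardening, ... */
    TC_IDLE,       /* the CPU really had nothing to do */
    TC_COUNT
};

/* Per-CPU accounting state (assumed layout, for illustration only). */
struct cpu_acct {
    uint64_t total_ns[TC_COUNT]; /* accumulated time per class */
    uint64_t last_stamp_ns;      /* when the current class started */
    enum time_class current;     /* class being charged right now */
};

static void acct_init(struct cpu_acct *a, uint64_t now_ns,
                      enum time_class start)
{
    for (int i = 0; i < TC_COUNT; i++)
        a->total_ns[i] = 0;
    a->last_stamp_ns = now_ns;
    a->current = start;
}

/*
 * Charge the interval since the last switch to the class that was
 * running, then start charging 'next'. Called at every boundary:
 * IRQ entry/exit, softirq start/end, context switch, and so on.
 */
static void acct_switch(struct cpu_acct *a, uint64_t now_ns,
                        enum time_class next)
{
    assert(now_ns >= a->last_stamp_ns);
    a->total_ns[a->current] += now_ns - a->last_stamp_ns;
    a->last_stamp_ns = now_ns;
    a->current = next;
}
```

With this shape, an IRQ that interrupts a guest vcpu moves the clock
from TC_GUEST to TC_HYPERVISOR and back, so the guest's budget is only
reduced by the time it (or work done on its behalf) actually consumed.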

While the series is an early RFC, several points are still untouched:
  - Now the time elapsed since the last rescheduling is not fully
    charged against the current vcpu's budget. Are any changes needed
    in the existing scheduling algorithms?

I'll think about it, but off the top of my head, I don't see how
this can be a problem. Scheduling algorithms (should!) base their logic
and their calculations on actual vcpus' runtime, not much on idle
vcpus' one.

  - How to avoid the absolute top priority of tasklets (which is obeyed
    by all schedulers so far). Should the idle vcpu be scheduled like
    normal guest vcpus (through queues, priorities, etc.)?

Now, this is something to think about, and try to understand if
anything would break if we go for it. I mean, I see why you'd want to
do that, but tasklets and softirqs have worked the way they do, in Xen,
ever since they were introduced, I believe.

Therefore, even if no subsystem explicitly relies on the current
behavior (which should be verified), I think we are at high risk of
breaking things if we change it.

We'd break things IMO.

Tasklets are sometimes used to perform async actions which can't be done
in guest vcpu context. Like switching a domain to shadow mode for L1TF
mitigation, or marshalling all cpus for stop_machine(). You don't want
to be able to block tasklets, you want them to run as soon as possible.
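
As a rough illustration of that "absolute top priority", here is a toy
pick-next function. It is a simplified model, not Xen's scheduler
code: struct toy_vcpu and pick_next are hypothetical names, and the
assumption is simply that pending tasklet work forces the idle vcpu
(where tasklets run) to be chosen ahead of any runnable guest vcpu.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy vcpu descriptor; fields are illustrative, not Xen's struct vcpu. */
struct toy_vcpu {
    int id;
    int priority;  /* higher value means more urgent */
    bool is_idle;
};

/*
 * Choose what runs next on a CPU.
 *
 * If tasklet work is pending, the idle vcpu is picked unconditionally,
 * bypassing queues and priorities, so the tasklet runs as soon as
 * possible. Otherwise the highest-priority runnable vcpu wins, and the
 * idle vcpu is only chosen when the run queue is empty.
 */
static const struct toy_vcpu *pick_next(const struct toy_vcpu *runq,
                                        size_t n,
                                        const struct toy_vcpu *idle,
                                        bool tasklet_pending)
{
    assert(idle != NULL && idle->is_idle);

    if (tasklet_pending)
        return idle;

    const struct toy_vcpu *best = idle;
    for (size_t i = 0; i < n; i++) {
        if (best->is_idle || runq[i].priority > best->priority)
            best = &runq[i];
    }
    return best;
}
```

Scheduling the idle vcpu "like a normal vcpu" would amount to dropping
the early return, at which point a busy run queue could delay things
like stop_machine() indefinitely.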


That's not to say it would not be a good change, or that it is
impossible... It's, rather, just to raise some awareness. :-)

  - Idle vcpu naming is quite misleading. It is a kind of system
    (hypervisor) task which is responsible for some hypervisor work.
    Should it be renamed/reconsidered?

Well, that's a design question, even for this very series, isn't it? I
mean, I see two ways of achieving proper idle time accounting:
1) you leave things as they are --i.e., idle does not only do idling,
    it also does all these other things, but you make sure you don't
    count the time they take as idle time;
2) you move all these activities out of idle, and into some other
    context, and you let idle just do the idling. At that point, time
    accounted to idle will be only actual idle time, as the time it
    took Xen to do all the other things is now accounted to the new
    execution context which is running them.
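
The two approaches can be put side by side in a small sketch. All
names here (idle_ctx_opt1, ctx_opt2 and the opt1_/opt2_ helpers) are
made up for illustration and the durations are plain nanosecond
counts; the point is only that both bookkeeping schemes report the
same idle time for the same sequence of events.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Option 1: a single idle context keeps doing the extra work, but the
 * time spent on that work is tracked and excluded when idle time is
 * read.
 */
struct idle_ctx_opt1 {
    uint64_t runtime_ns; /* everything that ran in the idle context */
    uint64_t work_ns;    /* softirqs, tasklets, ... done while "idle" */
};

static void opt1_account(struct idle_ctx_opt1 *c, uint64_t ns,
                         bool is_work)
{
    c->runtime_ns += ns;
    if (is_work)
        c->work_ns += ns;
}

static uint64_t opt1_idle_ns(const struct idle_ctx_opt1 *c)
{
    assert(c->work_ns <= c->runtime_ns);
    return c->runtime_ns - c->work_ns;
}

/*
 * Option 2: the work moves to its own execution context, so each
 * context's runtime already means what its name says.
 */
struct ctx_opt2 {
    uint64_t idle_ns; /* runtime of the pure idle context */
    uint64_t hyp_ns;  /* runtime of the separate hypervisor context */
};

static void opt2_account(struct ctx_opt2 *c, uint64_t ns, bool is_work)
{
    if (is_work)
        c->hyp_ns += ns;
    else
        c->idle_ns += ns;
}

static uint64_t opt2_idle_ns(const struct ctx_opt2 *c)
{
    return c->idle_ns;
}
```

The difference is therefore not in the numbers but in where the
bookkeeping lives: option 1 needs every consumer of idle time to use
the corrected value, while option 2 changes what the scheduler sees as
a runnable context.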

And here we are coming back to the idea of a "hypervisor domain" I
suggested about 10 years ago and which was rejected at that time...


Juergen



 

