Re: [Xen-devel] [PATCH] xen: make sure stop_machine_run() is always called in a tasklet

On 14.02.20 18:34, Igor Druzhinin wrote:
On 14/02/2020 16:39, Jürgen Groß wrote:
On 14.02.20 15:06, Igor Druzhinin wrote:
On 11/02/2020 09:35, Juergen Gross wrote:
With core scheduling active it is mandatory for stop_machine_run() to
be called in a tasklet only, as otherwise a scheduling deadlock would
occur: stop_machine_run() does a cpu rendezvous by activating a tasklet
on all other cpus. In case stop_machine_run() was not called in an idle
vcpu it would block scheduling the idle vcpu on its siblings with core
scheduling being active, resulting in a hang.
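
A minimal sketch of the restriction described above, written as a
standalone C model rather than as Xen code. The names vcpu_model and
stop_machine_context_ok_model() are hypothetical; the only rule taken
from the text is that, with core scheduling active, the caller has to
be an idle vcpu.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the currently running vcpu. */
struct vcpu_model {
    bool is_idle;   /* true when this is the idle vcpu of its cpu */
};

/*
 * Returns true when calling a stop_machine_run()-like function from
 * this context cannot run into the deadlock described above.
 * Without core scheduling the model accepts any caller.
 */
bool stop_machine_context_ok_model(const struct vcpu_model *curr,
                                   bool core_sched_active)
{
    assert(curr);

    if (!core_sched_active)
        return true;

    /* A non-idle caller would keep the siblings' idle vcpus from running. */
    return curr->is_idle;
}
```

In this model a tasklet context corresponds to is_idle being true, which
matches the requirement named in the patch subject.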

I suppose rcu_barrier() is fine due to process_pending_softirqs() being
called inside? I'm a little concerned by imposing is_vcpu_idle() restriction
in that case as rcu_barrier() could technically be called from a
non-tasklet context.

No, stop_machine_run() with core scheduling active can only work when
called in an idle vcpu.

OTOH it would be fairly easy to add another softirq for a similar
purpose and have a sync_machine_run() using that instead of tasklets.
This could be used for ucode loading, too.

stop_machine_run() and sync_machine_run() could use a common main
function. The patch should be rather simple.
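
The idea of a common main function can be sketched as follows. This is
a standalone C model under stated assumptions, not the actual patch:
machine_run_common(), the *_model wrappers, kick_mech and the counters
are all hypothetical names, and "kicking" a cpu is reduced to bumping a
counter instead of scheduling a tasklet or raising a softirq.

```c
#include <assert.h>

#define NR_CPUS_MODEL 4u

/* How the other cpus are asked to join the rendezvous (hypothetical). */
enum kick_mech { KICK_TASKLET, KICK_SOFTIRQ };

/* Per-mechanism count of remote cpus that were kicked. */
static unsigned int kicks[2];

/* Shared body: kick every other cpu, then run fn on the calling cpu. */
static int machine_run_common(int (*fn)(void *), void *data,
                              unsigned int this_cpu, enum kick_mech mech)
{
    unsigned int cpu;

    assert(this_cpu < NR_CPUS_MODEL);

    for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
        if (cpu != this_cpu)
            kicks[mech]++;   /* stands in for tasklet / softirq activation */

    /* A real implementation would wait here until all cpus have arrived. */
    return fn(data);
}

/* Tasklet based variant, modelling stop_machine_run(). */
int stop_machine_run_model(int (*fn)(void *), void *data, unsigned int cpu)
{
    return machine_run_common(fn, data, cpu, KICK_TASKLET);
}

/* Softirq based variant, modelling the proposed sync_machine_run(). */
int sync_machine_run_model(int (*fn)(void *), void *data, unsigned int cpu)
{
    return machine_run_common(fn, data, cpu, KICK_SOFTIRQ);
}

/* Number of remote cpus kicked so far through the given mechanism. */
unsigned int kicks_sent_model(enum kick_mech mech)
{
    return kicks[mech];
}

/* Example callback: increments the int that data points to. */
int bump_counter_model(void *data)
{
    ++*(int *)data;
    return 0;
}
```

Both entry points differ only in the mechanism passed to the common
function, which is why the patch is expected to be small.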


I have a patch on the list (which I was planning to send a v2 for) that
fixes another issue with rcu_barrier():

As I understand it now, that wouldn't work with core-scheduling. Do you think
it's possible to synchronously wait for tasklets to finish in non-tasklet
context (because that's what the purpose of rcu_barrier() is)?

No, that won't work unless we add preemption (which would basically need
per-vcpu stacks instead of per-pcpu ones).

What might work IMO would be to no longer do rcu_process_callbacks()
during idle, but to have a specific softirq for that purpose. This would
remove the need to involve scheduling for rcu_barrier(). A brief check
of the process_pending_softirqs() callers suggests this is possible, but
I'd like a second opinion from someone with more rcu knowledge than me.
Individual problematic users of process_pending_softirqs() could still
be switched to a variant that does not process the new rcu softirq.
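
To illustrate the last point, here is a standalone C model of a softirq
loop with an ignore mask. It is a sketch, not Xen's implementation: all
*_model names and the softirq numbers are hypothetical, and handlers
are reduced to per-softirq counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical softirq numbers for the model. */
enum {
    TIMER_SOFTIRQ_MODEL,
    RCU_CB_SOFTIRQ_MODEL,   /* the proposed dedicated rcu softirq */
    NR_SOFTIRQS_MODEL
};

static unsigned long pending;                      /* one bit per softirq */
static unsigned int handled[NR_SOFTIRQS_MODEL];    /* handler run counts */

void raise_softirq_model(unsigned int nr)
{
    assert(nr < NR_SOFTIRQS_MODEL);
    pending |= 1UL << nr;
}

/* Run all pending softirqs except those selected by ignore_mask. */
static void run_softirqs_model(unsigned long ignore_mask)
{
    unsigned long todo;

    while ((todo = pending & ~ignore_mask) != 0) {
        unsigned int nr = 0;

        while (!(todo & (1UL << nr)))
            nr++;                    /* find the lowest pending softirq */

        pending &= ~(1UL << nr);
        handled[nr]++;               /* stands in for calling the handler */
    }
}

/* Default variant: processes everything, including the rcu softirq. */
void process_pending_softirqs_model(void)
{
    run_softirqs_model(0);
}

/* Variant for problematic callers: leaves the rcu softirq pending. */
void process_pending_softirqs_norcu_model(void)
{
    run_softirqs_model(1UL << RCU_CB_SOFTIRQ_MODEL);
}

unsigned int handled_count_model(unsigned int nr)
{
    return handled[nr];
}

bool softirq_is_pending_model(unsigned int nr)
{
    return (pending & (1UL << nr)) != 0;
}
```

A caller that must not run rcu callbacks would use the norcu variant.
The rcu softirq then stays pending until a later call of the default
variant picks it up.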


