[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2] xen: make sure stop_machine_run() is always called in a tasklet

On 28.02.2020 10:18, Jürgen Groß wrote:
> On 28.02.20 10:15, Jan Beulich wrote:
>> On 28.02.2020 09:58, Jürgen Groß wrote:
>>> On 28.02.20 09:27, Jan Beulich wrote:
>>>> On 28.02.2020 08:19, Juergen Gross wrote:
>>>>> With core scheduling active it is mandatory for stop_machine_run() to
>>>>> be called in idle context only (so either during boot or in a tasklet),
>>>>> as otherwise a scheduling deadlock would occur: stop_machine_run()
>>>>> does a cpu rendezvous by activating a tasklet on all other cpus. In
>>>>> case stop_machine_run() was not called in an idle vcpu it would block
>>>>> scheduling the idle vcpu on its siblings with core scheduling being
>>>>> active, resulting in a hang.
>>>>> Put a BUG_ON() into stop_machine_run() to test for being called in an
>>>>> idle vcpu only and adapt the missing call site (ucode loading) to use a
>>>>> tasklet for calling stop_machine_run().
>>>>> Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
>>>>> ---
>>>>> V2:
>>>>> - rephrase commit message (Julien Grall)
>>>>> ---
>>>>>    xen/arch/x86/microcode.c  | 54 
>>>>> +++++++++++++++++++++++++++++------------------
>>>>>    xen/common/stop_machine.c |  1 +
>>>>>    2 files changed, 35 insertions(+), 20 deletions(-)
>>>> There's no mention anywhere of a connection to your RCU series,
>>>> nor to rcu_barrier(). Yet the change puts a new restriction also
>>>> on its use, and iirc this was mentioned in prior discussion. Did
>>>> I miss anything?
>>> Basically this patch makes the restriction explicit. Without the patch
>>> stop_machine_run() being called outside of a tasklet would just hang
>>> with core scheduling being active. Now it will catch those violations
>>> early even with core scheduling inactive.
>>> Note that currently there are no violations of this restriction anywhere
>>> in the hypervisor other than the one addressed by this patch.
>> Well, there is a connection to core scheduling. Without it, i.e.
>> prior to 4.13, there was no restriction on the placement of
>> rcu_barrier() invocations. According to what you say above, the
>> restriction was implicitly introduced with core scheduling. It
>> should imo be made explicit by attaching a comment, which would
>> (again imo) best be done here because now you also make this
>> case crash without core scheduling in use.
> Okay, I'll add a comment to stop_machine_run() and to rcu_barrier(). The
> rcu_barrier() comment will be then removed by my rcu series again.

Right - the alternative then would be to call out a dependency of
this patch on that series.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.