Xen project Mailing List

Re: [Xen-devel] Live-Patch application failure in core-scheduling mode

From: Jürgen Groß <jgross@xxxxxxxx>

Date: Fri, 7 Feb 2020 10:58:54 +0100

Cc: Sergey Dyasli <sergey.dyasli@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, Ross Lagerwall <ross.lagerwall@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxx>

Delivery-date: Fri, 07 Feb 2020 09:59:05 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 07.02.20 10:51, Jan Beulich wrote:

On 07.02.2020 10:25, Jürgen Groß wrote:

On 07.02.20 09:49, Jan Beulich wrote:

On 07.02.2020 09:42, Jürgen Groß wrote:

On 07.02.20 09:23, Jan Beulich wrote:

On 07.02.2020 09:04, Jürgen Groß wrote:

On 06.02.20 15:02, Sergey Dyasli wrote:

On 06/02/2020 11:05, Sergey Dyasli wrote:

On 06/02/2020 09:57, Jürgen Groß wrote:

On 05.02.20 17:03, Sergey Dyasli wrote:

Hello,

I'm currently investigating a Live-Patch application failure in core-
scheduling mode and this is an example of what I usually get:
(it's easily reproducible)

         (XEN) [  342.528305] livepatch: lp: CPU8 - IPIing the other 15 CPUs
         (XEN) [  342.558340] livepatch: lp: Timed out on semaphore in CPU quiesce phase 13/15
         (XEN) [  342.558343] bad cpus: 6 9

         (XEN) [  342.559293] CPU:    6
         (XEN) [  342.559562] Xen call trace:
         (XEN) [  342.559565]    [<ffff82d08023f304>] R common/schedule.c#sched_wait_rendezvous_in+0xa4/0x270
         (XEN) [  342.559568]    [<ffff82d08023f8aa>] F common/schedule.c#schedule+0x17a/0x260
         (XEN) [  342.559571]    [<ffff82d080240d5a>] F common/softirq.c#__do_softirq+0x5a/0x90
         (XEN) [  342.559574]    [<ffff82d080278ec5>] F arch/x86/domain.c#guest_idle_loop+0x35/0x60

         (XEN) [  342.559761] CPU:    9
         (XEN) [  342.560026] Xen call trace:
         (XEN) [  342.560029]    [<ffff82d080241661>] R _spin_lock_irq+0x11/0x40
         (XEN) [  342.560032]    [<ffff82d08023f323>] F common/schedule.c#sched_wait_rendezvous_in+0xc3/0x270
         (XEN) [  342.560036]    [<ffff82d08023f8aa>] F common/schedule.c#schedule+0x17a/0x260
         (XEN) [  342.560039]    [<ffff82d080240d5a>] F common/softirq.c#__do_softirq+0x5a/0x90
         (XEN) [  342.560042]    [<ffff82d080279db5>] F arch/x86/domain.c#idle_loop+0x55/0xb0

The first HT sibling is waiting for the second in the LP-application
context while the second waits for the first in the scheduler context.
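
The mutual wait described above can be pictured as a cycle in a wait-for graph. The following is only a minimal sketch of that idea, not Xen code: the array layout, the `NO_WAIT` marker and the `in_wait_cycle()` helper are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_WAIT (-1)   /* hypothetical marker: this CPU is not blocked */

/*
 * waits_for[i] is the CPU that CPU i is blocked on, or NO_WAIT if CPU i
 * can make progress. Returns true if following the chain from `start`
 * leads back to `start`, i.e. `start` is part of a deadlock cycle.
 */
static bool in_wait_cycle(const int *waits_for, int ncpus, int start)
{
    assert(start >= 0 && start < ncpus);

    int cur = waits_for[start];

    /* A chain that does not return to `start` is abandoned after ncpus steps. */
    for (int steps = 0; steps < ncpus && cur != NO_WAIT; steps++) {
        if (cur == start)
            return true;
        cur = waits_for[cur];
    }
    return false;
}
```

With two siblings where entry 0 (spinning in the live-patch quiesce phase) waits for entry 1, and entry 1 (spinning in the scheduler rendezvous) waits for entry 0, both are reported as part of a cycle. Neither side can make progress unless one of the two waits is broken or given a timeout.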

Any suggestions on how to improve this situation are welcome.


Can you test the attached patch, please? It is only tested to boot, so
I did no livepatch tests with it.


Thank you for the patch! It seems to fix the issue in my manual testing.
I'm going to submit automatic LP testing for both thread/core modes.


Andrew suggested to test late ucode loading as well and so I did.
It uses stop_machine() to rendezvous cpus and it failed with a similar
backtrace for a problematic CPU. But in this case the system crashed
since there is no timeout involved:

        (XEN) [  155.025168] Xen call trace:
        (XEN) [  155.040095]    [<ffff82d0802417f2>] R _spin_unlock_irq+0x22/0x30
        (XEN) [  155.069549]    [<ffff82d08023f3c2>] S common/schedule.c#sched_wait_rendezvous_in+0xa2/0x270
        (XEN) [  155.109696]    [<ffff82d08023f728>] F common/schedule.c#sched_slave+0x198/0x260
        (XEN) [  155.145521]    [<ffff82d080240e1a>] F common/softirq.c#__do_softirq+0x5a/0x90
        (XEN) [  155.180223]    [<ffff82d0803716f6>] F x86_64/entry.S#process_softirqs+0x6/0x20

It looks like your patch provides a workaround for LP case, but other
cases like stop_machine() remain broken since the underlying issue with
the scheduler is still there.


And here is the fix for ucode loading (that was in fact the only case
where stop_machine_run() wasn't already called in a tasklet).


This is a rather odd restriction, and hence will need explaining.


stop_machine_run() is using a tasklet on each online cpu (excluding the
one it was called on) for doing a rendezvous of all cpus. With tasklets
always being executed on idle vcpus it is mandatory for
stop_machine_run() to be called on an idle vcpu as well when core
scheduling is active, as otherwise a deadlock will occur. This is being
accomplished by the use of continue_hypercall_on_cpu().


Well, it's this "a deadlock" which is too vague for me. What exactly is
it that deadlocks, and where (if not obvious from the description of
that case) is the connection to core scheduling? Fundamentally such an
issue would seem to call for an adjustment to core scheduling logic,
not placing of new restrictions on other pre-existing code.


This is the main objective of core scheduling: on all siblings of a
core only vcpus of exactly one domain are allowed to be active.

As tasklets are only running on idle vcpus and stop_machine_run()
is activating tasklets on all cpus but the one it has been called on
to rendezvous, it is mandatory for stop_machine_run() to be called on
an idle vcpu, too, as otherwise there is no way for scheduling to
activate the idle vcpu for the tasklet on the sibling of the cpu
stop_machine_run() has been called on.
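
The constraint described above can be written down as a toy model. This is a sketch under simplifying assumptions, not the Xen scheduler: `struct toy_cpu`, `enum vcpu_kind` and `tasklet_can_run()` are hypothetical names, and the model assumes that a sibling can only switch to its idle vcpu together with the rest of its core when core scheduling is active.

```c
#include <assert.h>
#include <stdbool.h>

enum vcpu_kind { VCPU_GUEST, VCPU_IDLE };

struct toy_cpu {
    int core;               /* siblings share the same core id */
    enum vcpu_kind running; /* what this CPU currently runs */
};

/*
 * Toy model: can `target` get onto its idle vcpu to run the rendezvous
 * tasklet while `caller` is busy inside the stop-machine call?
 */
static bool tasklet_can_run(const struct toy_cpu *cpus, int caller, int target,
                            bool core_sched)
{
    assert(caller != target);

    if (cpus[target].running == VCPU_IDLE)
        return true;    /* already in idle context */
    if (!core_sched)
        return true;    /* per-thread scheduling: target switches on its own */
    if (cpus[target].core != cpus[caller].core)
        return true;    /* other core: its siblings can switch together */

    /* Same core: the switch needs the caller, which only fits if it is idle. */
    return cpus[caller].running == VCPU_IDLE;
}
```

In this model the only failing case is a caller on a guest vcpu whose sibling also runs a guest vcpu with core scheduling enabled, which corresponds to the situation that calling stop_machine_run() from an idle vcpu is meant to avoid.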


I can follow all this, but it needs spelling out in the description
of the patch, I think. "only running on idle vcpus" isn't very
precise though, as this ignores softirq tasklets. Which got me to
think of an alternative (faod: without having thought through at
all whether this would indeed be viable): What if stop-machine used
softirq tasklets instead of "ordinary" ones?

This would break its use for entering ACPI S3 state where it relies on
all guest vcpus being descheduled.


Juergen
