[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] x86/SMP: don't try to stop already stopped CPUs



>>> On 03.06.19 at 16:12, <roger.pau@xxxxxxxxxx> wrote:
> On Wed, May 29, 2019 at 04:17:49AM -0600, Jan Beulich wrote:
>> In particular with an enabled IOMMU (but not really limited to this
>> case), trying to invoke fixup_irqs() after having already done
>> disable_IO_APIC() -> clear_IO_APIC() is a rather bad idea:
>> 
>>  RIP:    e008:[<ffff82d08026a036>] amd_iommu_read_ioapic_from_ire+0xde/0x113
>>  RFLAGS: 0000000000010006   CONTEXT: hypervisor (d0v0)
>>  rax: ffff8320291de00c   rbx: 0000000000000003   rcx: ffff832035000000
>>  rdx: 0000000000000000   rsi: 0000000000000000   rdi: ffff82d0805ca840
>>  rbp: ffff83009e8a79c8   rsp: ffff83009e8a79a8   r8:  0000000000000000
>>  r9:  0000000000000004   r10: 000000000008b9f9   r11: 0000000000000006
>>  r12: 0000000000010000   r13: 0000000000000003   r14: 0000000000000000
>>  r15: 00000000fffeffff   cr0: 0000000080050033   cr4: 00000000003406e0
>>  cr3: 0000002035d59000   cr2: ffff88824ccb4ee0
>>  fsb: 00007f2143f08840   gsb: ffff888256a00000   gss: 0000000000000000
>>  ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
>>  Xen code around <ffff82d08026a036> 
> (amd_iommu_read_ioapic_from_ire+0xde/0x113):
>>   ff 07 00 00 39 d3 74 02 <0f> 0b 41 81 e4 00 f8 ff ff 8b 10 89 d0 25 00 00
>>  Xen stack trace from rsp=ffff83009e8a79a8:
>>  ...
>>  Xen call trace:
>>     [<ffff82d08026a036>] amd_iommu_read_ioapic_from_ire+0xde/0x113
>>     [<ffff82d08026bf7b>] iommu_read_apic_from_ire+0x10/0x12
>>     [<ffff82d08027f718>] io_apic.c#modify_IO_APIC_irq+0x5e/0x126
>>     [<ffff82d08027f9c5>] io_apic.c#unmask_IO_APIC_irq+0x2d/0x41
>>     [<ffff82d080289bc7>] fixup_irqs+0x320/0x40b
>>     [<ffff82d0802a82c4>] smp_send_stop+0x4b/0xa8
>>     [<ffff82d0802a7b2f>] machine_restart+0x98/0x288
>>     [<ffff82d080252242>] console_suspend+0/0x28
>>     [<ffff82d0802b01da>] do_general_protection+0x204/0x24e
>>     [<ffff82d080385a3d>] x86_64/entry.S#handle_exception_saved+0x68/0x94
>>     [<00000000aa5b526b>] 00000000aa5b526b
>>     [<ffff82d0802a7c7d>] machine_restart+0x1e6/0x288
>>     [<ffff82d080240f75>] hwdom_shutdown+0xa2/0x11d
>>     [<ffff82d08020baa2>] domain_shutdown+0x4f/0xd8
>>     [<ffff82d08023fe98>] do_sched_op+0x12f/0x42a
>>     [<ffff82d08037e404>] pv_hypercall+0x1e4/0x564
>>     [<ffff82d080385432>] lstar_enter+0x112/0x120
>> 
>> Don't call fixup_irqs() and don't send any IPI if there's only one
>> online CPU anyway, and don't call __stop_this_cpu() at all when the CPU
>> we're on was already marked offline (by a prior invocation of
>> __stop_this_cpu()).
>> 
>> Reported-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> 
>> --- a/xen/arch/x86/smp.c
>> +++ b/xen/arch/x86/smp.c
>> @@ -302,23 +302,31 @@ static void stop_this_cpu(void *dummy)
>>   */
>>  void smp_send_stop(void)
>>  {
>> -    int timeout = 10;
>> +    unsigned int cpu = smp_processor_id();
>>  
>> -    local_irq_disable();
>> -    fixup_irqs(cpumask_of(smp_processor_id()), 0);
>> -    local_irq_enable();
>> -
>> -    smp_call_function(stop_this_cpu, NULL, 0);
>> -
>> -    /* Wait 10ms for all other CPUs to go offline. */
>> -    while ( (num_online_cpus() > 1) && (timeout-- > 0) )
>> -        mdelay(1);
>> -
>> -    local_irq_disable();
>> -    disable_IO_APIC();
>> -    hpet_disable();
>> -    __stop_this_cpu();
>> -    local_irq_enable();
>> +    if ( num_online_cpus() > 1 )
>> +    {
>> +        int timeout = 10;
>> +
>> +        local_irq_disable();
>> +        fixup_irqs(cpumask_of(cpu), 0);
>> +        local_irq_enable();
>> +
>> +        smp_call_function(stop_this_cpu, NULL, 0);
>> +
>> +        /* Wait 10ms for all other CPUs to go offline. */
>> +        while ( (num_online_cpus() > 1) && (timeout-- > 0) )
>> +            mdelay(1);
>> +    }
>> +
>> +    if ( cpu_online(cpu) )
> 
> Won't this be better placed inside the previous if? Is it valid to
> have a single CPU and try to offline it?

No to the first question, and I'm not sure I see how you came to
the 2nd one. If there's just a single online CPU, then there's no
need to fixup_irqs(), and there's no-one to IPI. Yet the local CPU
still needs to do everything that should happen once in UP mode,
unless this CPU has been offlined already before (as was the
case in Andrew's report, at least as far as I was able to deduce).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.