
Re: [Xen-devel] HVM CPU enumeration, mapping to VCPU ID (Was: Re: [Xen-users] FreeBSD PVHVM call for testing)



On Tue, Jun 04, 2013 at 10:22:24AM -0400, Konrad Rzeszutek Wilk wrote:
> > > > > The new hypercall to figure this out could be used, but that wouldn't
> > > > > explain why we are failing to start on the correct VCPU?
> > > > 
> > > > I didn't follow the jump here. Can you provide an example?
> > > 
> > > http://lists.xen.org/archives/html/xen-devel/2013-05/msg00941.html
> > 
> > OK, got it.
> > 
> > [   84.619508] smpboot: Booting Node 0 Processor 1 APIC 0x8
> > 
> > So it seems like, in this case:
> > 
> > int __cpuinit native_cpu_up(unsigned int cpu)
> > {
> >         int apicid = apic->cpu_present_to_apicid(cpu);
> > 
> > apic->cpu_present_to_apicid(1) returned 8 instead of 2.
> > 
> > All of that should have been set up correctly ahead of time by
> > generic_processor_info() for all possible CPUs. Do you have the full
> > boot log?
> > 
> 
..snip...
> [    0.000000] ACPI: PM-Timer IO Port: 0xb008
> [    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> [    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] disabled)
> [    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x04] disabled)
> [    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x06] disabled)
> [    0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x08] disabled)
                                                     ^^^^ - take a note of that
.. snip..
> [   84.585905] CPU 1 got hotplugged
> [   84.590192] installing Xen timer for CPU 1
> [   84.596371] SMP alternatives: lockdep: fixing up alternatives
> [   84.603560] SMP alternatives: switching to SMP code
> [   84.619508] smpboot: Booting Node 0 Processor 1 APIC 0x8
                                                          ^^^ and that

> [   84.639766] ------------[ cut here ]------------
> [   84.639766] WARNING: at /home/konrad/ssd/konrad/linux/arch/x86/xen/time.c:336 xen_vcpuop_set_mode+0xc2/0xd0()
> [   84.639766] Hardware name: HVM domU

I discussed this with Matt over IRC, and the analysis is that the APIC
ID is wrong. Instead of using APIC 0x02 for CPU1, it ended up using APIC 0x08
(which belongs to CPU4). That caused xen_vcpuop_set_mode to fail (the
hypervisor in effect says: you are running on CPU4, not CPU1, return
-ENODEV) and we hit the BUG_ON() in Linux.

Chuck Anderson (CC-ed here) discovered that he was hitting this as well,
and found that with a QEMU that includes these three patches:

169b8fa piix4acpi, xen, hotplug: Fix race with ACPI AML code and hotplug.
309149c piix4acpi, xen: Clarify that the qemu_set_irq calls just do an IRQ pulse.
82b10d1 piix4acpi, xen, vcpu hotplug: Split the notification from the changes.

the problem would go away. We did not go any deeper to figure out the culprit.

My money is on a race in the AML code that modifies the MADT. While it is
executing, another ACPI GSI is triggered (the old code would trigger one for
every CPU being hot-plugged). So while the Linux ACPI AML interpreter is still
reading the MADT and writing out the updated copy, another CPU ends up reading
the MADT as well, before the interpreter has finished, and gets garbage.

Either way, if you see this, the fix is to update QEMU to a version that
has those patches.

Thanks Chuck for digging into this and finding the clues.


P.S.
I don't know how SeaBIOS does the CPU hotplug, but it might be worth looking
at that too at some point.

