
Re: [Xen-devel] Xen-unstable: Bisected Host boot failure on AMD Phenom



On 02/03/17 20:17, Andrew Cooper wrote:
> On 02/03/17 19:15, Boris Ostrovsky wrote:
>> On 03/02/2017 01:56 PM, Andrew Cooper wrote:
>>> On 02/03/17 18:51, Sander Eikelenboom wrote:
>>>> On 02/03/17 19:29, Andrew Cooper wrote:
>>>>> On 02/03/17 18:25, Sander Eikelenboom wrote:
>>>>>> On 02/03/17 18:38, Andrew Cooper wrote:
>>>>>>> On 02/03/17 17:29, Sander Eikelenboom wrote:
>>>>>>>> On 02/03/17 15:55, Andrew Cooper wrote:
>>>>>>>>> On 02/03/17 14:42, Sander Eikelenboom wrote:
>>>>>>>>>> Hi Andrew / Jan,
>>>>>>>>>>
>>>>>>>>>> While testing current xen-unstable staging I ran into my host
>>>>>>>>>> rebooting in early kernel boot.
>>>>>>>>>> Bisection has turned up:
>>>>>>>>>>     5cecf60f439e828f4bc0d2a368ced9a73b130cb7 is the first bad commit
>>>>>>>>>>     Author: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>>>>>>>>     Date:   Fri Feb 17 17:10:50 2017 +0000
>>>>>>>>>>
>>>>>>>>>>     x86/cpuid: Handle leaf 0x1 in guest_cpuid()
>>>>>>>>>>
>>>>>>>>>> Hardware is an AMD Phenom X6.
>>>>>>>>>> Below is the output of serial console of a failed boot.
>>>>>>>>> Hmm.  Sorry for breaking this (although my AMD servers are booting 
>>>>>>>>> fine).
>>>>>>>> No problem, it is the staging branch of the unstable tree anyway ;-)
>>>>>>>>
>>>>>>>>> It is unfortunately not entirely obvious what Linux is objecting to,
>>>>>>>>> and must be related to something visible in the emulated view.
>>>>>>>>>
>>>>>>>>> Does this delta make any difference?
>>>>>>>> Yes it does, boots fine with this patch applied, thanks !
>>>>>>> That is bad though. :s
>>>>>>>
>>>>>>> It means that something in dom0 has an aversion to my attempt to lie
>>>>>>> less about the topology.
>>>>>>>
>>>>>>> Do you mind checking whether
>>>>>>>
>>>>>>> res->b = cpuid_ebx(0x1) & 0xff00ffffu;
>>>>>>>
>>>>>>> causes it to break again?
>>>>>> Used that in the is_hardware_domain() case and it boots fine.
>>>>> Hmm - curious.  I am now even more confused.
>>>>>
>>>>> What about this?
>>>>>
>>>>> res->b = cpuid_ebx(0x1) & 0x00ffffffu;
>>>>>
>>>>> It will leave the APIC_ID field zeroed rather than feeding v->vcpu_id
>>>>> back into it.
>>>> Also boots fine.
>>> Right.  For my sanity, what about
>>>
>>> res->b = cpuid_ebx(0x1) & 0x00ffffffu;
>>> res->b |= (v->vcpu_id * 2) << 24;

This doesn't boot.
--
Sander

>> FWIW, I booted a 2-node
>>
>>   (XEN) CPU Vendor: AMD, Family 21 (0x15), Model 1 (0x1), Stepping 2 (raw 00600f12)
>>
>> with Linux 4.10 and latest staging. (I thought perhaps my nightly missed
>> something because it's a single node)
> 
> I expect it might have something to do with the fact that the machine
> failing to boot is a 6-core system, rather than a power of two, at which
> point I doubt the APIC IDs follow a linear trend.
> 
> (Properly fixing the reported topology is going to be a can of worms. 
> All this series is trying to do is use just enough duct-tape to get the
> hypervisor into a state where we can sensibly fix the reported topology.)
> 
> ~Andrew
> 
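
For anyone following the thread, the three EBX variants tried above can be
compared side by side with a small standalone sketch. This is not the Xen
code: the helper names are made up for illustration, and a plain integer
stands in for v->vcpu_id. The field layout in the comment is the documented
CPUID leaf 0x1 EBX layout; the sample value in the usage note is invented
and was not read from the Phenom in this report.

```c
#include <stdint.h>

/*
 * CPUID leaf 0x1, EBX layout:
 *   bits  7:0   brand index
 *   bits 15:8   CLFLUSH line size
 *   bits 23:16  logical processor count
 *   bits 31:24  initial local APIC ID
 *
 * All helper names below are hypothetical; only the masks and the
 * "(vcpu_id * 2) << 24" expression come from the thread.
 */

/* Variant 1: clear the logical processor count, keep the hardware APIC ID. */
static uint32_t ebx_clear_lcpu_count(uint32_t hw_ebx)
{
    return hw_ebx & 0xff00ffffu;
}

/* Variant 2: clear the APIC ID field, keep everything else. */
static uint32_t ebx_clear_apic_id(uint32_t hw_ebx)
{
    return hw_ebx & 0x00ffffffu;
}

/* Variant 3: clear the APIC ID field, then fill it with vcpu_id * 2. */
static uint32_t ebx_vcpu_apic_id(uint32_t hw_ebx, uint32_t vcpu_id)
{
    uint32_t b = hw_ebx & 0x00ffffffu;

    b |= (vcpu_id * 2) << 24;
    return b;
}

/* Field extractors, handy when comparing the variants. */
static unsigned int ebx_apic_id(uint32_t ebx)
{
    return ebx >> 24;
}

static unsigned int ebx_lcpu_count(uint32_t ebx)
{
    return (ebx >> 16) & 0xffu;
}
```

With an invented hardware value of 0x05060800 (APIC ID 5, logical processor
count 6), variant 1 gives 0x05000800, variant 2 gives 0x00060800, and
variant 3 with vcpu_id 3 gives 0x06060800. Per the replies above, variants 1
and 2 boot on the affected machine and variant 3 does not.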


