
Re: [Xen-devel] [linux-linus bisection] complete test-amd64-amd64-xl-pvh-intel



On 20/02/2017 00:20, Andrew Cooper wrote:
> On 19/02/2017 23:20, osstest service owner wrote:
>> branch xen-unstable
>> xenbranch xen-unstable
>> job test-amd64-amd64-xl-pvh-intel
>> testid guest-start
>>
>> Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
>> Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
>> Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
>> Tree: qemuu git://xenbits.xen.org/qemu-xen.git
>> Tree: xen git://xenbits.xen.org/xen.git
>>
>> *** Found and reproduced problem changeset ***
>>
>>   Bug is in tree:  xen git://xenbits.xen.org/xen.git
>>   Bug introduced:  ab914e04a62727b75782e401eaf2e8b72f717f61
>>   Bug not present: 2f4d2198a9b3ba94c959330b5c94fe95917c364c
>>   Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/105915/
>>
>>
>>   commit ab914e04a62727b75782e401eaf2e8b72f717f61
>>   Author: Jan Beulich <jbeulich@xxxxxxxx>
>>   Date:   Fri Feb 17 15:51:03 2017 +0100
>>   
>>       x86: package up context switch hook pointers
>>       
>>       They're all solely dependent on guest type, so we don't need to repeat
>>       all the same three pointers in every vCPU control structure. Instead use
>>       static const structures, and store pointers to them in the domain
>>       control structure.
>>       
>>       Since touching it anyway, take the opportunity and expand
>>       schedule_tail() in the only two places invoking it, allowing the macro
>>       to be dropped.
>>       
>>       Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>       Reviewed-by: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
>>       Reviewed-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>       Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx>
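For readers unfamiliar with the change: the commit replaces three per-vCPU hook pointers with one per-domain pointer to a shared, read-only table. A minimal C sketch of that shape, assuming simplified stand-in types; the names arch_csw, from/to/tail, ctxt_switch layout and the demo_* functions are illustrative assumptions, not copied from the Xen tree. The member order (from, to, tail) is also an assumption, though it is consistent with the fault at offset 8 analysed below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for Xen's vcpu/domain types. */
struct vcpu;

/* One table of hooks per guest type, shared by every vCPU of that type. */
struct arch_csw {
    void (*from)(struct vcpu *v);   /* save state when switching away  */
    void (*to)(struct vcpu *v);     /* restore state when switching in */
    void (*tail)(struct vcpu *v);   /* run after the switch completes  */
};

struct domain {
    const struct arch_csw *ctxt_switch;  /* one pointer per domain */
};

struct vcpu {
    struct domain *domain;
    int switched_in;                /* counter for demonstration only */
};

static void demo_from(struct vcpu *v) { (void)v; }
static void demo_to(struct vcpu *v)   { v->switched_in++; }
static void demo_tail(struct vcpu *v) { (void)v; }

/* static const: a single read-only copy used by all domains of this type. */
static const struct arch_csw demo_csw = {
    .from = demo_from,
    .to   = demo_to,
    .tail = demo_tail,
};

/* Domain construction must install the table before any vCPU runs. */
static void demo_domain_init(struct domain *d)
{
    d->ctxt_switch = &demo_csw;
}

/* Simplified context switch: every hook is reached via the vCPU's domain. */
static void switch_vcpus(struct vcpu *prev, struct vcpu *next)
{
    prev->domain->ctxt_switch->from(prev);
    next->domain->ctxt_switch->to(next);
}
```

The saving is that each vCPU no longer carries three pointers; the trade-off is that a domain whose ctxt_switch was never set has no per-vCPU fallback, so the first switch into one of its vCPUs dereferences NULL.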
> From
> http://logs.test-lab.xenproject.org/osstest/logs/105917/test-amd64-amd64-xl-pvh-intel/serial-fiano0.log
> around Feb 19 23:12:06.269706
>
> (XEN) ----[ Xen-4.9-unstable  x86_64  debug=y   Not tainted ]----
> (XEN) CPU:    2
> (XEN) RIP:    e008:[<ffff82d08016795a>] domain.c#__context_switch+0x1a3/0x3e3
> (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor (d1v0)
> (XEN) rax: 0000000000000000   rbx: 0000000000000002   rcx: 0000000000000000
> (XEN) rdx: 00000031fd44b600   rsi: 0000000000000003   rdi: ffff83007de27000
> (XEN) rbp: ffff83027d78fdb0   rsp: ffff83027d78fd60   r8:  0000000000000000
> (XEN) r9:  0000005716f6126f   r10: 0000000000007ff0   r11: 0000000000000246
> (XEN) r12: ffff83007de27000   r13: ffff83027fb74000   r14: ffff83007dafd000
> (XEN) r15: ffff83027d7c8000   cr0: 000000008005003b   cr4: 00000000001526e0
> (XEN) cr3: 000000007dd05000   cr2: 0000000000000008
> (XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen code around <ffff82d08016795a> (domain.c#__context_switch+0x1a3/0x3e3):
> (XEN)  85 68 07 00 00 4c 89 e7 <ff> 50 08 4c 89 ef e8 36 e1 02 00 41 80 bd 78 08
> (XEN) Xen stack trace from rsp=ffff83027d78fd60:
> (XEN)    ffff83027d78ffff 0000000000000003 0000000000000000 0000000000000000
> (XEN)    0000000000000000 ffff83007de27000 ffff83007dafd000 ffff83027fb74000
> (XEN)    0000000000000002 ffff83027d7c8000 ffff83027d78fe20 ffff82d08016bf1f
> (XEN)    ffff82d080131ae2 ffff83027d78fde0 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 ffff83027d78fe20 ffff83007dafd000
> (XEN)    ffff83007de27000 0000005716f5e5da ffff83027d796148 0000000000000001
> (XEN)    ffff83027d78feb0 ffff82d08012def9 ffff83027d7955a0 ffff83027d796160
> (XEN)    0000000200000004 ffff83027d796140 ffff83027d78fe70 ffff82d08014af39
> (XEN)    ffff83027d78fe70 ffff83007de27000 0000000001c9c380 ffff82d0801bf800
> (XEN)    000000107dafd000 ffff82d080322b80 ffff82d080322a80 ffffffffffffffff
> (XEN)    ffff83027d78ffff ffff83027d780000 ffff83027d78fee0 ffff82d08013128f
> (XEN)    ffff83027d78ffff ffff83007dd4c000 ffff83027d7c8000 00000000ffffffff
> (XEN)    ffff83027d78fef0 ffff82d0801312e4 ffff83027d78ff10 ffff82d080167582
> (XEN)    ffff82d0801312e4 ffff83007dafd000 ffff83027d78fdc8 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    ffffffff82374000 0000000000000000 0000000000000000 ffffffff81f59180
> (XEN)    0000000000000000 0000000000000200 ffffffff82390000 0000000000000000
> (XEN)    0000000000000000 02ffff8000000000 0000000000000000 0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08016795a>] domain.c#__context_switch+0x1a3/0x3e3
> (XEN)    [<ffff82d08016bf1f>] context_switch+0x147/0xf0d
> (XEN)    [<ffff82d08012def9>] schedule.c#schedule+0x5ba/0x615
> (XEN)    [<ffff82d08013128f>] softirq.c#__do_softirq+0x7f/0x8a
> (XEN)    [<ffff82d0801312e4>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d080167582>] domain.c#idle_loop+0x55/0x62
> (XEN)
> (XEN) Pagetable walk from 0000000000000008:
> (XEN)  L4[0x000] = 000000027d7cd063 ffffffffffffffff
> (XEN)  L3[0x000] = 000000027d7cc063 ffffffffffffffff
> (XEN)  L2[0x000] = 000000027d7cb063 ffffffffffffffff
> (XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 2:
> (XEN) FATAL PAGE FAULT
> (XEN) [error_code=0000]
> (XEN) Faulting linear address: 0000000000000008
> (XEN) ****************************************
> (XEN)
>
> We have followed the ->to() hook through a domain whose ctxt_switch
> pointer is NULL (confirmed by the disassembly).  The incoming vcpu, n,
> is derived from current, which is d1v0, which means we are trying to
> schedule a vcpu before its domain structure has been fully constructed.
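The disassembly claim can be checked by hand. The bytes at the faulting RIP are ff 50 08: opcode FF with ModRM 0x50 (mod=01, reg=010, rm=000) is FF /2, a near indirect CALL through [rax + disp8], and disp8 is 0x08. With rax = 0 in the register dump, the CPU tries to load the call target from linear address 0x8, which matches cr2 and the reported faulting address; on x86-64 that is the second 8-byte member of a structure. The preceding 4c 89 e7 is mov %r12,%rdi, and r12 equals rdi (ffff83007de27000) in the dump, consistent with the vcpu n being passed as the hook's first argument. The sketch below is a hypothetical helper that decodes only this one instruction family; it is not a general x86 decoder and ignores REX prefixes and SIB bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 64-bit GPR names for ModRM.rm values 0-7 (no REX.B extension). */
static const char *const gpr64[8] = {
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"
};

/* Decode FF /2 (CALL near, absolute indirect) in the register and
 * register+disp8 forms only. Returns 0 and writes AT&T syntax to out. */
static int decode_ff2_call(const uint8_t *b, size_t len, char *out, size_t outlen)
{
    if (len < 2 || b[0] != 0xff)
        return -1;

    unsigned mod = b[1] >> 6;         /* addressing mode              */
    unsigned reg = (b[1] >> 3) & 7;   /* opcode extension for FF      */
    unsigned rm  = b[1] & 7;          /* base register                */

    if (reg != 2)                     /* FF /2 = CALL near indirect   */
        return -1;

    if (mod == 3) {                   /* call *%reg                   */
        snprintf(out, outlen, "call *%%%s", gpr64[rm]);
        return 0;
    }
    if (mod == 1 && rm != 4 && len >= 3) {  /* call *disp8(%reg), no SIB */
        int disp = (int8_t)b[2];      /* disp8 is sign-extended       */
        snprintf(out, outlen, "call *%s0x%x(%%%s)",
                 disp < 0 ? "-" : "", (unsigned)(disp < 0 ? -disp : disp),
                 gpr64[rm]);
        return 0;
    }
    return -1;                        /* other forms not handled      */
}

/* Address the CPU reads the call target from, for the disp8 form. */
static uint64_t call_slot_address(uint64_t base_reg, const uint8_t *b)
{
    return base_reg + (uint64_t)(int64_t)(int8_t)b[2];
}
```

Feeding it the three bytes after <ff> in the log, with rax = 0, yields "call *0x8(%rax)" and a slot address of 0x8.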
>
> The problem is with hvm_domain_initialise()
>
> int hvm_domain_initialise(struct domain *d)
> {
>     ...
>     if ( is_pvh_domain(d) )
>     {
>         register_portio_handler(d, 0, 0x10003, handle_pvh_io);
>         return 0;
>     }
>     ...
>     rc = hvm_funcs.domain_initialise(d);
>     ...
> }
>
> So PVH domains return from hvm_domain_initialise() before the
> vendor-specific initialisation hook (hvm_funcs.domain_initialise()) is
> called.
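The control flow described above can be modelled in a few lines. This is a hypothetical, simplified model: the struct fields, vendor_domain_initialise() and model_hvm_domain_initialise() are stand-ins, and it assumes, as the analysis implies, that the vendor hook is what installs the domain's context-switch table. It shows the PVH early return leaving ctxt_switch NULL, and includes a hypothetical ctxt_switch_ready() check that would detect this before the first switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct vcpu;

struct arch_csw {
    void (*from)(struct vcpu *v);
    void (*to)(struct vcpu *v);
    void (*tail)(struct vcpu *v);
};

/* Hypothetical simplified domain: only the fields this model needs. */
struct domain {
    bool pvh;                            /* stands in for is_pvh_domain(d) */
    const struct arch_csw *ctxt_switch;
};

static void vendor_from(struct vcpu *v) { (void)v; }
static void vendor_to(struct vcpu *v)   { (void)v; }
static void vendor_tail(struct vcpu *v) { (void)v; }

static const struct arch_csw vendor_csw = { vendor_from, vendor_to, vendor_tail };

/* Stand-in for hvm_funcs.domain_initialise(): assumed to install the table. */
static int vendor_domain_initialise(struct domain *d)
{
    d->ctxt_switch = &vendor_csw;
    return 0;
}

/* Mirrors the quoted control flow: PVH returns before the vendor hook runs. */
static int model_hvm_domain_initialise(struct domain *d)
{
    if (d->pvh)
        return 0;                        /* early exit: ctxt_switch never set */
    return vendor_domain_initialise(d);
}

/* Hypothetical sanity check a scheduler could make before switching in. */
static bool ctxt_switch_ready(const struct domain *d)
{
    return d->ctxt_switch != NULL && d->ctxt_switch->to != NULL;
}
```

Both domains report successful initialisation, yet only the non-PVH one ends up with a usable hook table; calling pvh.ctxt_switch->to() in this model would dereference NULL just as the crash did.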
>
> Rather than fixing this specific issue, can I suggest we properly kill
> PVH v1 at this point?  Given what else it skips in
> hvm_domain_initialise(), it clearly hasn't functioned properly in the past.

P.S. Ian: Why did this failure not block at the push gate?

It is a completely repeatable host crash, yet master has been pulled up
to match staging.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

