
Re: [Xen-devel] [xen-unstable test] 106504: regressions - FAIL



>>> On 07.03.17 at 05:24, <chao.gao@xxxxxxxxx> wrote:
> On Tue, Mar 07, 2017 at 02:16:50AM -0700, Jan Beulich wrote:
>>>>> On 07.03.17 at 06:52, <osstest-admin@xxxxxxxxxxxxxx> wrote:
>>> flight 106504 xen-unstable real [real]
>>> http://logs.test-lab.xenproject.org/osstest/logs/106504/ 
>>> 
>>> Regressions :-(
>>> 
>>> Tests which did not succeed and are blocking,
>>> including tests which could not be run:
>>>  [...]
>>>  test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 16 guest-stop fail REGR. vs. 106482
>>
>>Here we go:
>>
>>(XEN) d15v0: intack: 02:48 pt: 38
>>(XEN) vIRR: 00000000 00000000 00000000 00000000 00000000 00000000 00010000 00000000
>>(XEN)  PIR: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>(XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:360
>>(XEN) ----[ Xen-4.9-unstable  x86_64  debug=y   Not tainted ]----
>>(XEN) CPU:    0
>>(XEN) RIP:    e008:[<ffff82d0802039e8>] vmx_intr_assist+0x5fa/0x61a
>>(XEN) RFLAGS: 0000000000010292   CONTEXT: hypervisor (d15v0)
>>(XEN) rax: ffff82d0804754a8   rbx: ffff83007f375680   rcx: 0000000000000000
>>(XEN) rdx: ffff83007cd3ffff   rsi: 000000000000000a   rdi: ffff82d0803316d8
>>(XEN) rbp: ffff83007cd3ff08   rsp: ffff83007cd3fea8   r8:  ffff830277db8000
>>(XEN) r9:  0000000000000001   r10: 0000000000000000   r11: 0000000000000001
>>(XEN) r12: 00000000ffffffff   r13: ffff82d0802b5b02   r14: ffff82d0802b5b02
>>(XEN) r15: ffff83027d82e000   cr0: 0000000080050033   cr4: 00000000001526e0
>>(XEN) cr3: 0000000259135000   cr2: 000000000164f034
>>(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>(XEN) Xen code around <ffff82d0802039e8> (vmx_intr_assist+0x5fa/0x61a):
>>(XEN)  fb ff ff e9 49 fc ff ff <0f> 0b 89 ce 48 89 df e8 2a 21 00 00 e9 49 fe ff
>>(XEN) Xen stack trace from rsp=ffff83007cd3fea8:
>>(XEN)    ffff82d08044ab00 00000038ffffffff ffff83007cd3ffff ffff83027d82e000
>>(XEN)    ffff83007cd3fef8 ffff82d080133a3d ffff83007f375000 ffff83007f375000
>>(XEN)    ffff83007f7fc000 ffff83026df78000 0000000000000000 ffff83027d82e000
>>(XEN)    ffff83007cd3fdb0 ffff82d080213191 0000000000000004 00000000000000c2
>>(XEN)    0000000000000020 0000000000000002 ffff880029994000 ffffffff81ade0a0
>>(XEN)    0000000000000246 0000000000000000 ffff88002d000008 0000000000000004
>>(XEN)    000000000000006c 0000000000000000 00000000000003f8 00000000000003f8
>>(XEN)    ffffffff81ade0a0 0000beef0000beef ffffffff81389ac4 000000bf0000beef
>>(XEN)    0000000000000002 ffff88002f403e08 000000000000beef 000000000000beef
>>(XEN)    000000000000beef 000000000000beef 000000000000beef 0000000000000000
>>(XEN)    ffff83007f375000 0000000000000000 00000000001526e0
>>(XEN) Xen call trace:
>>(XEN)    [<ffff82d0802039e8>] vmx_intr_assist+0x5fa/0x61a
>>(XEN)    [<ffff82d080213191>] vmx_asm_vmexit_handler+0x41/0x120
>>(XEN) 
>>(XEN) 
>>(XEN) ****************************************
>>(XEN) Panic on CPU 0:
>>(XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:360
>>(XEN) ****************************************
>>
>>I didn't make an attempt at interpreting this yet, but I wonder if it
>>is more than coincidence that - just like the first time the ASSERT()
>>triggered - this is again a guest-stop of a qemuu-debianhvm.
>>
> 
> Cc: xuquan.
> 
> Exciting! I have been monitoring osstest for about one month with
> a Python script, but it only crawls the flights once a day.
> 
> From the output, pt_vector is 0x38 and intack.vector is 0x30.
> These two values are the same as they were the first time.
> Only one bit, 0x30, is set in vIRR, and PIR is all zeros. So maybe
> our suspicion that PIR is not synced to vIRR is wrong. It is strange
> that the 0x38 bit is not present in vIRR. Is it possible that we clear
> the 0x38 bit just after we return from pt_update_irq()?

That would be done how?

> Or, as I suspected, it is caused by pt_update_irq() setting 0x30 but
> wrongly returning 0x38.

Same here, and as expressed earlier: I'm lacking a plausible theory
on how this could be happening. In particular ...

> Do you think it is worth trying to disable the guest's LAPIC timer and
> force it to use IRQ0, along with changing the RTE very frequently?

... if this is the LAPIC timer, then the RTE isn't being read afaics
(pt_irq_vector() should be taking its very first return path in that
case). Nor am I aware that any Linux version would move around
one of its timer interrupts very frequently. But then again 0x30
or 0x38 wouldn't be used for the LAPIC timer anyway, but rather
a vector in the fixed range (0xEF on 4.10). So I think part of the
problem is to understand which timer's vector we're dealing with
here.

Jan
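
For readers following the vector arithmetic above, here is a minimal Python sketch that decodes the vIRR and PIR dumps from the log and repeats the failed comparison. It is not Xen code. It assumes the dump prints eight 32-bit words with the highest word first, which matches the reading in the thread that only bit 0x30 is set in vIRR. The helper name pending_vectors is hypothetical, and the intack value 48 is taken as decimal (0x30), as interpreted in the thread.

```python
# Sketch only, not Xen code. Assumption: the log prints the 256-bit register
# as eight 32-bit hex words, highest word first (vectors 224..255 on the left).
VIRR_DUMP = "00000000 00000000 00000000 00000000 00000000 00000000 00010000 00000000"
PIR_DUMP = "00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000"


def pending_vectors(dump):
    """Return the sorted list of vector numbers whose bits are set in the dump."""
    words = [int(word, 16) for word in dump.split()]
    if len(words) != 8:
        raise ValueError("expected eight 32-bit words")
    vectors = []
    # Reverse so that index 0 covers vectors 0..31, index 1 covers 32..63, etc.
    for index, word in enumerate(reversed(words)):
        for bit in range(32):
            if word & (1 << bit):
                vectors.append(index * 32 + bit)
    return vectors


# Values from "intack: 02:48 pt: 38", read as in the thread:
# intack.vector is 48 decimal (0x30) and pt_vector is 0x38.
intack_vector = 48
pt_vector = 0x38

print([hex(v) for v in pending_vectors(VIRR_DUMP)])  # ['0x30']
print(pending_vectors(PIR_DUMP))                     # []
print(intack_vector >= pt_vector)                    # False, so the ASSERT fires
```

Under that word-order assumption, 0x38 is pending in neither vIRR nor PIR, which is the inconsistency discussed above.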


 

