[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [xen-unstable test] 106504: regressions - FAIL



On Tue, Mar 07, 2017 at 07:11:22AM -0700, Jan Beulich wrote:
>>>> On 07.03.17 at 05:24, <chao.gao@xxxxxxxxx> wrote:
>> On Tue, Mar 07, 2017 at 02:16:50AM -0700, Jan Beulich wrote:
>>>>>> On 07.03.17 at 06:52, <osstest-admin@xxxxxxxxxxxxxx> wrote:
>>>> flight 106504 xen-unstable real [real]
>>>> http://logs.test-lab.xenproject.org/osstest/logs/106504/ 
>>>> 
>>>> Regressions :-(
>>>> 
>>>> Tests which did not succeed and are blocking,
>>>> including tests which could not be run:
>>>>  [...]
>>>>  test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 16 guest-stop fail REGR. 
>>>> vs. 
>>>> 106482
>>>
>>>Here we go:
>>>
>>>(XEN) d15v0: intack: 02:48 pt: 38
>>>(XEN) vIRR: 00000000 00000000 00000000 00000000 00000000 00000000 00010000 
>> 00000000
>>>(XEN)  PIR: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
>> 00000000
>>>(XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:360
>>>(XEN) ----[ Xen-4.9-unstable  x86_64  debug=y   Not tainted ]----
>>>(XEN) CPU:    0
>>>(XEN) RIP:    e008:[<ffff82d0802039e8>] vmx_intr_assist+0x5fa/0x61a
>>>(XEN) RFLAGS: 0000000000010292   CONTEXT: hypervisor (d15v0)
>>>(XEN) rax: ffff82d0804754a8   rbx: ffff83007f375680   rcx: 0000000000000000
>>>(XEN) rdx: ffff83007cd3ffff   rsi: 000000000000000a   rdi: ffff82d0803316d8
>>>(XEN) rbp: ffff83007cd3ff08   rsp: ffff83007cd3fea8   r8:  ffff830277db8000
>>>(XEN) r9:  0000000000000001   r10: 0000000000000000   r11: 0000000000000001
>>>(XEN) r12: 00000000ffffffff   r13: ffff82d0802b5b02   r14: ffff82d0802b5b02
>>>(XEN) r15: ffff83027d82e000   cr0: 0000000080050033   cr4: 00000000001526e0
>>>(XEN) cr3: 0000000259135000   cr2: 000000000164f034
>>>(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>(XEN) Xen code around <ffff82d0802039e8> (vmx_intr_assist+0x5fa/0x61a):
>>>(XEN)  fb ff ff e9 49 fc ff ff <0f> 0b 89 ce 48 89 df e8 2a 21 00 00 e9 49 
>>>fe 
>> ff
>>>(XEN) Xen stack trace from rsp=ffff83007cd3fea8:
>>>(XEN)    ffff82d08044ab00 00000038ffffffff ffff83007cd3ffff ffff83027d82e000
>>>(XEN)    ffff83007cd3fef8 ffff82d080133a3d ffff83007f375000 ffff83007f375000
>>>(XEN)    ffff83007f7fc000 ffff83026df78000 0000000000000000 ffff83027d82e000
>>>(XEN)    ffff83007cd3fdb0 ffff82d080213191 0000000000000004 00000000000000c2
>>>(XEN)    0000000000000020 0000000000000002 ffff880029994000 ffffffff81ade0a0
>>>(XEN)    0000000000000246 0000000000000000 ffff88002d000008 0000000000000004
>>>(XEN)    000000000000006c 0000000000000000 00000000000003f8 00000000000003f8
>>>(XEN)    ffffffff81ade0a0 0000beef0000beef ffffffff81389ac4 000000bf0000beef
>>>(XEN)    0000000000000002 ffff88002f403e08 000000000000beef 000000000000beef
>>>(XEN)    000000000000beef 000000000000beef 000000000000beef 0000000000000000
>>>(XEN)    ffff83007f375000 0000000000000000 00000000001526e0
>>>(XEN) Xen call trace:
>>>(XEN)    [<ffff82d0802039e8>] vmx_intr_assist+0x5fa/0x61a
>>>(XEN)    [<ffff82d080213191>] vmx_asm_vmexit_handler+0x41/0x120
>>>(XEN) 
>>>(XEN) 
>>>(XEN) ****************************************
>>>(XEN) Panic on CPU 0:
>>>(XEN) Assertion 'intack.vector >= pt_vector' failed at intr.c:360
>>>(XEN) ****************************************
>>>
>>>I didn't make an attempt at interpreting this yet, but I wonder if it
>>>is more than coincidence that - just like the first time the ASSERT()
>>>triggered - this is again a guest-stop of a qemuu-debianhvm.
>>>
>> 
>> Cc: xuquan.
>> 
>> Exciting! I have been monitoring osstest for about one months through
>> a python script. But I always crawl the flights one time a day.
>> 
>> From the output, the pt_vector is 0x38 and the intack.vector is
>> 0x30. these two values are same with they were in the first time.
>> And only one bit 0x30 is set in vIRR. PIR is NULL. So maybe
>> our suspicion that PIR is not synced to vIRR is wrong. The 0x38 bit
>> is not present in vIRR is strange. Is it possible that we clear the 0x38 bit
>> just after we return from pt_update_irq()?
>
>That would be done how?
>
>> Or, just like I suspected that
>> it is caused by pt_update_irq() sets 0x30 but wrongly returns 0x38.
>
>Same here, and as expressed earlier: I'm lacking a plausible theory
>on how this could be happening. In particular ...
>
>> Do you think it worths a try to disable guest's LAPIC timer and
>> force it use IRQ0 along with changing RTE very frequently?
>
>... if this is the LAPIC timer, then the RTE isn't being read afaics
>(pt_irq_vector() should be taking its very first return path in that
>case). Nor am I aware that any Linux version would move around
>one of its timer interrupts very frequently. But then again 0x30
>or 0x38 wouldn't be use for the LAPIC timer anyway, but rather
>a vector in the fixed range (0xEF on 4.10). So I think part of the
>problem is to understand which timer's vector we're dealing with
>here.
>

I have written a xtf test case (many codes are from hvmloader) to
trigger this assertion. The test case is in attachments. Bottom is the output
of this test. This test initializes PIT channel0 to generate periodic timer
interrupt at 1000hz per second. The timer interrupt is delivered to vCPU0. And
vCPU1 is used to change IOAPIC RTE 2 frequently.

The assertion can be triggered by guest. To fix assertion failure,
I propose to remove this assertion for the reason below:
1. Operations in this test case are very intrusive and abnormal. It updates 
RTE frequently without disabling interrupt source. In this case, I think 
software can't assume hardware works correctly.

2. If we remove this assertion(means we admit pt_vector may be different
from (or bigger than) the vector we set in vIRR in a rare case), the side
effect is that we won't decrease the counter pt->ending_intr_nr in
pt_intr_post() and one more timer interrupt in number is injected to guest. 

3. We read RTE 3 times. 1st happens when we set vIRR. 2nd happens when
pt_update_irq() returns. 3rd happens in pt_intr_post(). If guest changes
the vector in RTE during the window, it will also incur losing or getting
more periodic timer interrupt.

(d1) [ 1409.741660] --- Xen Test Framework ---
(d1) [ 1409.741869] Environment: HVM 32bit (No paging)
(d1) [ 1409.741964] Test periodic-timer
(d1) [ 1409.742077] activate cpu1
(XEN) [ 1423.581228] d1v0: intack: 02:48 pt: 38
(XEN) [ 1423.581234] vIRR: 00000000 00000000 00000000 00000000 00000000 
00000000 00010000 00000000
(XEN) [ 1423.581246]  PIR: 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000
(XEN) [ 1423.581286] Assertion 'intack.vector >= pt_vector' failed at intr.c:360
(XEN) [ 1423.581294] ----[ Xen-4.9-unstable  x86_64  debug=y   Not tainted ]----
(XEN) [ 1423.581370] CPU:    58
(XEN) [ 1423.581375] RIP:    e008:[<ffff82d0801fe405>] 
vmx_intr_assist+0x605/0x625
(XEN) [ 1423.581389] RFLAGS: 0000000000010296   CONTEXT: hypervisor (d1v0)
(XEN) [ 1423.581398] rax: ffff830837e0402c   rbx: ffff83006a093680   rcx: 
0000000000000000
(XEN) [ 1423.581404] rdx: ffff831075e17fff   rsi: 000000000000000a   rdi: 
ffff82d08032f6b8
(XEN) [ 1423.581410] rbp: ffff831075e17f08   rsp: ffff831075e17e98   r8:  
ffff83083e000000
(XEN) [ 1423.581416] r9:  0000000000000001   r10: 0000000000000000   r11: 
0000000000000001
(XEN) [ 1423.581422] r12: 00000000ffffffff   r13: ffff82d0802a4c31   r14: 
ffff82c000408200
(XEN) [ 1423.581427] r15: 0000000000004016   cr0: 000000008005003b   cr4: 
00000000003526e0
(XEN) [ 1423.581432] cr3: 000000081e2bf000   cr2: 0000000000000000
(XEN) [ 1423.581437] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: 
e008
(XEN) [ 1423.581446] Xen code around <ffff82d0801fe405> 
(vmx_intr_assist+0x605/0x625):
(XEN) [ 1423.581450]  fb ff ff e9 5e fc ff ff <0f> 0b 89 ce 48 89 df e8 03 21 
00 00 e9 62 fe ff
(XEN) [ 1423.581470] Xen stack trace from rsp=ffff831075e17e98:
(XEN) [ 1423.581473]    ffff831075e17f08 ffff82d08034c700 ffff82d000000030 
ffffffffffffffff
(XEN) [ 1423.581483]    ffff831075e17fff 0000000000000000 ffff831075e17ef8 
ffff82d0801340ff
(XEN) [ 1423.581491]    ffff83006a093000 ffff83006a093000 ffff83006a093000 
ffff830837e04148
(XEN) [ 1423.581500]    0000014b740caab6 0000000001c9c380 ffff831075e17e28 
ffff82d08020da51
(XEN) [ 1423.581509]    0000000000000000 0000000000000000 0000000000000000 
0000000000000000
(XEN) [ 1423.581515]    0000000000000000 00000000fee00000 0000000000000000 
0000000000000000
(XEN) [ 1423.581522]    0000000000000000 0000000000000000 0000000000000004 
000000000010260d
(XEN) [ 1423.581529]    0000000000000001 0000000000000000 0000000000000000 
0000beef0000beef
(XEN) [ 1423.581536]    0000000000102928 000000bf0000beef 0000000000000206 
0000000000115fa0
(XEN) [ 1423.581544]    000000000000beef 000000000000beef 000000000000beef 
000000000000beef
(XEN) [ 1423.581551]    000000000000beef 000000000000003a ffff83006a093000 
00000037b7a91900
(XEN) [ 1423.581559]    00000000003526e0
(XEN) [ 1423.581564] Xen call trace:
(XEN) [ 1423.581570]    [<ffff82d0801fe405>] vmx_intr_assist+0x605/0x625
(XEN) [ 1423.581580]    [<ffff82d08020da51>] vmx_asm_vmexit_handler+0x41/0x120
(XEN) [ 1423.581584] 
(XEN) [ 1423.827761] 
(XEN) [ 1423.829753] ****************************************
(XEN) [ 1423.835210] Panic on CPU 58:
(XEN) [ 1423.838591] Assertion 'intack.vector >= pt_vector' failed at intr.c:360
(XEN) [ 1423.845698] ****************************************

Attachment: periodic-timer-test
Description: Text document

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.