
Re: [Xen-devel] [xen-unstable test] 106504: regressions - FAIL



>>> On 22.03.17 at 05:53, <chao.gao@xxxxxxxxx> wrote:
> I have written an XTF test case (much of the code is taken from hvmloader) to
> trigger this assertion. The test case is attached.

Thanks for doing this.

> The output of this test is at the bottom.
> The test initializes PIT channel 0 to generate a periodic timer
> interrupt at 1000 Hz. The timer interrupt is delivered to vCPU0, and
> vCPU1 is used to change IOAPIC RTE 2 frequently.

Well, this is certainly helpful (due to some of the conclusions you
draw below), but it is very likely not what has caused the assertion
to trigger in osstest. So by removing the assertion (as you suggest
below) we then will have a silent, non-understood misbehavior.

> The assertion can be triggered by a guest. To fix the assertion failure,
> I propose to remove this assertion for the reasons below:

Of course I agree that a guest triggerable assertion is bad, and
hence needs a correction somewhere.

> 1. The operations in this test case are very intrusive and abnormal: it updates
> the RTE frequently without disabling the interrupt source. In this case, I think
> software can't assume that hardware works correctly.

I guess hardware behavior simply is unspecified in such a case, so
it's hard to judge whether it works "correctly".

> 2. If we remove this assertion (meaning we accept that pt_vector may, in rare
> cases, differ from (or be bigger than) the vector we set in vIRR), the side
> effect is that we won't decrement the counter pt->pending_intr_nr in
> pt_intr_post(), so one extra timer interrupt is injected into the guest.

Which is clearly wrong, afaict, as that may drive the guest clock
off (depending on how the guest OS does its accounting).
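To make the accounting concern concrete, here is a minimal sketch of a
toy model, not Xen's actual code. The names struct pt_sim and simulate()
are made up for illustration, and the model assumes that every injection
is followed by a post step which decrements the pending counter only when
the vectors match.

```c
#include <assert.h>

/* Simplified, hypothetical model of a periodic timer; not Xen's structure. */
struct pt_sim {
    unsigned int pending_intr_nr; /* ticks that still need to be delivered */
    unsigned int injected;        /* interrupts actually injected */
};

/*
 * Deliver 'ticks' timer ticks. For the first 'mismatches' injections the
 * post step sees a vector mismatch and skips the decrement (assumption).
 * Returns the number of interrupts the guest ends up receiving.
 */
static unsigned int simulate(unsigned int ticks, unsigned int mismatches)
{
    struct pt_sim pt = { 0, 0 };
    unsigned int i;

    for ( i = 0; i < ticks; i++ )
        pt.pending_intr_nr++;

    while ( pt.pending_intr_nr )
    {
        pt.injected++;              /* interrupt reaches the guest */
        if ( mismatches )
        {
            mismatches--;           /* post step skipped the decrement */
            continue;
        }
        pt.pending_intr_nr--;       /* normal post step */
    }

    return pt.injected;
}

/* Usage: one skipped decrement yields one surplus interrupt. */
static void simulate_demo(void)
{
    assert(simulate(1000, 0) == 1000);
    assert(simulate(1000, 1) == 1001);
}
```

In this model each skipped decrement shows up as exactly one surplus
interrupt, which a guest that counts ticks would account as extra time.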

> 3. We read the RTE 3 times. The 1st read happens when we set vIRR, the 2nd when
> pt_update_irq() returns, and the 3rd in pt_intr_post(). If the guest changes
> the vector in the RTE during this window, periodic timer interrupts will
> also be lost or duplicated.

Which raises the question whether latching the value read the first
time would address the issue you demonstrate with the test case.
Or alternatively deferring writes to take effect only once readers
are done with their perhaps multiple accesses?
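The latching idea could look roughly like the sketch below. This is only
an illustration under assumptions: struct rte_sim, struct pt_event and the
event_*() helpers are hypothetical names, not existing Xen interfaces, and
the sketch ignores locking entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one IO-APIC redirection table entry. */
struct rte_sim {
    volatile uint8_t vector;    /* the guest may rewrite this at any time */
};

/* Hypothetical per-event state: the vector is read once and kept. */
struct pt_event {
    uint8_t latched_vector;
};

/* First access: read the live RTE exactly once and latch the result. */
static void event_inject(struct pt_event *ev, const struct rte_sim *rte)
{
    ev->latched_vector = rte->vector;
}

/* Later accesses use the latched copy instead of re-reading the RTE. */
static uint8_t event_vector(const struct pt_event *ev)
{
    return ev->latched_vector;
}

/* Post step: compare the acknowledged vector with the latched one. */
static int event_post_matches(const struct pt_event *ev, uint8_t intack_vector)
{
    return event_vector(ev) == intack_vector;
}

/* Usage: a guest write between inject and post does not affect the event. */
static uint8_t latch_demo(void)
{
    struct rte_sim rte = { 0x30 };
    struct pt_event ev;

    event_inject(&ev, &rte);
    rte.vector = 0x38;                      /* concurrent guest update */
    assert(event_post_matches(&ev, 0x30));  /* still consistent */

    return event_vector(&ev);
}
```

With a single latched read, all later stages agree on the vector no
matter how often the guest rewrites the entry in between.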

Can you get in touch with your chipset folks to find out whether
hardware has cases where multiple reads occur during the
processing of a single event?

> (d1) [ 1409.741660] --- Xen Test Framework ---
> (d1) [ 1409.741869] Environment: HVM 32bit (No paging)
> (d1) [ 1409.741964] Test periodic-timer
> (d1) [ 1409.742077] activate cpu1
> (XEN) [ 1423.581228] d1v0: intack: 02:48 pt: 38

I keep getting confused by my own mistake of getting the format
string wrong here (the above should be intack: 2:30 pt: 38). I.e.
I was about to complain that there's no use of vector 48 in your
test code, when I remembered that it's being wrongly printed in
decimal.
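For reference, the decimal vs. hex confusion can be reproduced with a
small sketch. The two format strings here are assumptions picked only to
reproduce the two strings quoted above; they are not copied from the
actual Xen source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical format that prints the intack fields in decimal. */
static int format_decimal(char *buf, size_t len, unsigned int source,
                          unsigned int vector, unsigned int pt_vector)
{
    return snprintf(buf, len, "intack: %02u:%02u pt: %02x",
                    source, vector, pt_vector);
}

/* Hypothetical format that prints both vectors in hex. */
static int format_hex(char *buf, size_t len, unsigned int source,
                      unsigned int vector, unsigned int pt_vector)
{
    return snprintf(buf, len, "intack: %x:%02x pt: %02x",
                    source, vector, pt_vector);
}

/* Usage: the same values (source 2, vector 0x30, pt vector 0x38). */
static void format_demo(void)
{
    char buf[32];

    format_decimal(buf, sizeof(buf), 2, 0x30, 0x38);
    assert(strcmp(buf, "intack: 02:48 pt: 38") == 0);

    format_hex(buf, sizeof(buf), 2, 0x30, 0x38);
    assert(strcmp(buf, "intack: 2:30 pt: 38") == 0);
}
```

Vector 0x30 is 48 in decimal, so the logged "48" and the expected "30"
refer to the same vector.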

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

