
Re: [Xen-devel] Guest start issue on ARM (maybe related to Credit2) [Was: Re: [xen-unstable test] 113807: regressions - FAIL]



Hi Dario,

On 09/26/2017 08:33 AM, Dario Faggioli wrote:
> On Mon, 2017-09-25 at 17:23 +0100, Julien Grall wrote:
>> On 09/25/2017 03:07 PM, Dario Faggioli wrote:
>>> I don't see much in the logs, TBH, but both `xl vcpu-list' and the 'r'
>>> debug key seem to suggest that vCPU 0 is running, while the other vCPUs
>>> have never run... like it was an issue with secondary (v)CPU bringup.
>>>
>>> It indeed shows up with Credit2, as it were _specific_ to it, but I'm
>>> not 100% sure. In fact, it indeed seems to never show up here:
>>> http://logs.test-lab.xenproject.org/osstest/results/history/test-armhf-armhf-xl/xen-unstable
>>>
>>>
>> Most of the time guest-start/debian.repeat fails, vCPU 0 is in
>> data/prefetch abort state. My guess is a latent cache bug that credit2
>> appears to expose.
>>
> So, forgive my ARM ignorance, but how do you tell that the vCPU(s)
> is(are) in that particular state?

I was looking at the guest state dumped:

Sep 24 15:10:43.275221 (XEN) *** Dumping CPU1 guest state (d3v0): ***
Sep 24 15:10:43.279352 (XEN) ----[ Xen-4.10-unstable  arm32  debug=y   Not tainted ]----
Sep 24 15:10:43.285242 (XEN) CPU:    1
Sep 24 15:10:43.286597 (XEN) PC:     0000000c
Sep 24 15:10:43.288743 (XEN) CPSR:   800001d7 MODE:32-bit Guest ABT
Sep 24 15:10:43.292741 (XEN)      R0: 00400000 R1: ffffffff R2: 48c24000 R3: 80000000
Sep 24 15:10:43.298241 (XEN)      R4: 410aa758 R5: 410aacf8 R6: 00000080 R7: c2c2c2c2
Sep 24 15:10:43.303850 (XEN)      R8: 40000000 R9: 410fc074 R10:40b7923c R11:10101105 R12:ffffffff
Sep 24 15:10:43.310457 (XEN) USR: SP: 00000000 LR: 00000000
Sep 24 15:10:43.313714 (XEN) SVC: SP: 4199fb70 LR: 40208060 SPSR:400001d3
Sep 24 15:10:43.318334 (XEN) ABT: SP: 00000000 LR: 0000000c SPSR:800001d7
Sep 24 15:10:43.322863 (XEN) UND: SP: 00000000 LR: 00000000 SPSR:00000000
Sep 24 15:10:43.327361 (XEN) IRQ: SP: 00000000 LR: 00000000 SPSR:00000000
Sep 24 15:10:43.331855 (XEN) FIQ: SP: 00000000 LR: c1318ae4 SPSR:00000000
Sep 24 15:10:43.336349 (XEN) FIQ: R8: 00000000 R9: 00000000 R10:00000000 R11:00000000 R12:00000000


"MODE:..." is the current mode of the vCPU. In this case, ABT means it
has taken an abort (e.g. a data or prefetch abort).

There are other modes, such as:
        - USR : User mode
        - SVC : Supervisor mode (where the kernel runs)
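
To make the mode reading concrete, here is a minimal sketch of how the
mode name can be recovered from the raw CPSR/SPSR value in the dump. It
relies on the ARMv7-A mode encodings held in bits [4:0] of the register;
the helper name `decode_mode` is made up for illustration and is not
something Xen provides.

```python
# ARMv7-A processor mode encodings, stored in CPSR/SPSR bits [4:0].
ARM_MODES = {
    0x10: "USR",  # User
    0x11: "FIQ",
    0x12: "IRQ",
    0x13: "SVC",  # Supervisor
    0x16: "MON",  # Monitor
    0x17: "ABT",  # Abort (data or prefetch abort)
    0x1A: "HYP",
    0x1B: "UND",  # Undefined instruction
    0x1F: "SYS",
}


def decode_mode(psr_hex):
    """Return the mode name for a CPSR/SPSR value given as a hex string.

    Hypothetical helper: only looks at the low five mode bits.
    """
    psr = int(psr_hex, 16)
    return ARM_MODES.get(psr & 0x1F, "unknown")


if __name__ == "__main__":
    print(decode_mode("800001d7"))  # CPSR value from the dump above
    print(decode_mode("400001d3"))  # SVC-banked SPSR from the dump above
```

For the dump above, the CPSR value 800001d7 has 0x17 in its low five
bits, which matches the "Guest ABT" that Xen prints.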

> 
> I'm asking because I now wonder whether this same issue could also be
> the cause of these other failures, which we see from time to time:
> 
>    flight 113816 xen-unstable real [real]
>    http://logs.test-lab.xenproject.org/osstest/logs/113816/
> 
>    [...]
> 
>    Tests which did not succeed, but are not blocking:
>     test-armhf-armhf-xl-rtds   16 guest-start/debian.repeat fail blocked in 113387
> 
> Here's the logs:
> http://logs.test-lab.xenproject.org/osstest/logs/113816/test-armhf-armhf-xl-rtds/info.html

It does not seem to be similar: in the credit2 case the kernel is stuck
at very early boot. Here it seems to be running (there are grants set up).

This seems to be confirmed by the guest console log, where I can see the
prompt. Interestingly, when the guest job fails, it has been waiting a
long time for the disk and hvc0, although it does not time out.

I am actually quite surprised that we start a 4-vCPU guest on a 2-pCPU
platform. The total number of vCPUs is 6 (2 for Dom0 + 4 for the DomU).
The processors are not the greatest for testing. So I was wondering
whether we end up with too many vCPUs running on the platform, making
the test unreliable?

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

