xen-devel

Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Subject: Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped
From: Marek Marczykowski <marmarek@xxxxxxxxxxxx>
Date: Tue, 30 Aug 2011 19:18:15 +0200
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Rafal Wojtczuk <rafal@xxxxxxxxxxxxxxxxxxxxxx>, Joanna Rutkowska <joanna@xxxxxxxxxxxxxxxxxxxxxx>
In-reply-to: <20110829205938.GB18697@xxxxxxxxxxxx>
References: <4E5A3F0A.8060700@xxxxxxxxxxxx> <20110829200749.GA17265@xxxxxxxxxxxx> <4E5BF4C3.2050108@xxxxxxxxxxxx> <20110829205938.GB18697@xxxxxxxxxxxx>
On 29.08.2011 22:59, Konrad Rzeszutek Wilk wrote:
> Ok, but I am still unsure where it is hanging in DomU. Can you run with
> 'console=hvc0 debug initcall_debug loglevel=8 earlyprintk=xen' to get an idea
> of what is stuck in the guest? 

With the "initcall_debug" parameter the problem does not appear (at least
over 200 domU starts)... It looks like a race condition that doesn't happen
when the kernel is slowed down (by printing lots of debug info). This also
explains why the bug appears only on fast hardware.

> You might also have better luck using
> 'xenctx' to get a stack trace of what is hanging in the guest.
> (you will need the System.map file from the guest's kernel.. but that should
> be fairly easy to extract).

xenctx didn't provide any useful data :/ It always shows the following trace
for a hung domU:
-----------------
rip: ffffffff810013aa hypercall_page+0x3aa
flags: 00001246 i z p
rsp: ffffffff81801ee0
rax: 0000000000000000   rcx: ffffffff810013aa   rdx: 0000000000000000
rbx: ffffffff81800010   rsi: 00000000deadbeef   rdi: 00000000deadbeef
rbp: ffffffff81801ef8    r8: 0000000000000000    r9: 0000000000000000
r10: 0000000000000000   r11: 0000000000000246   r12: 0000000000000000
r13: 0000000000000000   r14: ffffffffffffffff   r15: 0000000000000000
 cs: e033        ss: e02b        ds: 0000        es: 0000
 fs: 0000 @ 0000000000000000
 gs: 0000 @ ffff880018ee7000/0000000000000000
Code (instr addr ffffffff810013aa)
cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc


Stack:
 0000000000000000 0000000000000000 ffffffff810072a0 ffffffff81801f18
 ffffffff81012528 ffffffff81800010 ffffffff8185a2a0 ffffffff81801f38
 ffffffff81009faf 0000000000000000 6db6db6db6db6db7 ffffffff81801f48
 ffffffff813fb388 ffffffff81801f88 ffffffff81875c79 ffffffff81801f88

Call Trace:
  [<ffffffff810013aa>] hypercall_page+0x3aa  <--
  [<ffffffff810072a0>] xen_safe_halt+0x10
  [<ffffffff81012528>] default_idle+0x58
  [<ffffffff81009faf>] cpu_idle+0x5f
  [<ffffffff813fb388>] rest_init+0x68
  [<ffffffff81875c79>] start_kernel+0x36f
  [<ffffffff81875346>] x86_64_start_reservations+0x131
  [<ffffffff81878245>] xen_start_kernel+0x5f1
------------------
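
As a rough cross-check of the trace above, the hypercall the guest is sitting in can be recovered from the dump itself. The sketch below assumes the usual hypercall page layout of one 32-byte stub per hypercall, and uses a small, hand-copied subset of the hypercall numbers from Xen's public xen.h. The function names are made up for illustration. Both the offset into hypercall_page and the immediate loaded into eax in the code bytes point at hypercall 29 (sched_op), which matches an idle vcpu blocking in xen_safe_halt. This only confirms that the vcpu is waiting for an event, not why the event never arrives.

```python
# Assumption: each hypercall stub occupies 32 bytes of the hypercall page,
# so (offset // 32) gives the hypercall number.
HYPERCALL_STUB_SIZE = 32

# Hand-copied subset of hypercall numbers from Xen's public xen.h.
HYPERCALL_NAMES = {
    15: "set_timer_op",
    24: "vcpu_op",
    29: "sched_op",
    32: "event_channel_op",
}

# Code bytes from the xenctx dump above, with the <..> rip marker removed.
CODE_BYTES = bytes.fromhex(
    "cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 "
    "41 5b 59 c3 cc cc cc cc cc cc cc"
)


def decode_hypercall_offset(offset):
    """Split an offset into hypercall_page into (hypercall number, offset in stub)."""
    return divmod(offset, HYPERCALL_STUB_SIZE)


def hypercall_from_code(code):
    """Return the 32-bit immediate of the first 'mov eax, imm32' (opcode 0xb8)."""
    start = code.index(0xB8) + 1
    return int.from_bytes(code[start:start + 4], "little")


if __name__ == "__main__":
    number, within = decode_hypercall_offset(0x3AA)  # from hypercall_page+0x3aa
    print(number, within, HYPERCALL_NAMES.get(number, "unknown"))
    print(hypercall_from_code(CODE_BYTES))
```

The in-stub offset of 10 is the length of the instructions up to and including the syscall (51, 41 53, b8 1d 00 00 00, 0f 05), so rip points at the instruction right after the syscall, i.e. the vcpu has not yet returned from the hypercall.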

I've collected a few more messages from successful and failed domU starts.
The only differences are the place where "Switched to NOHz mode on CPU #0"
appears and the presence of the "CE: xen increased min_delta_ns to ..." and
"CE: Reprogramming failure. Giving up" messages.
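
A comparison like this can be automated when many boot logs need checking. The following is a minimal sketch, not the method used for the attached logs: it searches two console logs for the three messages quoted above and reports the markers whose first position differs. The function names and the idea of passing each log as a single string are assumptions made for illustration.

```python
import re

# Patterns for the three messages quoted above; the text after
# "min_delta_ns to" is left unmatched because its value is not known here.
MARKERS = {
    "nohz": re.compile(r"Switched to NOHz mode on CPU #\d+"),
    "min_delta": re.compile(r"CE: xen increased min_delta_ns to"),
    "giving_up": re.compile(r"CE: Reprogramming failure\. Giving up"),
}


def find_markers(log_text):
    """Map each marker name to the 1-based line where it first appears."""
    found = {}
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        for name, pattern in MARKERS.items():
            if name not in found and pattern.search(line):
                found[name] = lineno
    return found


def differing_markers(log_a, log_b):
    """Return {marker: (line in log_a, line in log_b)} where the two differ.

    None means the marker does not occur in that log.
    """
    a, b = find_markers(log_a), find_markers(log_b)
    return {
        name: (a.get(name), b.get(name))
        for name in MARKERS
        if a.get(name) != b.get(name)
    }
```

A typical use would be to read one successful and one failed console log into strings and pass both to differing_markers(). Comparing line numbers is only meaningful if both logs start at the same point in the boot.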

I think it can be related to:
http://lists.xensource.com/archives/html/xen-devel/2010-07/msg00649.html
(this was on HVM not PV, but looks similar)

I've also tried 'xenpm set-max-cstate 0' and tsc_mode=1 in the domU config,
but neither helps. Pinning the vcpu doesn't help either (these domUs have
only 1 vcpu). Is 'xenpm set-max-cstate 0' the same as booting Xen with
max_cstate=0?

-- 
Pozdrawiam / Best Regards,
Marek Marczykowski         | RLU #390519
marmarek at mimuw edu pl   | xmpp:marmarek at staszic waw pl

Attachment: xenctx-out
Description: Text document

Attachment: xenctx-out2
Description: Text document

Attachment: fwvm-fail1
Description: Text document

Attachment: fwvm-fail2
Description: Text document

Attachment: netvm-fail1
Description: Text document

Attachment: netvm-ok
Description: Text document

Attachment: netvm-ok2
Description: Text document

Attachment: netvm-ok3
Description: Text document

