
Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped



On Mon, Aug 29, 2011 at 04:59:38PM -0400, Konrad Rzeszutek Wilk wrote:
> On Mon, Aug 29, 2011 at 10:21:23PM +0200, Marek Marczykowski wrote:
> > On 29.08.2011 22:07, Konrad Rzeszutek Wilk wrote:
> > > On Sun, Aug 28, 2011 at 03:13:46PM +0200, Marek Marczykowski wrote:
> > >> Hey,
> > >>
> > >> I'm experiencing a strange problem: a non-deterministic PV domain hang, only
> > >> on some machines (with a fast SSD drive). I've tried xen-4.1.0 and
> > >> xen-4.1.1 with many different kernels:
> > >> VM:
> > >>  - 2.6.38.3 xenlinux based on SUSE package
> > >>  - vanilla 3.0.3
> > >>  - vanilla 3.1 rc2
> > >> dom0:
> > >>  - 2.6.38.3 xenlinux based on SUSE package
> > >>  - vanilla 3.1 rc2
> > >>
> > >> The result is always the same: sometimes the VM hangs at startup, and SysRq-T
> > >> shows modprobe waiting in "wait_for_devices" (specifically in schedule_timeout)
> > >> with the jiffies counter not increasing between task-state dumps.
> > >>
> > >> The only thing I have found that is (probably) connected with this problem
> > >> is these domU kernel messages:
> > >> CE: xen increased min_delta_ns to 150000 nsec
> > >> (...)
> > >> CE: xen increased min_delta_ns to 4000000 nsec
> > >> CE: Reprogramming failure. Giving up
> > >>
> > >> These messages do not appear in a successful boot.
> > >>
> > >> I've also tried some options for Xen and the domU kernel, but without
> > >> success (all combinations):
> > > 
> > > BTW, your 'xencons=..' and 'swiotlb=force' are obsolete. Use
> > > 'console=hvc0' and 'iommu=soft'. The 'swiotlb=force' kills performance.
> > > 
> > >> xen: tsc=unstable, cpufreq=none
> > >> domU: nohz=off, clocksource=tsc
> > >>
> > >> Some combinations of the above options lowered the frequency of the problem
> > >> (e.g. tsc=unstable + nohz=off), but it still happens quite often: about
> > >> 1 in 15 boots fails.
> > >>
> > >> Do you have any idea what the cause is and what might help?
> > > 
> > > It looks like xenwatch is stuck. So the problem is in Dom0, right?
> > 
> > This "R" state of xenwatch looks like a result of the SysRq itself, which dumps the data...
> > 
> > [  118.679707]  [<ffffffff812a8081>] handle_sysrq+0x21/0x30
> > [  118.679707]  [<ffffffff8128db49>] sysrq_handler+0xb9/0xe0
> > [  118.679707]  [<ffffffff8128ff50>] xenwatch_thread+0xb0/0x170
> > 
> > And the problem is at DomU boot, Dom0 works without any problems.
> 
> Ok, but I am still unsure where it is hanging in DomU. Can you run with
> 'console=hvc0 debug initcall_debug loglevel=8 earlyprintk=xen' to get an idea
> of what is stuck in the guest? You might also have better luck using
> 'xenctx' to get a stack trace of what is hanging in the guest.
> (you will need the System.map file from the guest's kernel.. but that should
> be fairly easy to extract).
> 

xenctx usage:
http://wiki.xen.org/xenwiki/XenCommonProblems#head-61843b32f0243b5ad0e17850f9493bffd80f8c17
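
When comparing the console logs of good and bad boots, a small script can
pull out the clockevent messages quoted earlier in this thread. The following
is only a rough sketch: the function name scan_clockevent_log is made up for
illustration, and the patterns assume the messages appear exactly as quoted
above ("CE: xen increased min_delta_ns to ... nsec" and "CE: Reprogramming
failure. Giving up"), possibly after a dmesg timestamp.

```python
import re

# Patterns assume the message wording quoted in this thread; adjust them
# if your kernel prints these lines differently.
MIN_DELTA_RE = re.compile(r"CE: (\S+) increased min_delta_ns to (\d+) nsec")
GIVE_UP_RE = re.compile(r"CE: Reprogramming failure\. Giving up")


def scan_clockevent_log(lines):
    """Return (list of min_delta_ns values seen, whether the kernel gave up).

    scan_clockevent_log is a hypothetical helper, not part of any Xen tool.
    """
    deltas = []
    gave_up = False
    for line in lines:
        match = MIN_DELTA_RE.search(line)
        if match:
            deltas.append(int(match.group(2)))
        elif GIVE_UP_RE.search(line):
            gave_up = True
    return deltas, gave_up


if __name__ == "__main__":
    # Sample lines copied from the report above.
    sample = [
        "CE: xen increased min_delta_ns to 150000 nsec",
        "CE: xen increased min_delta_ns to 4000000 nsec",
        "CE: Reprogramming failure. Giving up",
    ]
    deltas, gave_up = scan_clockevent_log(sample)
    print("min_delta_ns steps:", deltas)
    print("reprogramming gave up:", gave_up)
```

Feeding it the saved guest console output from each boot gives a quick way to
tell which boots hit the reprogramming failure, without reading every log by
hand.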

-- Pasi

