[Xen-devel] RE: Question about Xen S3 and resume code - Linux dom0 never exits the xen_safe_halt hypercall after resume

To:	Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Subject:	[Xen-devel] RE: Question about Xen S3 and resume code - Linux dom0 never exits the xen_safe_halt hypercall after resume
From:	"Tian, Kevin" <kevin.tian@xxxxxxxxx>
Date:	Tue, 21 Jun 2011 07:22:02 +0800
Accept-language:	en-US
Cc:	"xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "Yu, Ke" <ke.yu@xxxxxxxxx>
Delivery-date:	Mon, 20 Jun 2011 16:22:58 -0700
Envelope-to:	www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to:	<20110620123626.GA2973@xxxxxxxxxxxx>
List-help:	<mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id:	Xen developer discussion <xen-devel.lists.xensource.com>
List-post:	<mailto:xen-devel@lists.xensource.com>
List-subscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe:	<http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References:	<20110616225739.GA8714@xxxxxxxxxxxx> <625BA99ED14B2D499DC4E29D8138F1505D2C2DD530@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <20110620123626.GA2973@xxxxxxxxxxxx>
Sender:	xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index:	AcwvRrPeZu/fR35lSwiQr2MjXXEXmAAVicyw
Thread-topic:	Question about Xen S3 and resume code - Linux dom0 never exits the xen_safe_halt hypercall after resume

> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@xxxxxxxxxx]
> Sent: Monday, June 20, 2011 8:36 PM
> 
> > ideally ACPI S3/S5 has nothing to do with the ACPI processor driver, which is
> > for Cx/Px.
> 
> Right..
> >
> > >
> > > (which is in the devel/acpi-s3.v0 branch).
> > >
> > > the hypervisor, after an S3 resume, sits forever in default_idle. The
> > > Linux dom0 is stuck looping (I think) around the SCHEDOP_block hypercall.
> > >
> > > http://darnok.org/xen/devel.acpi-s3.v1.serial.log
> > >
> > > If the patch above is present and I have cpufreq=xen on the Xen
> > > hypervisor command line, then the Linux kernel gets unstuck and returns to userspace:
> > >
> > > http://darnok.org/xen/devel.acpi-s3.v0.serial.log
> >
> > Comparing your logs, the major difference is:
> >
> > [  168.754739] calling  i2c-8+ @ 3096
> > [  168.758200] call i2c-8+ returned 0 after 0 usecs
> > <<< 1st case stuck here
> > [  168.762882] calling  card0-VGA-1+ @ 3096
> > [  168.766867] call card0-VGA-1+ returned 0 after 0 usecs
> > [  168.772085] calling  ttm+ @ 3096
> > [  168.775360] call ttm+ returned 0 after 0 usecs
> > [  168.779870] PM: resume of devices complete after 13117.603 msecs
> > [  168.786006] PM: Finishing wakeup.
> > <<<2nd case forward progress
> >
> > It looks like the VGA card has some problem on resume, which then
> 
> In both cases - with the patch and without..

that's expected since device suspend is always invoked in the S3 path.

> 
> > makes dom0 stay in the idle loop and thus in the block hypercall; and then,
> > with no runnable vcpu, Xen spends most of its time in idle_loop too. In an
> > earlier log there were some stack traces in the i915 driver. Perhaps you can try a different machine
> 
> Or remove the i915 just to eliminate that.

So any result there? :-)

> > or try native S3 on the same box to make sure it's not mixed up with native issues.
> >
> > >
> > > (however, if I set cpuidle=0 cpufreq=none on the hypervisor command line and
> > > have the 9f301b0a0081676dfc71b7f0898295e6bcba391a patch, it still
> > > gets stuck).
> > >
> > > I figured that the primary reason the guest is allowed to
> > > exit its SCHEDOP_block loop is b/c the pm_idle call is set to
> > > acpi_processor_idle, which does "something" extra after the machine comes
> > > out of an S3 suspend.
> >
> > If that's the case, I think you should disable CONFIG_ACPI_PROCESSOR in dom0
> > before incorporating the Xen-specific version (the patch you tried). We don't
> > want dom0 to play with Cx states directly b/c that's the responsibility of Xen.
> 
> Huh? You misunderstood me. The 'acpi_processor_idle' is the hypervisor's
> idle loop. The hypervisor can be running inside that one, or inside the
> 'default_idle' loop. Hence

running inside which one? I'd think only default_idle invokes it when the current
cpu is actually idle.

> my question: why would that specific hypervisor idle loop make dom0 run nicely
> while the default one would not?

honestly speaking, this is counterintuitive to me. I'd rather expect
acpi_processor_idle to cause some issue than a pure "sti;hlt", because the ACPI
version has more logic to handle. In the early days, when it was still in the
stabilization phase, we did observe some cases of not exiting from a deep C-state,
but this never happened with a pure hlt.

IOW, I don't see this idle path as a necessary step for making S3 resume work;
it only comes into play when the cpu has nothing to do...

> 
> In dom0, regardless of the patches, 'default_idle' is run, which makes the
> xen_safe_halt paravirt call.

OK, that matches my expectation then.

> 
> >
> > Of course we still need to figure out why the same issue occurs with
> > cpuidle=0/cpufreq=none, which, however, can be revisited after basic S3 works. :-)
> 
> Right. The end result of those parameters is that 'default_idle' in the
> hypervisor is chosen instead of the 'acpi_processor_idle' one.
> >
> > >
> > > Any ideas?
> >
> > No other ideas for now. From a historical point of view, Xen S3 was supported before
> 
> Hmm, I am actually tempted to start commenting out code in
> acpi_processor_idle and seeing what causes it to have the same failure as
> 'default_idle'.

you can also try "max_cstate=1" to see if it makes any difference; it is expected to
have a similar effect to safe_halt().
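For anyone reproducing this, the three sets of hypervisor options discussed in this thread are listed below, one variant per line. Each would be added to the Xen command line (not the dom0 kernel's) in the boot-loader entry, alongside whatever options are already there; the boot-loader syntax itself is left out because it depends on the setup. As reported above, the first variant (together with the 9f301b0a0081676dfc71b7f0898295e6bcba391a patch) let dom0 resume, the second still hung, and the third is the suggested experiment, which should behave much like a plain safe_halt(). Note that the Xen option is spelled max_cstate (singular).

```
cpufreq=xen
cpuidle=0 cpufreq=none
max_cstate=1
```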

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
