
[Xen-devel] RE: latest xen-unstable fails to boot on Dell D630 (likely hpet/Cstate problem)



OK, thanks for looking for the problem.  Since you
can't reproduce it, it is likely a problem specific
to the Dell D630 motherboard or BIOS or HPET or something
like that.

Since I have a workaround (max_cstate=2), I will just
continue to use the workaround.

I don't have a dock but if I get the serial port
working at some point, I will try to reproduce the
problem again.

Also, when Ke's work on RTC emulation is completed,
let me know and I will give that a try.

Thanks,
Dan

> -----Original Message-----
> From: Zhang, Xiantao [mailto:xiantao.zhang@xxxxxxxxx]
> Sent: Sunday, December 13, 2009 1:26 AM
> To: Dan Magenheimer; Yu, Ke; Xen-Devel (E-mail)
> Subject: RE: latest xen-unstable fails to boot on Dell D630 (likely
> hpet/Cstate problem)
> 
> 
> Hi, Dan
>     We still can't reproduce this failure locally, even with a
> Merom laptop. Do you have the dock for your Dell D630? I
> think the dock should have serial port support, so maybe
> you can get the failure log through it.  If we can get the
> failure log, it should help to identify this issue.
> Also, I have analyzed Cset #20072 and Cset #20073, and have
> no clue what in them could lead to this issue.  In addition, I
> also talked with Ke; he said he could reproduce another issue
> related to hwclock, but he can't reproduce this issue
> on any platform either.  :(
> Thanks!
> Xiantao
> 
> -----Original Message-----
> From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx] 
> Sent: Wednesday, December 09, 2009 11:50 PM
> To: Zhang, Xiantao; Yu, Ke; Xen-Devel (E-mail)
> Subject: RE: latest xen-unstable fails to boot on Dell D630 
> (likely hpet/Cstate problem)
> 
> > Could you attach the failure log ?
> 
> I can't get any failure logs because dom0 fails to boot.
> The failure conditions are the same as described
> here:
> 
> http://lists.xensource.com/archives/html/xen-devel/2009-10/msg01027.html
> 
> However, I have attached the xm dmesg output from
> a successful boot (with max_cstate=2).
> 
> > In addition, does this system have ioapic support ?
> 
> I think so.  See attached log.
> 
> > I think hpet doesn't use MSI, right ? 
> 
> I don't think so.
> 
> Dan
> 
> > -----Original Message-----
> > From: Zhang, Xiantao [mailto:xiantao.zhang@xxxxxxxxx]
> > Sent: Tuesday, December 08, 2009 9:40 PM
> > To: Dan Magenheimer; Yu, Ke; Xen-Devel (E-mail)
> > Subject: RE: latest xen-unstable fails to boot on Dell D630 (likely
> > hpet/Cstate problem)
> > 
> > 
> > Dan Magenheimer wrote:
> > > FYI, 20073+20093+20149 boots properly and xend starts
> > > WITH max_cstate=2, but dom0 FAILs to boot unless
> > > max_cstate=2 is added as a Xen boot parameter. 
> > 
> > Could you attach the failure log ?  In addition, does this 
> > system have ioapic support ? I think hpet doesn't use MSI, right ? 
> > Xiantao
> > 
> > 
> > > So I still think something changed at 20073 that
> > > causes Merom+RHEL5dom0 to fail to boot due to not
> > > recovering from deep C-state (after dom0 runs
> > > /sbin/hwclock ... Ke Yu knows how to reproduce
> > > the problem).
> > > 
> > > Thanks,
> > > Dan
> > > 
> > >> -----Original Message-----
> > >> From: Zhang, Xiantao [mailto:xiantao.zhang@xxxxxxxxx]
> > >> Sent: Tuesday, December 08, 2009 6:44 PM
> > >> To: Dan Magenheimer; Yu, Ke; Xen-Devel (E-mail)
> > >> Subject: RE: latest xen-unstable fails to boot on Dell D630 (likely
> > >> hpet/Cstate problem)
> > >> 
> > >> 
> > >> Dan,
> > >> Don't test Cset #20073 on its own, since it needs
> > >> two minor fixes checked in as Cset #20093 and #20149.
> > >> Besides this, Keir also has a typo in Cset #20076 that is
> > >> fixed by Cset #20092. In addition, one serious issue was
> > >> introduced in Cset #20084 and is fixed in Cset #20140.  I
> > >> remember Pod also has issues which can crash the hypervisor
> > >> before Cset #20100. Thus, it is too hard to identify this
> > >> issue through bisection before Cset #20149, since these issues
> > >> were introduced and fixed in interleaved changesets.  Certainly,
> > >> if you want to test Cset #20073, you at least have to apply
> > >> Cset #20093 and #20149 on top of it.  :)
> > >> Xiantao
> > >> 
> > >> 
> > >> Dan Magenheimer wrote:
> > >>>> But I'll give bisecting a try.
> > >>> 
> > >>> Looks like the problem has been around for a while.  It appears
> > >>> the problem starts at c/s 20073.  Xiantao cc'ed since 20073 was
> > >>> his patch.
> > >>> 
> > >>> 20070 boots OK without max_cstate=2
> > >>> 
> > >>> 20072 boots most of the way without max_cstate=2 but crashes
> > >>>       before a login prompt (when xend is starting I think)
> > >>> 
> > >>> 20073 FAILS to boot without max_cstate=2 but crashes
> > >>>       before a login prompt
> > >>> 
> > >>> 20082 FAILS to boot without max_cstate=2 but crashes
> > >>>       before a login prompt with max_cstate=2
> > >>> 
> > >>> 20143 FAILS to boot without max_cstate=2 but boots OK
> > >>>       with max_cstate=2
> > >>> 
> > >>> Note that I have NOT bisected tools, just the hypervisor
> > >>> so the crashes are likely due to a newer xend failing on
> > >>> an older hypervisor (which is irrelevant to this problem).
> > >>> 
> > >>>> -----Original Message-----
> > >>>> From: Dan Magenheimer
> > >>>> Sent: Tuesday, December 08, 2009 10:42 AM
> > >>>> To: Yu, Ke; Xen-Devel (E-mail)
> > >>>> Subject: RE: latest xen-unstable fails to boot on Dell D630
> > >>>> (likely hpet/Cstate problem) 
> > >>>> 
> > >>>> 
> > >>>>> case, if convenient, could you help to do some bisecting to see
> > >>>>> which cset causes this bug?
> > >>>> 
> > >>>> I can do this, but because it is often no longer easy to
> > >>>> bisect Xen because of interdependencies with other
> > >>>> components, I was hoping that Keir or you or someone might
> > >>>> have some idea of what changeset might have caused the
> > >>>> regression.
> > >>>> But I'll give bisecting a try.
> > >>>> 
> > >>>>> max_cstate=2), when dom0 hangs, is Xen still alive, e.g. can
> > >>>>> Xen respond to three Ctrl-'A' over serial?
> > >>>> 
> > >>>> Unfortunately, I can't seem to get a Xen console working on
> > >>>> the Merom machine, and the problem can't be reproduced on
> > >>>> my other machine where the Xen console is working (because
> > >>>> Conroe doesn't support deep C).
> > >>>> 
> > >>>>> -----Original Message-----
> > >>>>> From: Yu, Ke [mailto:ke.yu@xxxxxxxxx]
> > >>>>> Sent: Tuesday, December 08, 2009 12:08 AM
> > >>>>> To: Dan Magenheimer; Xen-Devel (E-mail)
> > >>>>> Subject: RE: latest xen-unstable fails to boot on Dell D630
> > >>>>> (likely hpet/Cstate problem) 
> > >>>>> 
> > >>>>> 
> > >>>>>> -----Original Message-----
> > >>>>>> In this thread, I observed that I was unable to
> > >>>>>> provoke deep C state (C3) on my Dell D630, which has
> > >>>>>> an Intel Merom (dual-core laptop) processor.  At that
> > >>>>>> time, when I tried enabling hpetbroadcast, dom0 boot failed.
> > >>>>>> 
> > >>>>>> http://lists.xensource.com/archives/html/xen-devel/2009-10/msg01027.html
> > >>>>>> 
> > >>>>>> As it turned out, all RHEL5-based (maybe RHEL4- also) dom0
> > >>>>>> default installations run /sbin/hwclock, which IIRC takes
> > >>>>>> the RTC away from Xen and gives it to dom0.  Since the
> > >>>>>> Xen hpet emulation does not do RTC emulation, bad things
> > >>>>>> then happen when a deep Cstate is entered (dom0 apparently
> > >>>>>> never wakes up).  I think Ke Yu has also reproduced
> > >>>>>> this problem.
> > >>>>>> 
> > >>>>>> Sometime in the last few weeks, some patch in xen-unstable
> > >>>>>> apparently changed some defaults and xen-unstable will
> > >>>>>> no longer boot with this processor/dom0, with or without
> > >>>>>> hpetbroadcast on the Xen command line.  However, specifying
> > >>>>>> max_cstate=2 on the Xen command line allows a successful
> > >>>>>> dom0 boot, so I suspect the problem is the same (or at
> > >>>>>> least very similar).
> > >>>>>> 
> > >>>>>> I did a quick scan for hpet changes and found c/s 20497,
> > >>>>>> but backing it out made no difference.
> > >>>>>> 
> > >>>>>> I have a workaround for now, but since it is likely that
> > >>>>>> many customers (including all of Oracle's OVS customers)
> > >>>>>> use a RHEL5-based dom0 boot sequence, and Merom processors
> > >>>>>> work fine otherwise, it would be nice to get this identified
> > >>>>>> and fixed before 4.0.
> > >>>>> 
> > >>>>> Let's first figure out which component the issue resides in.
> > >>>>> 
> > >>>>> First, in the default boot (i.e. without specifying
> > >>>>> max_cstate=2), when dom0 hangs, is Xen still alive, e.g. can
> > >>>>> Xen respond to three Ctrl-'A' over serial?
> > >>>>> 
> > >>>>> If only dom0 hangs, it is probably an RTC malfunction that makes
> > >>>>> dom0 time incorrect and causes dom0 to fail to boot. Then RTC
> > >>>>> emulation in the hypervisor should fix this issue.
> > >>>>> 
> > >>>>> If Xen also hangs, it should be another bug, i.e. hpet
> > >>>>> broadcast does not wake up the CPU in deep C states. In this
> > >>>>> case, if convenient, could you help to do some bisecting to see
> > >>>>> which cset causes this bug?
> > >>>>> 
> > >>>>> Best Regards
> > >>>>> Ke
> > 
> >

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

