
Re: [Xen-devel] Xen 4 serial hangs during boot

On Tue, Jul 24, 2012 at 11:32:19AM +0100, Jan Beulich wrote:
> >>> On 23.07.12 at 22:53, "Christopher S. Aker" <caker@xxxxxxxxxxxx> wrote:
> > On 7/20/12 3:59 PM, Keir Fraser wrote:
> >> Then it is Xen doing something to kill the serial interrupt. ;-) I haven't
> >> seen anything like this reported before. Not sure what to suggest really...
> >> Gather debug output from interrupt-related debug keys (via the xl debug-keys
> >> interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen
> >> and dom0 boot logs... something might become apparent.
> > 
> > We hit this again today, and I grabbed boot and debug-keys output:
> > 
> > http://theshore.net/~caker/xen/BUGS/serial/log.txt 
> This isn't even 8k that made it over, whereas the transmit buffer
> is 16k, and dropping of characters would only start when it first
> got full.
> The part of the data that didn't make it out isn't big enough to
> overflow the buffer - to check whether that would actually
> happen, could you increase the log level of both hypervisor and
> Dom0 kernel? To me this all (particularly the fact that you can
> make the data appear combined with the amount of data not
> being big enough to fill the buffer) looks as if there was some
> buffering happening outside of the control of Xen. Did you check
> whether this is possibly a problem with the remote end?

This got me thinking - I've one particular AMD machine (prototype) that
seems to hang often - but if I use 'sync_console' it works fine.

This issue started, oh, I can't remember when, but I do have some logs
that could shed some light on the approximate date. I guess I was
too quick to blame the prototype for being at fault here :-(

Then recently (yesterday?) the upstream kernel started doing something
wonky on this card:

01:05.0 Serial controller: NetMos Technology PCI 9835 Multi-I/O Controller (rev 
Under Xen, boot gets as far as this line:
[    1.240774] pci 0000:01:05.0: [9710:9835] type 00 class 0x070002
and then stops [note: I haven't really investigated whether the machine
is dead or whether it continues on with just the serial port
wedged hard].

On bare metal it can actually read the I/O BARs:
[    1.240774] pci 0000:01:05.0: [9710:9835] type 00 class 0x070002
[    1.247075] pci 0000:01:05.0: reg 10: [io  0xe050-0xe057]
[    1.252734] pci 0000:01:05.0: reg 14: [io  0xe040-0xe047]
[    1.258394] pci 0000:01:05.0: reg 18: [io  0xe030-0xe037]
[    1.264054] pci 0000:01:05.0: reg 1c: [io  0xe020-0xe027]
[    1.269713] pci 0000:01:05.0: reg 20: [io  0xe010-0xe017]
[    1.275372] pci 0000:01:05.0: reg 24: [io  0xe000-0xe00f]

so I am wondering if the back-ports in Xen 4.1 for dealing with
PCI have something to do with this? 
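
For anyone wanting to compare the two boots mechanically, here is a rough
Python sketch (not something I have run against the full logs) that pulls the
"reg NN: [io ...]" lines for one device out of a kernel log and lists the
BARs that show up on bare metal but not under Xen. The helper names
(io_bars, missing_bars) and the two sample strings are made up for
illustration; the sample lines are just the ones pasted above.

```python
import re

# Matches kernel PCI probe lines of the form seen above, e.g.
# [    1.247075] pci 0000:01:05.0: reg 10: [io  0xe050-0xe057]
BAR_LINE = re.compile(
    r"pci (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"reg (?P<reg>[0-9a-f]+): "
    r"\[io\s+0x(?P<start>[0-9a-f]+)-0x(?P<end>[0-9a-f]+)\]"
)


def io_bars(log_text, device):
    """Return {config-space register offset: (start, end)} for a device's I/O BARs."""
    bars = {}
    for m in BAR_LINE.finditer(log_text):
        if m.group("dev") == device:
            reg = int(m.group("reg"), 16)
            bars[reg] = (int(m.group("start"), 16), int(m.group("end"), 16))
    return bars


def missing_bars(baremetal_log, xen_log, device):
    """Register offsets whose I/O BAR is logged on bare metal but not under Xen."""
    native = io_bars(baremetal_log, device)
    xen = io_bars(xen_log, device)
    return sorted(reg for reg in native if reg not in xen)


# Sample input: the lines quoted in this mail (hypothetical variable names).
BAREMETAL_LOG = """\
[    1.240774] pci 0000:01:05.0: [9710:9835] type 00 class 0x070002
[    1.247075] pci 0000:01:05.0: reg 10: [io  0xe050-0xe057]
[    1.252734] pci 0000:01:05.0: reg 14: [io  0xe040-0xe047]
[    1.258394] pci 0000:01:05.0: reg 18: [io  0xe030-0xe037]
[    1.264054] pci 0000:01:05.0: reg 1c: [io  0xe020-0xe027]
[    1.269713] pci 0000:01:05.0: reg 20: [io  0xe010-0xe017]
[    1.275372] pci 0000:01:05.0: reg 24: [io  0xe000-0xe00f]
"""
XEN_LOG = """\
[    1.240774] pci 0000:01:05.0: [9710:9835] type 00 class 0x070002
"""

if __name__ == "__main__":
    for reg in missing_bars(BAREMETAL_LOG, XEN_LOG, "0000:01:05.0"):
        start, end = io_bars(BAREMETAL_LOG, "0000:01:05.0")[reg]
        print(f"reg {reg:02x}: io 0x{start:04x}-0x{end:04x} not seen under Xen")
```

This only tells you which BAR lines never reached the console under Xen; it
cannot tell whether the kernel stopped probing or whether the serial output
was simply lost at that point.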

> Does this also happen with "sync_console"? Did you check
> whether disabling the use of the associated IRQ makes any
> difference, as suggested by Konrad (I think)?
> Does the port work flawlessly on native Linux?
> Jan
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel
