
[Xen-devel] [PATCH] IRQ handling race and spurious IIR read in serial/8250.c



Anders Kaseorg writes ("Re: Serial console hangs with Linux 2.6.20 HVM guest"):
> Yes, I took Linux v2.6.20 on amd64, ran `make defconfig`, then ran `make 
> menuconfig` and turned off CONFIG_HOTPLUG_CPU (Processor type and features 
> -> Support for hot-pluggable CPUs).

Thanks.  I think I have tracked down the bug.  In
drivers/serial/8250.c in Linux there are two bugs:
 1. UART_BUG_TXEN can be spuriously set, due to an IRQ race
 2. The workaround then applied by the kernel is itself buggy

Anders: can you try two tests for me?  Firstly, in
serial8250_startup, delete the section which sets UART_BUG_TXEN (see
2nd patch below); I think this will fix the symptoms for you.
Secondly, in serial8250_start_tx delete the read from the IIR and the
relevant branch of the test (see 3rd patch below); I think this will
also in itself fix your symptoms.  I haven't compiled either patch (so
you may find that e.g. I missed deleting some variables).


The bugs in detail (this discussion applies to 2.6.20 and also to
2.6.28.4):

 1. The hunk of serial8250_startup I quote below attempts to discover
    whether writing the IER re-asserts the THRI (transmit ready)
    interrupt.  However the spinlock that it has taken out,
    port->lock, is not the one that the IRQ service routine takes
    before reading the IIR (i->lock).  As a result, on an SMP system
    the generated interrupt races with the straight-line code in
    serial8250_startup.

    If serial8250_startup loses the race (perhaps because the system
    is a VM and its VCPU got preempted), UART_BUG_TXEN is spuriously
    added to bugs.  This is quite unlikely in a normal system but in
    certain Xen configurations, particularly ones where there is CPU
    pressure, we may lose the race every time.

    It is not exactly clear to me how this ought to be resolved.  One
    possibility is that the UART_BUG_TXEN problem might be worked
    around perfectly well by the new and very similar workaround
    UART_BUG_THRE[1], added in 2.6.21ish, in which case UART_BUG_TXEN
    could just be removed.
    
 2. UART_BUG_TXEN's workaround appears to be intended to be harmless.
    However what it actually does is to read the IIR, thus clearing
    any actual interrupt (including incidentally non-THRI), and then
    only perform the intended servicing if the interrupt was _not_
    asserted.  That is, it breaks on any serial port with the bug.

    As far as I can see there is not much use in UART_BUG_TXEN reading
    IIR at all, so a suitable change if we want to keep UART_BUG_TXEN
    might be the first patch I enclose below (again, not compiled
    or tested).

    If UART_BUG_TXEN is retained something along these lines should be
    done at the very least.
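
To make the two failure modes concrete, here is a minimal user-space
sketch of a UART whose IIR read acknowledges a pending THRI interrupt.
It is a toy model under stated assumptions, not the kernel code: the
names model_uart, model_startup_probe and model_start_tx are
hypothetical, the "race" is simulated by a flag rather than a second
CPU, and only the register bit values are taken from
include/linux/serial_reg.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as in include/linux/serial_reg.h. */
#define UART_IER_THRI   0x02
#define UART_IIR_NO_INT 0x01
#define UART_IIR_THRI   0x02
#define UART_LSR_TEMT   0x40
#define UART_LSR_THRE   0x20

/* Hypothetical model of a healthy (non-buggy) UART. */
struct model_uart {
    uint8_t ier;
    bool thri_pending;  /* THRI raised and not yet acknowledged */
    bool tx_empty;      /* transmitter and holding register empty */
};

/* Enabling THRI while the transmitter is empty raises the interrupt. */
static void model_write_ier(struct model_uart *u, uint8_t ier)
{
    if ((ier & UART_IER_THRI) && !(u->ier & UART_IER_THRI) && u->tx_empty)
        u->thri_pending = true;
    u->ier = ier;
}

/* Reading IIR reports a pending THRI and, as a side effect, clears it. */
static uint8_t model_read_iir(struct model_uart *u)
{
    if (u->thri_pending && (u->ier & UART_IER_THRI)) {
        u->thri_pending = false;
        return UART_IIR_THRI;
    }
    return UART_IIR_NO_INT;
}

static uint8_t model_read_lsr(const struct model_uart *u)
{
    return u->tx_empty ? (UART_LSR_TEMT | UART_LSR_THRE) : 0;
}

/*
 * Bug 1: the probe in serial8250_startup.  If isr_wins_race is true,
 * the interrupt handler (which holds i->lock, not port->lock) reads
 * IIR before the probe does.  Returns true if UART_BUG_TXEN would be
 * set.
 */
static bool model_startup_probe(struct model_uart *u, bool isr_wins_race)
{
    uint8_t lsr, iir;

    model_write_ier(u, UART_IER_THRI);
    if (isr_wins_race)
        (void)model_read_iir(u);    /* handler consumes the interrupt */
    lsr = model_read_lsr(u);
    iir = model_read_iir(u);
    model_write_ier(u, 0);

    return (lsr & UART_LSR_TEMT) && (iir & UART_IIR_NO_INT);
}

struct tx_result {
    bool transmitted;        /* would transmit_chars() have been called? */
    bool irq_still_pending;  /* is THRI still there for the handler? */
};

/*
 * Bug 2: the UART_BUG_TXEN branch of serial8250_start_tx.  With
 * read_iir true this follows the 2.6.20 logic; with read_iir false it
 * follows the proposed change (decide on LSR alone).
 */
static struct tx_result model_start_tx(struct model_uart *u, bool read_iir)
{
    struct tx_result r;
    uint8_t lsr, iir = UART_IIR_NO_INT;

    model_write_ier(u, UART_IER_THRI);
    lsr = model_read_lsr(u);
    if (read_iir)
        iir = model_read_iir(u);

    r.transmitted = (lsr & UART_LSR_TEMT) && (iir & UART_IIR_NO_INT);
    r.irq_still_pending = u->thri_pending;
    return r;
}
```

In this model, the probe on an idle port flags the bug only when the
handler wins the race.  Once the flag is set on a healthy port,
model_start_tx with read_iir set neither transmits nor leaves the
interrupt pending, which would match the observed hang; without the
IIR read it transmits and the interrupt survives for the handler.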

Ian.

[1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=40b36daad0ac704e6d5c1b75789f371ef5b053c1


Proposed initial band-aid fix (against 2.6.28.4):


Do not read IIR in serial8250_start_tx when UART_BUG_TXEN

Reading the IIR clears some outstanding interrupts so it is not safe.
Instead, simply transmit immediately if the buffer is empty without
regard to IIR.

Signed-off-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>

--- ../linux-2.6.28.4/drivers/serial/8250.c~    2009-02-06 21:47:45.000000000 +0000
+++ ../linux-2.6.28.4/drivers/serial/8250.c     2009-02-11 15:55:24.000000000 +0000
@@ -1257,14 +1257,12 @@
                serial_out(up, UART_IER, up->ier);
 
                if (up->bugs & UART_BUG_TXEN) {
-                       unsigned char lsr, iir;
+                       unsigned char lsr;
                        lsr = serial_in(up, UART_LSR);
                        up->lsr_saved_flags |= lsr & LSR_SAVE_FLAGS;
-                       iir = serial_in(up, UART_IIR) & 0x0f;
                        if ((up->port.type == PORT_RM9000) ?
-                               (lsr & UART_LSR_THRE &&
-                               (iir == UART_IIR_NO_INT || iir == UART_IIR_THRI)) :
-                               (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT))
+                               (lsr & UART_LSR_THRE) :
+                               (lsr & UART_LSR_TEMT))
                                transmit_chars(up);
                }
        }


Anders - first patch to try (against 2.6.20):

--- drivers/serial/8250.c~      2007-02-04 18:44:54.000000000 +0000
+++ drivers/serial/8250.c       2009-02-11 15:39:43.000000000 +0000
@@ -1645,25 +1645,6 @@
 
        serial8250_set_mctrl(&up->port, up->port.mctrl);
 
-       /*
-        * Do a quick test to see if we receive an
-        * interrupt when we enable the TX irq.
-        */
-       serial_outp(up, UART_IER, UART_IER_THRI);
-       lsr = serial_in(up, UART_LSR);
-       iir = serial_in(up, UART_IIR);
-       serial_outp(up, UART_IER, 0);
-
-       if (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT) {
-               if (!(up->bugs & UART_BUG_TXEN)) {
-                       up->bugs |= UART_BUG_TXEN;
-                       pr_debug("ttyS%d - enabling bad tx status workarounds\n",
-                                port->line);
-               }
-       } else {
-               up->bugs &= ~UART_BUG_TXEN;
-       }
-
        spin_unlock_irqrestore(&up->port.lock, flags);
 
        /*


Anders - second patch to try (against 2.6.20):
Fix should be suitable for distribution IMO.

Signed-off-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>

--- drivers/serial/8250.c~      2007-02-04 18:44:54.000000000 +0000
+++ drivers/serial/8250.c       2009-02-11 15:41:51.000000000 +0000
@@ -1136,10 +1136,9 @@
                serial_out(up, UART_IER, up->ier);
 
                if (up->bugs & UART_BUG_TXEN) {
-                       unsigned char lsr, iir;
+                       unsigned char lsr;
                        lsr = serial_in(up, UART_LSR);
-                       iir = serial_in(up, UART_IIR);
-                       if (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT)
+                       if (lsr & UART_LSR_TEMT)
                                transmit_chars(up);
                }
        }


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

