[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2] AMD/IO-APIC: Use old IO-APIC ack method if AMD 813{1, 2} PCI-X tunnel is present



On 04/06/13 17:29, Jan Beulich wrote:
>>>> On 04.06.13 at 12:13, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>> References at time of patch:
>>
>> AMD 8132 PCI-X HyperTransport Tunnel
>> http://support.amd.com/us/ChipsetMotherboard_TechDocs/30801.pdf 
>>
>> AMD 8131 PCI-X HyperTransport Tunnel
>> http://support.amd.com/us/ChipsetMotherboard_TechDocs/26310_AMD-8131_HyperTra
>>  
>> nsport_PCI-X_Tunnel_Revision_Guide_rev_3_18.pdf
>>
>> The IO-APICs on the above two PCI-X HyperTransport Tunnels will issue illegal
>> HT packets if an interrupt gets masked while active.
>>
>> This is sadly exactly what the "new" IO-APIC ack mode does.  The "old" ack
>> mode does however avoid this bad behaviour, and stabilises an affected 
>> system belonging to a customer.
> Reading the errata descriptions, I'm getting the impression that
> your patch only tries to deal with some portion of the issue, and
> even that only on the basis that devices (and perhaps even their
> drivers) are well behaved. That's because there's no silencing of
> interrupt sources being done here, and I also can't see how we
> would from the hypervisor.
>
> If that's correct, is adding the workaround code really worth it,
> rather than documenting that on those systems "ioapic_ack=old"
> will make them a little more reliable?
>
> If it's not correct, would you mind explaining in some more detail
> how you see your change dealing with all aspects of the erratum,
> namely for edge triggered IRQs as well as for the masking that's
> done with the old ack mode for level triggered ones (or why you
> don't have to deal with those cases)?

The observation we have from the customer is that using "new" mode will
result in host hangs/crashes about once every two hours.  With "old",
the system has been rock solid for a week.

Unfortunately I do not have the hardware at my disposal.

>
>> +void __init amd813x_errata_quirks(void)
>> +{
>> +    unsigned i, found = 0;
>> +    uint32_t bus, dev, vendor_id;
>> +
>> +    if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
>> +        return;
>> +
>> +    /* Check for the affected IO-APIC version (0x11) to save scanning the 
>> PCI
>> +     * bus on most systems. */
>> +    for ( i = 0; i < nr_ioapics; ++i )
>> +    {
>> +        if ( mp_ioapics[i].mpc_apicver == 0x11 )
>> +        {
>> +            found = 1;
>> +            break;
>> +        }
>> +    }
>> +
>> +    if ( !found )
>> +        return;
>> +
>> +    for ( bus = 0; bus < 256; ++bus )
>> +    {
>> +        for ( dev = 0; dev < 32; ++dev )
>> +        {
>> +            /* IO-APIC is always Function 1.  Check for Multifunction. */
>> +            if ( !( pci_conf_read8(0, bus, dev, 0, PCI_HEADER_TYPE ) & 0x80 
>> ) )
>> +                continue;
>> +
>> +            vendor_id = pci_conf_read32(0, bus, dev, 1, PCI_VENDOR_ID);
>> +
>> +            /* 0x7451 is AMD-8131 PCI-X IO-APIC */
>> +            if ( vendor_id == 0x74511022 )
>> +                printk(XENLOG_WARNING "Found AMD 8131 PCI-X tunnel, 
>> erraturm #63: ");
> "erratum"

Doh.

>
>> +            /* 0x7459 is AMD-8132 PCI-X IO-APIC */
>> +            else if ( vendor_id == 0x74591022 )
>> +                printk(XENLOG_WARNING "Found AMD 8132 PCI-X tunnel, 
>> erraturm #81: ");
> Again.
>
>> +            else
>> +                continue;
> And overall this is the kind of thing that I think a switch statement
> is dealing with in a more legible way, at once avoiding the need for
> the "vendor_id" local variable.

Ok

>
>> +
>> +            if ( ioapic_ack_forced )
>> +            {
> Inverting the condition would allow you to make this a "if/else-if/else"
> sequence, with less deep indentation...
>
>> +                if ( ioapic_ack_new != 0 )
> != 0 on a boolean variable? Two lines earlier you didn't do so,
> please be consistent.

Sorry - I thought I checked this.

>
>> +                    printk("Not overriding command line. System will be 
>> unstable. "
> Is "will" really right here? These chipsets are rather old, and not
> having seen reports of instabilities makes me imply that many
> such systems just happen to work fine. Therefore, "may" might
> be the better term...

Yes - see description above.

>
>> +                           "Use ioapic_ack_mode=old\n");
> "Don't use \"ioapic_ack=new\"\n" would seem better.
>
> Jan

No - this is more informative to someone who doesn't know the Xen
command line arguments inside/out.

~Andrew

>
>> +                else
>> +                    printk("Command line is ok for system stability\n");
>> +            }
>> +            else
>> +            {
>> +                printk("Forcing 'old' IO-APIC ack method\n");
>> +                ioapic_ack_new = 0;
>> +            }
>> +            return;
>> +        }
>> +    }
>> +}
>> +
>>  void __init setup_IO_APIC(void)
>>  {
>> +    amd813x_errata_quirks();
>> +
>>      enable_IO_APIC();
>>  
>>      if (acpi_ioapic)
>


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.