On 14.06.2011, at 15:27, Stefano Stabellini wrote:
> On Tue, 14 Jun 2011, Alexander Graf wrote:
>>>>>>> static int i440fx_load_old(QEMUFile* f, void *opaque, int version_id)
>>>>>>> {
>>>>>>> PCII440FXState *d = opaque;
>>>>>>> @@ -267,8 +263,17 @@ static PCIBus *i440fx_common_init(const char *device_name,
>>>>>>> d = pci_create_simple(b, 0, device_name);
>>>>>>> *pi440fx_state = DO_UPCAST(PCII440FXState, dev, d);
>>>>>>>
>>>>>>> - piix3 = DO_UPCAST(PIIX3State, dev,
>>>>>>> - pci_create_simple_multifunction(b, -1, true, "PIIX3"));
>>>>>>> + if (xen_enabled()) {
>>>>>>> + piix3 = DO_UPCAST(PIIX3State, dev,
>>>>>>> + pci_create_simple_multifunction(b, -1, true, "PIIX3-xen"));
>>>>>>> + pci_bus_irqs(b, xen_piix3_set_irq, xen_pci_slot_get_pirq,
>>>>>>> + piix3, XEN_PIIX_NUM_PIRQS);
>>>>>>
>>>>>> But with XEN_PIIX_NUM_PIRQS it's not a piix3 anymore, no? What's the
>>>>>> reason behind this change?
>>>>>
>>>>> It is still a PIIX3, but it also provides non-legacy interrupt links to the
>>>>> IO-APIC.
>>>>> The four pins of each PCI device on the bus are not only routed to the
>>>>> normal four PIRQs (programmed by writing to 0x60-0x63, see above) but are
>>>>> also connected to the IO-APIC directly.
>>>>> These additional routes can only be discovered through ACPI, so you need
>>>>> matching ACPI tables. We used to build the old ACPI tables like this:
>>>>>
>>>>> /* PRTA: APIC routing table (via non-legacy IOAPIC GSIs). */
>>>>> printf("Name(PRTA, Package() {\n");
>>>>> for ( dev = 1; dev < 32; dev++ )
>>>>> for ( intx = 0; intx < 4; intx++ ) /* INTA-D */
>>>>> printf("Package(){0x%04xffff, %u, 0, %u},\n",
>>>>> dev, intx, ((dev*4+dev/8+intx)&31)+16);
>>>>> printf("})\n");
>>>>>
>>>>
>>>> Interesting concept, but completely non-standard and very much
>>>> different from real hardware. Please at least add a comment there to
>>>> show readers that Xen is doing a hack which is not at all related to
>>>> how the PIIX really works.
>>>
>>> Isn't this more a function of the "wires" on the motherboard than the
>>> PIIX specifically? i.e. this just encodes the permutation of the wires
>>> from the PCI slots into the IO-APIC input pins (bypassing the PIIX,
>>> which is only used for legacy ISA IRQs i.e. by non-APIC aware OSes)?
>>
>> Interrupts work slightly differently with PCI. PCI devices can map (by
>> themselves or via software) to one of 4 interrupt lines: INTA, INTB, INTC,
>> INTD. These get converted by PCI host controller specific logic into 4
>> interrupt lines, which then go into the IO-APIC.
>>
>> The IO-APIC is a chip with a limited number of pins. IIRC it was 24, could
>> be 26 though.
>
> The number of redirection entries in the IOAPIC can be discovered by
> reading the IOAPICVER register, and it is a property of the specific
> IOAPIC model. As a matter of fact, Xen's emulated IOAPIC supports more
> pins than the most popular IOAPIC used with the PIIX3.
which means you're emulating hardware that never existed :).
>
>
>> I haven't seen a single case where PCI devices have a direct link to the
>> IO-APIC. I also have not seen any PCI host controller that exports more than
>> 4 interrupts. Giving each PCI device its own line, and on top of that using
>> more lines than real hardware ever could have, is a plain hack IMHO.
>
> Actually this happens quite often: if I am not mistaken all the GSIs
> higher than 15 are actually the result of a direct connection between
> an interrupt source and the IOAPIC. I have several on my testboxes.
Yes. "Interrupt source" meaning a wire on the board. I haven't seen any
situation so far where you get direct IO-APIC connections to PCI _device_ pins.
You obviously get plenty of connections to PCI _bus_ pins.
> Also have a look at the Intel Multiprocessor Specification, section
> 3.6.2.3: as you can see from the diagram in "Symmetric I/O Mode" all the
> interrupts are routed through the IOAPIC directly.
>
>
>> Did this really give you actual performance/latency/scalability gains? I
>> still think for devices that matter, we should go with MSI rather than
>> deriving from real hw.
>>
>
> Not all operating systems support MSIs; it is nice to be able to
> avoid interrupt sharing without resorting to MSIs.
Yes and no. It's a tradeoff. If avoiding interrupt sharing means that we emulate
hardware that simply never could have existed the way we model it, I think it's
a bad idea.
> Also, this is how Xen has been working in HVM mode for more than 5 years,
> so this configuration is well tested and supported by most operating
> systems (at least all the ones we tried so far).
I'm fine with Xen breaking its own neck, as long as it doesn't affect non-Xen
code paths. Just be aware that I'm not a huge fan of this approach :).
> In any case I think it is a good idea to add a comment to better explain
> what we are doing, see below.
>
>
>
> commit 973bb091a967fdec37a1bc8fe30d46a483d2903d
> Author: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
> Date: Tue May 17 12:10:36 2011 +0000
>
> xen: fix interrupt routing
>
> - remove i440FX-xen and i440fx_write_config_xen
> we don't need to intercept pci config writes to i440FX anymore;
>
> - introduce PIIX3-xen and piix3_write_config_xen
> we do need to intercept PCI config writes to the PCI-ISA bridge to update
> the PCI link routing;
>
> - set the number of PIIX3-xen interrupt lines to 128;
I still find it unpretty and I'm pretty sure it's completely different from
real hardware, but since Xen code is your call and this doesn't affect non-Xen
workloads, I won't block it, unless someone else is very much opposed to it.
Please resend as a proper patch.
Alex
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel