
RE: [Xen-devel] MSI and VT-d interrupt remapping



[Yunhong Jiang]
> xen-devel-bounces@xxxxxxxxxxxxxxxxxxx wrote:
>> You're right in that Linux does not currently support this.  You
>> can, however, allocate multiple interrupts using MSI-X.  Anyhow, I
>> was not envisioning this feature being used directly for
>> passthrough device access.  Rather, I was considering the case
>> where a device could be configured to communicate data directly
>> into a VM (e.g., using multi-queue NICs) and deliver the interrupt
>> to the appropriate VM.  In this case the frontend in the guest
>> would not need to see a multi-message MSI device, only the backend
>> in dom0/the driver domain would need to be made aware of it.

> Although I don't know of any device with such a usage model (Intel's
> VMDq uses MSI-X), yes, your usage model would be helpful.  To
> achieve this, maybe we need to change the protocol between the pci
> backend and pci frontend; in fact, maybe pci_enable_msi and
> pci_enable_msix could be combined, with a flag to determine whether
> the vectors should be contiguous or not.

This is similar to my initial idea as well.  In addition to being
contiguous, the vectors allocated for a multi-message MSI request
would also need to be properly aligned.
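
To make the alignment requirement concrete: multi-message MSI lets the
device distinguish messages by modifying the low-order bits of the
vector, so a request for N messages (a power of two up to 32) needs a
contiguous block whose base vector is aligned to N.  A rough sketch of
what such an allocator would have to do (the vector_in_use table and
the range constants are made up for illustration):

#include <stdbool.h>

#define FIRST_DYNAMIC_VECTOR  0x20   /* multiple of 32, so stepping by */
#define LAST_DYNAMIC_VECTOR   0xef   /* 'count' keeps the alignment    */

static int alloc_msi_vector_block(unsigned int count,
                                  bool vector_in_use[256])
{
    unsigned int base, i;

    /* Multi-message MSI: count must be a power of two, at most 32,
     * and the base vector aligned to count, since the device ORs the
     * message number into the low bits of the vector. */
    if (count == 0 || count > 32 || (count & (count - 1)))
        return -1;

    for (base = FIRST_DYNAMIC_VECTOR;
         base + count - 1 <= LAST_DYNAMIC_VECTOR; base += count) {
        for (i = 0; i < count; i++)
            if (vector_in_use[base + i])
                break;
        if (i == count) {
            for (i = 0; i < count; i++)
                vector_in_use[base + i] = true;
            return base;            /* aligned, contiguous base vector */
        }
    }
    return -1;                      /* no suitably aligned block free  */
}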

> One thing left is how the driver domain can bind the vector to the
> frontend VM.  Some sanity check mechanism should be added.

Well, there exists a domctl for modifying the permissions of a pirq.
This could be used to grant pirq access to a frontend domain.  Not
sure if this is sufficient.
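
Something along these lines could be done from the driver domain's
toolstack using the libxc wrapper around that domctl (exact signatures
differ between Xen versions, so treat this as a sketch only):

#include <stdio.h>
#include <xenctrl.h>

/* Grant a frontend domain permission to bind the given pirq. */
int grant_pirq_to_frontend(uint32_t frontend_domid, uint8_t pirq)
{
    int xc = xc_interface_open();
    int rc;

    if (xc < 0)
        return -1;

    /* XEN_DOMCTL_irq_permission with allow_access = 1 */
    rc = xc_domain_irq_permission(xc, frontend_domid, pirq, 1);
    if (rc)
        fprintf(stderr, "irq_permission failed for pirq %u\n", pirq);

    xc_interface_close(xc);
    return rc;
}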

Also, as discussed in my previous reply, dom0 may need the ability to
reset the affinity of an irq when the destination vcpu is migrated.
Further, a pirq is currently always bound to vcpu[0] of a domain (in
evtchn_bind_pirq).  There is clearly some room for improvement and
more flexibility here.

Not sure what the best solution is.  One option is to allow a guest to
re-bind a pirq to set its affinity, and have such explicitly set
affinities be automatically updated when the associated vcpu is
migrated.  Another option is to create unbound ports in a guest domain
and let a privileged domain bind pirqs to those ports.  The privileged
domain should then also be allowed to later modify the destination
vcpu and set the affinity of the bound pirq.
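
Purely to illustrate the second option, a new event-channel operation
could look roughly like this (nothing of the sort exists today; the
struct name and all fields are invented):

#include <stdint.h>

/* Hypothetical EVTCHNOP_bind_pirq_remote: the guest first allocates
 * an unbound port with EVTCHNOP_alloc_unbound, and the privileged or
 * driver domain then asks Xen to route a pirq to that port. */
struct evtchn_bind_pirq_remote {
    /* IN */
    uint16_t remote_dom;    /* frontend domain owning the unbound port */
    uint32_t remote_port;   /* the port it allocated                   */
    uint32_t pirq;          /* physical irq to deliver to that port    */
    uint32_t vcpu;          /* destination vcpu; the pirq's affinity   */
                            /* would later follow this vcpu            */
};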


> BTW, can you tell which device may use this feature?  I'm a bit
> interested in this.

I must confess that I do not know of any device that currently uses
this feature (perhaps Solarflare or NetXen devices have support for
it), and the whole connection with VT-d interrupt remapping is as of
now purely academic anyway due to the lack of chipsets with the
appropriate feature.

However, the whole issue of binding multiple pirqs of a device to
different guest domains remains the same even when using MSI-X.
Multi-message MSI devices mostly just add the additional restrictions
on vector allocation described above.


>>>> I do not think explicitly specifying destination APIC upon
>>>> allocation is the best idea.  Setting the affinity upon binding
>>>> the interrupt like it's done today seems like a better approach.
>>>> This leaves us with dealing with the vectors.
>> 
>>> But what should happen when the vcpu is migrated to another
>>> physical cpu?  I'm not sure about the cost of programming the
>>> interrupt remapping table; otherwise, that is a good choice to
>>> achieve the affinity.
>> 
>> As you've already said, the interrupt affinity is only set when a
>> pirq is bound.  The interrupt routing is not redirected if the vcpu
>> it's bound to migrates to another physical cpu.  This can (should?)
>> be changed in the future so that the affinity is either set
>> implicitly when migrating the vcpu, or explicitly with a rebind
>> call by dom0.  In any case the affinity would be reset by the
>> set_affinity method.

> Yes, I remember Keir suggested using the interrupt remapping table
> in VT-d to achieve this; not sure whether that is still OK.

Relying on the VT-d interrupt remapping table would rule out any Intel
chipset on the market today, and also the equivalent solution (if any)
used by AMD and others.

It seems better to update the IOAPIC entry or MSI capability structure
directly when redirecting the interrupt, and let io_apic_write() or
the equivalent function for MSI rewrite the interrupt remapping table
if VT-d is enabled.  Not sure how much it would cost to rewrite the
remapping table and perform the respective VT-d interrupt entry cache
flush; it's difficult to measure without actually having any available
hardware.  However, I suspect the cost would in many cases be dwarfed
by migrating the cache working set and by other associated costs of
migrating a vcpu.
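
To illustrate what updating the MSI capability structure directly
would involve, roughly the following would happen when the destination
vcpu moves to another physical cpu (msi_cap_write_addr() is a made-up
helper for the PCI config-space write; the address layout is the
standard x86 MSI format):

#include <stdint.h>

#define MSI_ADDR_BASE           0xfee00000u
#define MSI_ADDR_DESTID(id)     (((uint32_t)(id) & 0xff) << 12)
#define MSI_ADDR_DESTMODE_PHYS  (0u << 2)   /* physical destination  */
#define MSI_ADDR_REDIR_CPU      (0u << 3)   /* no redirection hint   */

/* Hypothetical helper: write the 32-bit message address register at
 * cap_offset + 4 in the device's config space. */
void msi_cap_write_addr(uint16_t bdf, uint8_t cap_offset, uint32_t addr);

void msi_set_affinity(uint16_t bdf, uint8_t cap_offset,
                      uint8_t dest_apic_id)
{
    uint32_t addr = MSI_ADDR_BASE
                  | MSI_ADDR_DESTID(dest_apic_id)
                  | MSI_ADDR_DESTMODE_PHYS
                  | MSI_ADDR_REDIR_CPU;

    /* With VT-d interrupt remapping enabled, this is the point where
     * the corresponding remapping table entry would be rewritten and
     * the interrupt entry cache flushed instead, analogous to what
     * io_apic_write() would do for IOAPIC RTEs. */
    msi_cap_write_addr(bdf, cap_offset, addr);
}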

        eSk

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

