
RE: [Xen-devel] MSI and VT-d interrupt remapping



[Yunhong Jiang]
> xen-devel-bounces@xxxxxxxxxxxxxxxxxxx wrote:
>> You're right in that Linux does not currently support this.  You
>> can, however, allocate multiple interrupts using MSI-X.  Anyhow, I
>> was not envisioning this feature being used directly for
>> passthrough device access.  Rather, I was considering the case
>> where a device could be configured to communicate data directly
>> into a VM (e.g., using multi-queue NICs) and deliver the interrupt
>> to the appropriate VM.  In this case the frontend in the guest
>> would not need to see a multi-message MSI device, only the backend
>> in dom0/the driver domain would need to be made aware of it.

> Although I don't know of any device with such a usage model (Intel's
> VMDq uses MSI-X), yes, your usage model would be helpful.  To
> achieve this, maybe we need to change the protocol between the pci
> backend and pci frontend; in fact, maybe pci_enable_msi and
> pci_enable_msix could be combined, with a flag to determine whether
> the vectors should be contiguous or not.

This is similar to my initial idea as well.  In addition to being
contiguous, the vectors allocated for a multi-message MSI request
would also need to be properly aligned.
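
To make the alignment requirement concrete: multi-message MSI lets the
device distinguish messages by modifying the low-order bits of the
vector, so a request for N messages (a power of two up to 32) needs a
contiguous block whose base vector is aligned to N.  A rough sketch of
what such an allocator would have to do (the vector_in_use table and
the range constants are made up for illustration):

#include <stdbool.h>

#define FIRST_DYNAMIC_VECTOR  0x20   /* multiple of 32, so stepping by */
#define LAST_DYNAMIC_VECTOR   0xef   /* 'count' keeps the alignment    */

static int alloc_msi_vector_block(unsigned int count,
                                  bool vector_in_use[256])
{
    unsigned int base, i;

    /* Multi-message MSI: count must be a power of two, at most 32,
     * and the base vector aligned to count, since the device ORs the
     * message number into the low bits of the vector. */
    if (count == 0 || count > 32 || (count & (count - 1)))
        return -1;

    for (base = FIRST_DYNAMIC_VECTOR;
         base + count - 1 <= LAST_DYNAMIC_VECTOR; base += count) {
        for (i = 0; i < count; i++)
            if (vector_in_use[base + i])
                break;
        if (i == count) {
            for (i = 0; i < count; i++)
                vector_in_use[base + i] = true;
            return base;            /* aligned, contiguous base vector */
        }
    }
    return -1;                      /* no suitably aligned block free  */
}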

> One thing left is how the driver domain can bind the vector to the
> frontend VM.  Some sanity check mechanism should be added.

Well, there exists a domctl for modifying the permissions of a pirq.
This could be used to grant pirq access to a frontend domain.  Not
sure if this is sufficient.
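
Something along these lines could be done from the driver domain's
toolstack using the libxc wrapper around that domctl (exact signatures
differ between Xen versions, so treat this as a sketch only):

#include <stdio.h>
#include <xenctrl.h>

/* Grant a frontend domain permission to bind the given pirq. */
int grant_pirq_to_frontend(uint32_t frontend_domid, uint8_t pirq)
{
    int xc = xc_interface_open();
    int rc;

    if (xc < 0)
        return -1;

    /* XEN_DOMCTL_irq_permission with allow_access = 1 */
    rc = xc_domain_irq_permission(xc, frontend_domid, pirq, 1);
    if (rc)
        fprintf(stderr, "irq_permission failed for pirq %u\n", pirq);

    xc_interface_close(xc);
    return rc;
}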

Also, as discussed in my previous reply, dom0 may need the ability to
reset the affinity of an irq when the destination vcpu is migrated.
Further, a pirq is currently always bound to vcpu[0] of a domain (in
evtchn_bind_pirq).  There is clearly some room for improvement and
more flexibility here.

Not sure what the best solution is.  One option is to allow a guest to
re-bind a pirq to set its affinity, and have such explicitly set
affinities be automatically updated when the associated vcpu is
migrated.  Another option is to create unbound ports in a guest domain
and let a privileged domain bind pirqs to those ports.  The privileged
domain should then also be allowed to later modify the destination
vcpu and set the affinity of the bound pirq.
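
Purely to illustrate the second option, a new event-channel operation
could look roughly like this (nothing of the sort exists today; the
struct name and all fields are invented):

#include <stdint.h>

/* Hypothetical EVTCHNOP_bind_pirq_remote: the guest first allocates
 * an unbound port with EVTCHNOP_alloc_unbound, and the privileged or
 * driver domain then asks Xen to route a pirq to that port. */
struct evtchn_bind_pirq_remote {
    /* IN */
    uint16_t remote_dom;    /* frontend domain owning the unbound port */
    uint32_t remote_port;   /* the port it allocated                   */
    uint32_t pirq;          /* physical irq to deliver to that port    */
    uint32_t vcpu;          /* destination vcpu; the pirq's affinity   */
                            /* would later follow this vcpu            */
};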


> BTW, can you tell which device may use this feature?  I'm a bit
> interested in this.

I must confess that I do not know of any device that currently uses
this feature (perhaps Solarflare or NetXen devices have support for
it), and the whole connection with VT-d interrupt remapping is as of
now purely academic anyway due to the lack of chipsets with the
appropriate feature.

However, the whole issue of binding multiple pirqs of a device to
different guest domains remains the same even when using MSI-X.
Multi-message MSI devices mostly just add the additional restrictions
on vector allocation described above.


>>>> I do not think explicitly specifying destination APIC upon
>>>> allocation is the best idea.  Setting the affinity upon binding
>>>> the interrupt like it's done today seems like a better approach.
>>>> This leaves us with dealing with the vectors.
>> 
>>> But what should happen when the vcpu is migrated to another
>>> physical cpu?  I'm not sure about the cost of programming the
>>> interrupt remapping table; otherwise, that is a good choice to
>>> achieve the affinity.
>> 
>> As you've already said, the interrupt affinity is only set when a
>> pirq is bound.  The interrupt routing is not redirected if the vcpu
>> it's bound to migrates to another physical cpu.  This can (should?)
>> be changed in the future so that the affinity is either set
>> implicitly when migrating the vcpu, or explicitly with a rebind
>> call by dom0.  In any case the affinity would be reset by the
>> set_affinity method.

> Yes, I remember Keir suggested using the interrupt remapping table
> in VT-d to achieve this; not sure whether that is still OK.

Relying on the VT-d interrupt remapping table would rule out any Intel
chipset on the market today, and also the equivalent solution (if any)
used by AMD and others.

It seems better to update the IOAPIC entry or MSI capability structure
directly when redirecting the interrupt, and let io_apic_write() or
the equivalent function for MSI rewrite the interrupt remapping table
if VT-d is enabled.  Not sure how much it would cost to rewrite the
remapping table and perform the respective VT-d interrupt entry cache
flush; it's difficult to measure without actually having any available
hardware.  However, I suspect the cost would in many cases be dwarfed
by migrating the cache working set and by other associated costs of
migrating a vcpu.
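
To illustrate what updating the MSI capability structure directly
would involve, roughly the following would happen when the destination
vcpu moves to another physical cpu (msi_cap_write_addr() is a made-up
helper for the PCI config-space write; the address layout is the
standard x86 MSI format):

#include <stdint.h>

#define MSI_ADDR_BASE           0xfee00000u
#define MSI_ADDR_DESTID(id)     (((uint32_t)(id) & 0xff) << 12)
#define MSI_ADDR_DESTMODE_PHYS  (0u << 2)   /* physical destination  */
#define MSI_ADDR_REDIR_CPU      (0u << 3)   /* no redirection hint   */

/* Hypothetical helper: write the 32-bit message address register at
 * cap_offset + 4 in the device's config space. */
void msi_cap_write_addr(uint16_t bdf, uint8_t cap_offset, uint32_t addr);

void msi_set_affinity(uint16_t bdf, uint8_t cap_offset,
                      uint8_t dest_apic_id)
{
    uint32_t addr = MSI_ADDR_BASE
                  | MSI_ADDR_DESTID(dest_apic_id)
                  | MSI_ADDR_DESTMODE_PHYS
                  | MSI_ADDR_REDIR_CPU;

    /* With VT-d interrupt remapping enabled, this is the point where
     * the corresponding remapping table entry would be rewritten and
     * the interrupt entry cache flushed instead, analogous to what
     * io_apic_write() would do for IOAPIC RTEs. */
    msi_cap_write_addr(bdf, cap_offset, addr);
}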

        eSk

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

