Xen project Mailing List

Re: [Xen-devel] Enabling VT-d PI by default

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

From: George Dunlap <george.dunlap@xxxxxxxxxx>

Date: Mon, 15 May 2017 15:32:15 +0100

Cc: Kevin Tian <kevin.tian@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>, Dario Faggioli <dario.faggioli@xxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Chao Gao <chao.gao@xxxxxxxxx>

Delivery-date: Mon, 15 May 2017 14:38:18 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Mon, May 15, 2017 at 2:35 PM, Andrew Cooper
<andrew.cooper3@xxxxxxxxxx> wrote:
> On 15/05/17 11:27, George Dunlap wrote:
>> On Fri, May 12, 2017 at 12:05 PM, Andrew Cooper
>> <andrew.cooper3@xxxxxxxxxx> wrote:
>>> Citrix Netscalar SDX boxes have more MSI-X interrupts than fit in the
>>> cumulative IDTs of a top end dual-socket Xeon server systems. Some of
>>> the device drivers are purposefully modelled to use fewer interrupts
>>> than they otherwise would want to.
>>>
>>> Using PI is the proper solution longterm, because doing so would remove
>>> any need to allocate IDT vectors for the interrupts; the IOMMU could be
>>> programmed to dump device vectors straight into the PI block without
>>> them ever going through Xen's IDT.
>> I wouldn't necessarily call that a "proper" solution. With PI, instead
>> of an interrupt telling you exactly which VM to wake up and/or which
>> routine you need to run, instead you have to search through
>> (potentially) thousands of entries to see which vcpu the interrupt you
>> received wanted to wake up; and you need to do that on every single
>> interrupt. (Obviously it does have the advantage that if the vcpu
>> happens to be running Xen doesn't get an interrupt at all.)
>
> Having spoken to the PI architects, this is not how the technology was
> designed to be used.
>
> On systems with this number of in-flight interrupts, trying to track
> "who got what interrupt" for priority boosting purposes is a waste of
> time, as we spend ages taking vmexits to process interrupt notifications
> for out-of-context vcpus.
>
> The way the PI architects envisaged the technology being used is that
> Suppress Notification is set at all points other than executing in
> non-root mode for the vcpu in question (there is a small race window
> around clearing SN on vmentry), and that the scheduler uses Outstanding
> Notification on each of the PI blocks when it rebalances credit to see
> which vcpus have had interrupts in the last 30ms.

It sounds like they may have made the mistake that the Credit1
designers made, in analyzing only a system that was overloaded; and
one where all workloads were identical, as opposed to analyzing a
system that was at least sometimes partially loaded, and where
workloads were very different.

You're right that if you weren't going to preempt the currently
running vcpu anyway, there's no need for Xen to get the interrupt.
But it should be obvious that on a system that's idle (even for a
relatively short amount of time) that we want to get the interrupt and
wake up the appropriate vcpu immediately.

It should also be obvious that in a mixed workload, where one vcpu is
doing tons of computation and another is mainly handling interrupts
quickly and going to sleep again, that we would want Xen at regular
intervals to check to see if it should run the vcpu that's mostly
handling interrupts. We generally wouldn't want to delay waking up
the lower-priority vcpu longer than 1ms.

In both cases, waiting 30ms to see if we should wake somebody up is
far too long.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
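
The latency argument in the message above can be made concrete with a little arithmetic. What follows is only a rough sketch and is not part of the original thread: it assumes that a sleeping vcpu's pending interrupt is noticed only at the next periodic check, and it ignores every other source of delay. The function name `delay_until_next_poll` and the 100us arrival time are hypothetical, chosen purely for illustration; only the 30ms and 1ms figures come from the discussion.

```python
def delay_until_next_poll(arrival_us: int, poll_interval_us: int) -> int:
    """Microseconds an interrupt waits before a periodic check notices it.

    Assumes checks happen at exact multiples of poll_interval_us and that
    nothing else (such as an immediate notification) wakes the vcpu sooner.
    """
    if poll_interval_us <= 0:
        raise ValueError("poll_interval_us must be positive")
    remainder = arrival_us % poll_interval_us
    # An interrupt arriving exactly on a check boundary is seen at once.
    return 0 if remainder == 0 else poll_interval_us - remainder


if __name__ == "__main__":
    ARRIVAL_US = 100  # hypothetical: interrupt lands 100us after a check
    for label, interval_us in (("30ms rebalance", 30_000), ("1ms check", 1_000)):
        waited = delay_until_next_poll(ARRIVAL_US, interval_us)
        print(f"{label}: interrupt waits {waited}us")
```

Under these assumptions, an interrupt that arrives just after a check waits almost the full interval: roughly 30ms if pending interrupts are only examined at credit rebalance, versus under 1ms with a 1ms check, and effectively no polling delay at all if Xen receives the notification directly.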
