Re: [Xen-devel] Enabling VT-d PI by default



On Mon, May 15, 2017 at 2:35 PM, Andrew Cooper
<andrew.cooper3@xxxxxxxxxx> wrote:
> On 15/05/17 11:27, George Dunlap wrote:
>> On Fri, May 12, 2017 at 12:05 PM, Andrew Cooper
>> <andrew.cooper3@xxxxxxxxxx> wrote:
>>> Citrix NetScaler SDX boxes have more MSI-X interrupts than fit in the
>>> cumulative IDTs of a top-end dual-socket Xeon server system.  Some of
>>> the device drivers are purposefully modelled to use fewer interrupts
>>> than they otherwise would want to.
>>>
>>> Using PI is the proper solution long term, because doing so would remove
>>> any need to allocate IDT vectors for the interrupts; the IOMMU could be
>>> programmed to dump device vectors straight into the PI block without
>>> them ever going through Xen's IDT.
>> I wouldn't necessarily call that a "proper" solution. With PI, instead
>> of an interrupt telling you exactly which VM to wake up and/or which
>> routine you need to run, you have to search through
>> (potentially) thousands of entries to see which vcpu the interrupt you
>> received wanted to wake up; and you need to do that on every single
>> interrupt.  (Obviously it does have the advantage that if the vcpu
>> happens to be running, Xen doesn't get an interrupt at all.)
>
> Having spoken to the PI architects, this is not how the technology was
> designed to be used.
>
> On systems with this number of in-flight interrupts, trying to track
> "who got what interrupt" for priority boosting purposes is a waste of
> time, as we spend ages taking vmexits to process interrupt notifications
> for out-of-context vcpus.
>
> The way the PI architects envisaged the technology being used is that
> Suppress Notification is set at all points other than executing in
> non-root mode for the vcpu in question (there is a small race window
> around clearing SN on vmentry), and that the scheduler uses Outstanding
> Notification on each of the PI blocks when it rebalances credit to see
> which vcpus have had interrupts in the last 30ms.

It sounds like they may have made the mistake that the Credit1
designers made, in analyzing only a system that was overloaded; and
one where all workloads were identical, as opposed to analyzing a
system that was at least sometimes partially loaded, and where
workloads were very different.

You're right that if you weren't going to preempt the currently
running vcpu anyway, there's no need for Xen to get the interrupt.

But it should be obvious that on a system that's idle (even for a
relatively short amount of time), we want to get the interrupt and
wake up the appropriate vcpu immediately.  It should also be obvious
that in a mixed workload, where one vcpu is doing tons of computation
and another is mainly handling interrupts quickly and going to sleep
again, we would want Xen to check at regular intervals whether
it should run the vcpu that's mostly handling interrupts.  We
generally wouldn't want to delay waking up the lower-priority vcpu
by more than 1ms.

In both cases, waiting 30ms to see if we should wake somebody up is
far too long.
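
To put rough numbers on that, here is a minimal sketch in C.  It is not
Xen code: the function names are made up for illustration, and it
assumes the only way the scheduler learns about a posted interrupt is by
looking at Outstanding Notification on a fixed period (i.e. Suppress
Notification stays set while the vcpu is not running).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model, not Xen's implementation.
 *
 * Assumption: the scheduler polls the ON bit at times 0, P, 2P, ...
 * where P is poll_period_us, and never gets a notification in between.
 * Returns the time (in microseconds) of the first poll at or after the
 * moment the interrupt was posted.
 */
static uint64_t noticed_at_us(uint64_t post_us, uint64_t poll_period_us)
{
    assert(poll_period_us > 0);

    /* Round post_us up to the next multiple of the polling period. */
    uint64_t polls = (post_us + poll_period_us - 1) / poll_period_us;

    return polls * poll_period_us;
}

/*
 * Delay between the interrupt being posted and the scheduler noticing
 * it.  The worst case approaches one full polling period.
 */
static uint64_t wake_delay_us(uint64_t post_us, uint64_t poll_period_us)
{
    return noticed_at_us(post_us, poll_period_us) - post_us;
}
```

Under those assumptions, an interrupt posted just after a poll waits
almost the whole period: close to 30ms with a 30ms rebalance interval,
versus close to 1ms if Xen checks every 1ms.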

 -George
