
Re: [Xen-devel] Ideas Re: [PATCH v14 1/2] vmx: VT-d posted-interrupt core logic handling




> -----Original Message-----
> From: George Dunlap [mailto:george.dunlap@xxxxxxxxxx]
> Sent: Wednesday, March 9, 2016 1:06 AM
> To: Jan Beulich <JBeulich@xxxxxxxx>; George Dunlap
> <George.Dunlap@xxxxxxxxxxxxx>; Wu, Feng <feng.wu@xxxxxxxxx>
> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>; Dario Faggioli
> <dario.faggioli@xxxxxxxxxx>; Tian, Kevin <kevin.tian@xxxxxxxxx>; xen-
> devel@xxxxxxxxxxxxx; Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>; Keir
> Fraser <keir@xxxxxxx>
> Subject: Re: [Xen-devel] Ideas Re: [PATCH v14 1/2] vmx: VT-d posted-interrupt
> core logic handling
> 
> On 08/03/16 15:42, Jan Beulich wrote:
> >>>> On 08.03.16 at 15:42, <George.Dunlap@xxxxxxxxxxxxx> wrote:
> >> On Tue, Mar 8, 2016 at 1:10 PM, Wu, Feng <feng.wu@xxxxxxxxx> wrote:
> >>>> -----Original Message-----
> >>>> From: George Dunlap [mailto:george.dunlap@xxxxxxxxxx]
> >> [snip]
> >>>> It seems like there are a couple of ways we could approach this:
> >>>>
> >>>> 1. Try to optimize the reverse look-up code so that it's not a linear
> >>>> linked list (getting rid of the theoretical fear)
> >>>
> >>> Good point.
> >>>
> >>>>
> >>>> 2. Try to test engineered situations where we expect this to be a
> >>>> problem, to see how big of a problem it is (proving the theory to be
> >>>> accurate or inaccurate in this case)
> >>>
> >>> Maybe we can run an SMP guest with all the vcpus pinned to a dedicated
> >>> pCPU, run some benchmark in the guest with VT-d PI and without VT-d PI,
> >>> and then see the performance difference between these two scenarios.
> >>
> >> This would give us an idea of what the worst-case scenario would be.
> >
> > How would a single VM ever give us an idea about the worst
> > case? Something getting close to worst case is a ton of single
> > vCPU guests all temporarily pinned to one and the same pCPU
> > (could be multi-vCPU ones, but the more vCPU-s the more
> > artificial this pinning would become) right before they go into
> > blocked state (i.e. through one of the two callers of
> > arch_vcpu_block()), the pinning removed while blocked, and
> > then all getting woken at once.
> 
> Why would removing the pinning be important?
> 
> And I guess it's actually the case that not all of the VMs need to
> actually be *receiving* interrupts; they just need to be *capable* of
> receiving interrupts, for there to be a long chain all blocked on the
> same physical cpu.
> 
> >
> >>  But
> >> pinning all vcpus to a single pcpu isn't really a sensible use case we
> >> want to support -- if you have to do something stupid to get a
> >> performance regression, then as far as I'm concerned it's not a
> >> problem.
> >>
> >> Or to put it a different way: If we pin 10 vcpus to a single pcpu and
> >> then pound them all with posted interrupts, and there is *no*
> >> significant performance regression, then that will conclusively prove
> >> that the theoretical performance regression is of no concern, and we
> >> can enable PI by default.
> >
> > The point isn't the pinning. The point is what pCPU they're on when
> > going to sleep. And that could involve quite a few more than just
> > 10 vCPU-s, provided they all sleep long enough.
> >
> > And the "theoretical performance regression is of no concern" is
> > also not a proper way of looking at it, I would say: Even if such
> > a situation would happen extremely rarely, if it can happen at all,
> > it would still be a security issue.
> 
> What I'm trying to get at is -- exactly what situation?  What actually
> constitutes a problematic interrupt latency / interrupt processing
> workload, how many vcpus must be sleeping on the same pcpu to actually
> risk triggering that latency / workload, and how feasible is it that
> such a situation would arise in a reasonable scenario?
> 
> If 200us is too long, and it only takes 3 sleeping vcpus to get there,
> then yes, there is a genuine problem we need to try to address before we
> turn it on by default.  If we say that up to 500us is tolerable, and it
> takes 100 sleeping vcpus to reach that latency, then this is something I
> don't really think we need to worry about.
> 
> "I think something bad may happen" is a really difficult to work with.
> "I want to make sure that even a high number of blocked cpus won't cause
> the interrupt latency to exceed 500us; and I want it to be basically
> impossible for the interrupt latency to exceed 5ms under any
> circumstances" is a concrete target someone can either demonstrate that
> they meet, or aim for when trying to improve the situation.
> 
> Feng: It should be pretty easy for you to:

George, thanks a lot for pointing out a possible way to move forward.

> * Implement a modified version of Xen where
>  - *All* vcpus get put on the waitqueue

So this means all the vcpus are blocked, and hence waiting on the
blocking list, right?

>  - Measure how long it took to run the loop in pi_wakeup_interrupt
> * Have one VM receiving posted interrupts on a regular basis.
> * Slowly increase the number of vcpus blocked on a single cpu (e.g., by
> creating more guests), stopping when you either reach 500us or 500
> vcpus. :-)

This may depend on the environment. I was using a 10G NIC to do the
test; if we increase the number of guests, I will need more NICs to
assign to the guests. I will see if I can get them.
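
In the meantime, below is a rough user-space sketch (plain C, not Xen
code) of the kind of scaling check I have in mind: it just times a
linear walk over N fake "blocked vcpu" entries. The struct and the "on"
flag are only stand-ins for the per-vcpu check done in
pi_wakeup_interrupt; the numbers that actually matter would still have
to come from timestamps taken around the loop inside the handler
itself.

/*
 * Toy user-space sketch, not Xen code: time a linear walk over N fake
 * "blocked vcpu" entries, to get a first feel for how the cost of the
 * linear scan grows with N.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct fake_vcpu {
    struct fake_vcpu *next;
    int on;                           /* stand-in for "interrupt pending" */
};

/* Walk the list once, "waking" entries with a pending interrupt, and
 * return how long the walk took in nanoseconds. */
static long long walk_ns(const struct fake_vcpu *head)
{
    struct timespec a, b;
    int woken = 0;

    clock_gettime(CLOCK_MONOTONIC, &a);
    for ( const struct fake_vcpu *v = head; v != NULL; v = v->next )
        if ( v->on )                  /* Xen would unblock the vcpu here */
            woken++;
    clock_gettime(CLOCK_MONOTONIC, &b);

    (void)woken;
    return (b.tv_sec - a.tv_sec) * 1000000000LL + (b.tv_nsec - a.tv_nsec);
}

int main(void)
{
    for ( int n = 1; n <= 512; n *= 2 )
    {
        struct fake_vcpu *head = NULL;

        for ( int i = 0; i < n; i++ )
        {
            struct fake_vcpu *v = calloc(1, sizeof(*v));

            v->on = (i == 0);         /* exactly one pending interrupt */
            v->next = head;
            head = v;
        }

        printf("%3d blocked vcpus: %lld ns per walk\n", n, walk_ns(head));

        while ( head != NULL )
        {
            struct fake_vcpu *v = head->next;

            free(head);
            head = v;
        }
    }
    return 0;
}

This of course misses the cache and lock effects the real handler would
see, so it is only a sanity check on how the walk itself scales.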

Thanks,
Feng

> 
> To report the measurements, you could either create a Xen trace record
> and use xentrace_format or xenalyze to plot the results; or you could
> create some software performance counters for different "buckets" --
> less than 100us, 100-200us, 200-300us, 300-400us, 400-500us, and more
> than 500us.
> 
> Or you could printk the min / average / max every 5000 interrupts or so. :-)
> 
> To test, it seems like using a network benchmark with short packet
> lengths should be able to trigger large numbers of interrupts; and it
> also can let you know if / when there's a performance impact of adding
> more vcpus.
> 
> Or alternately, you could try to come up with a quicker reverse-lookup
> algorithm. :-)
> 
>  -George
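
Regarding the "buckets" and the printk min / average / max suggestions
above, something along these lines would probably be enough for the
bookkeeping (again plain C rather than Xen code; pi_record_latency() is
a made-up helper name, and the 100us bucket width and the 5000-sample
report interval are just the numbers mentioned in this thread):

/*
 * Sketch of latency bookkeeping: counters for <100us, 100-200us, ...,
 * 400-500us and >500us, plus min/average/max reported every
 * REPORT_INTERVAL samples.
 */
#include <stdio.h>

#define NR_BUCKETS       6          /* <100us, 100-200, ..., 400-500, >500us */
#define REPORT_INTERVAL  5000       /* samples between two reports           */

static unsigned long bucket[NR_BUCKETS];
static unsigned long nr_samples;
static unsigned long long sum_ns, max_ns, min_ns = ~0ULL;

/* Hypothetical helper: record one measured duration (in ns) of the
 * wakeup loop, and print a summary every REPORT_INTERVAL samples. */
static void pi_record_latency(unsigned long long ns)
{
    unsigned int idx = ns / 100000;          /* 100us-wide buckets */

    if ( idx >= NR_BUCKETS )
        idx = NR_BUCKETS - 1;                /* everything above 500us */
    bucket[idx]++;

    sum_ns += ns;
    if ( ns < min_ns )
        min_ns = ns;
    if ( ns > max_ns )
        max_ns = ns;

    if ( ++nr_samples % REPORT_INTERVAL == 0 )
    {
        printf("PI wakeup loop: min %llu ns, avg %llu ns, max %llu ns\n",
               min_ns, sum_ns / nr_samples, max_ns);
        for ( unsigned int i = 0; i < NR_BUCKETS - 1; i++ )
            printf("  %3u-%3uus: %lu\n", i * 100, (i + 1) * 100, bucket[i]);
        printf("     >500us: %lu\n", bucket[NR_BUCKETS - 1]);
    }
}

int main(void)
{
    /* Feed in some fake durations just to show the output format. */
    for ( unsigned int i = 0; i < 10000; i++ )
        pi_record_latency((i % 7) * 90000ULL);   /* 0us .. 540us */
    return 0;
}

Inside Xen the printf() calls would become printk(), and the counters
would presumably want to be per-cpu or kept under the same lock as the
blocking list.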



 

