
Re: [Xen-devel] [for-4.9] Re: HVM guest performance regression



On 06/06/17 21:08, Stefano Stabellini wrote:
> On Tue, 6 Jun 2017, Juergen Gross wrote:
>> On 06/06/17 18:39, Stefano Stabellini wrote:
>>> On Tue, 6 Jun 2017, Juergen Gross wrote:
>>>> On 26/05/17 21:01, Stefano Stabellini wrote:
>>>>> On Fri, 26 May 2017, Juergen Gross wrote:
>>>>>> On 26/05/17 18:19, Ian Jackson wrote:
>>>>>>> Juergen Gross writes ("HVM guest performance regression"):
>>>>>>>> While looking for the cause of a performance regression of HVM guests
>>>>>>>> under Xen 4.7 compared to 4.5, I found the culprit to be commit
>>>>>>>> c26f92b8fce3c9df17f7ef035b54d97cbe931c7a ("libxl: remove
>>>>>>>> freemem_slack") in Xen 4.6.
>>>>>>>>
>>>>>>>> The problem occurred when dom0 had to be ballooned down in order to
>>>>>>>> start the guest. The performance of some micro benchmarks dropped by
>>>>>>>> about a factor of 2 with the above commit.
>>>>>>>>
>>>>>>>> The interesting point is that the guest's performance depends on the
>>>>>>>> amount of free memory available at guest creation time. When there was
>>>>>>>> barely enough memory available to start the guest, the performance
>>>>>>>> remains low even if memory is freed later.
>>>>>>>>
>>>>>>>> I'd like to suggest we either revert the commit or add some other
>>>>>>>> mechanism to keep some free memory in reserve when starting a
>>>>>>>> domain.
>>>>>>>
>>>>>>> Oh, dear.  The memory accounting swamp again.  Clearly we are not
>>>>>>> going to drain that swamp now, but I don't like regressions.
>>>>>>>
>>>>>>> I am not opposed to reverting that commit.  I was a bit iffy about it
>>>>>>> at the time; and according to the removal commit message, it was
>>>>>>> basically removed because it was a piece of cargo cult for which we
>>>>>>> had no justification in any of our records.
>>>>>>>
>>>>>>> Indeed I think fixing this is a candidate for 4.9.
>>>>>>>
>>>>>>> Do you know the mechanism by which the freemem slack helps?  I think
>>>>>>> that would be a prerequisite for reverting this.  That way we can have
>>>>>>> an understanding of why we are doing things, rather than just
>>>>>>> flailing at random...
>>>>>>
>>>>>> I wish I understood it.
>>>>>>
>>>>>> One candidate would be 2M/1G pages being possible with enough free
>>>>>> memory, but I haven't proven this yet. I can try this by disabling
>>>>>> big pages in the hypervisor.
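(For reference, what I meant by "disabling big pages" is turning off HAP
superpages on the Xen command line. A rough sketch, assuming the guest uses
HAP and the hap_1gb/hap_2mb hypervisor boot options are available in the
tested version:

    # illustrative Xen boot line, not the exact one from my setup
    multiboot /boot/xen.gz ... hap_1gb=0 hap_2mb=0

With that the guest's p2m is populated with 4k pages only. As it turned out,
see below, superpages were not the culprit.)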
>>>>>
>>>>> Right, if I had to bet, I would put my money on superpage shattering
>>>>> being the cause of the problem.
>>>>
>>>> Seems you would have lost your money...
>>>>
>>>> Meanwhile I've found a way to get the "good" performance in the micro
>>>> benchmark. Unfortunately this requires switching off the PV interfaces
>>>> in the HVM guest via the "xen_nopv" kernel boot parameter.
>>>>
>>>> I have verified that PV spinlocks are not to blame (via the
>>>> "xen_nopvspin" kernel boot parameter). Switching the clocksource to TSC
>>>> in the running system doesn't help either.
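(To spell out the knobs mentioned above; the exact boot setup will differ,
so take this only as a sketch. The "fast" case boots the guest kernel with
all PV interfaces disabled, the spinlock test disables only PV spinlocks:

    # kernel command line additions (illustrative)
    ... xen_nopv          # all PVHVM enhancements off
    ... xen_nopvspin      # only PV spinlocks off

The clocksource can be switched at runtime via sysfs:

    cat /sys/devices/system/clocksource/clocksource0/available_clocksource
    echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
)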
>>>
>>> What about xen_hvm_exit_mmap (an optimization for shadow pagetables) and
>>> xen_hvm_smp_init (PV IPI)?
>>
>> xen_hvm_exit_mmap isn't active (the kernel issued a message telling me
>> so).
>>
>>>> Unfortunately the kernel no longer seems to be functional when I try
>>>> to tweak it not to use the PVHVM enhancements.
>>>
>>> I guess you are not talking about regular PV drivers like netfront and
>>> blkfront, right?
>>
>> The plan was to be able to use PV drivers without having to use PV
>> callbacks and PV timers. This isn't possible right now.
> 
> I think the code to handle that scenario was gradually removed over time
> to simplify the code base.

Hmm, too bad.

>>>> I'm wondering now whether there have ever been any benchmarks to prove
>>>> that PVHVM really is faster than non-PVHVM? My findings seem to suggest
>>>> PVHVM might actually come with a huge performance penalty. OTOH this
>>>> might depend on the hardware and other factors.
>>>>
>>>> Stefano, didn't you do the PVHVM stuff back in 2010? Do you have any
>>>> data from then regarding performance figures?
>>>
>>> Yes, I still have these slides:
>>>
>>> https://www.slideshare.net/xen_com_mgr/linux-pv-on-hvm
>>
>> Thanks. So you measured the overall package, not the individual items
>> like callbacks, timers and the time source? I'm asking because I'm
>> starting to believe some of those are slower than their non-PV variants.
> 
> There isn't much left in terms of individual optimizations: you already
> tried switching the clocksource and removing PV spinlocks, and
> xen_hvm_exit_mmap is not used. Only the following are left (you might
> want to double-check I haven't missed anything):
> 
> 1) PV IPI

It's a 1-vcpu guest.

> 2) PV suspend/resume
> 3) vector callback
> 4) interrupt remapping
> 
> 2) is not on the hot path.
> I did individual measurements of 3) at some point and it was a clear win.

That might depend on the hardware. Could it be that newer processors are
faster here?
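(Just to be sure we are talking about the same thing: the way I'd verify the
vector callback is in use in the guest, assuming a reasonably recent Linux
kernel which logs this at boot, is something like

    dmesg | grep -i "callback vector"
    grep -i xen /proc/interrupts

The exact wording and the interrupt names depend on the kernel version, so
treat this only as a sketch.)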

> Slide 14 shows the individual measurements of 4)

I don't think this is affecting my benchmark. It is just munmap after
all.

> 
> Only 1) is left to check as far as I can tell.

No IPIs should be involved.


Juergen


 

