
Re: [Xen-devel] Ongoing/future speculative mitigation work

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
From: Tamas K Lengyel <tamas.k.lengyel@xxxxxxxxx>
Date: Thu, 25 Oct 2018 12:35:51 -0600
Cc: mpohlack@xxxxxxxxx, Julien Grall <julien.grall@xxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, joao.m.martins@xxxxxxxxxx, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Daniel Kiper <daniel.kiper@xxxxxxxxxx>, Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, aliguori@xxxxxxxxxx, uwed@xxxxxxxxx, Lars Kurth <lars.kurth@xxxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, ross.philipson@xxxxxxxxxx, George Dunlap <george.dunlap@xxxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, Matt Wilson <msw@xxxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, JGross@xxxxxxxx, sergey.dyasli@xxxxxxxxxx, Wei Liu <wei.liu2@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxx>, mdontu <mdontu@xxxxxxxxxxxxxxx>, dwmw@xxxxxxxxxxxx, Roger Pau Monné <roger.pau@xxxxxxxxxx>
Delivery-date: Thu, 25 Oct 2018 18:36:35 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Oct 25, 2018 at 12:13 PM Andrew Cooper
<andrew.cooper3@xxxxxxxxxx> wrote:
>
> On 25/10/18 18:58, Tamas K Lengyel wrote:
> > On Thu, Oct 25, 2018 at 11:43 AM Andrew Cooper
> > <andrew.cooper3@xxxxxxxxxx> wrote:
> >> On 25/10/18 18:35, Tamas K Lengyel wrote:
> >>> On Thu, Oct 25, 2018 at 11:02 AM George Dunlap <george.dunlap@xxxxxxxxxx> wrote:
> >>>> On 10/25/2018 05:55 PM, Andrew Cooper wrote:
> >>>>> On 24/10/18 16:24, Tamas K Lengyel wrote:
> >>>>>>> A solution to this issue was proposed, whereby Xen synchronises
> >>>>>>> siblings on vmexit/entry, so we are never executing code in two
> >>>>>>> different privilege levels.  Getting this working would make it safe
> >>>>>>> to continue using hyperthreading even in the presence of L1TF.
> >>>>>>> Obviously, it's going to come with a perf hit, but compared to
> >>>>>>> disabling hyperthreading, all it's got to do is beat a 60% perf hit
> >>>>>>> to make it the preferable option for making your system L1TF-proof.
> >>>>>> Could you shed some light on what tests were done where that 60%
> >>>>>> performance hit was observed? We have performed intensive stress-tests
> >>>>>> to confirm this but according to our findings turning off
> >>>>>> hyper-threading actually improves performance on all machines we
> >>>>>> tested thus far.
> >>>>> Aggregate inter and intra host disk and network throughput, which is a
> >>>>> reasonable approximation of a load of webserver VMs on a single
> >>>>> physical server.  Small packet IO was hit worst, as it has a very high
> >>>>> vcpu context switch rate between dom0 and domU.  Disabling HT means you
> >>>>> have half the number of logical cores to schedule on, which doubles the
> >>>>> mean time to next timeslice.
> >>>>>
> >>>>> In principle, for a fully optimised workload, HT gets you ~30% extra due
> >>>>> to increased utilisation of the pipeline functional units.  Some
> >>>>> resources are statically partitioned, while some are competitively
> >>>>> shared, and it's now been well proven that actions on one thread can have
> >>>>> a large effect on others.
> >>>>>
> >>>>> Two arbitrary vcpus are not an optimised workload.  If the perf
> >>>>> improvement you get from not competing in the pipeline is greater than
> >>>>> the perf loss from Xen's reduced capability to schedule, then disabling
> >>>>> HT would be an improvement.  I can certainly believe that this might be
> >>>>> the case for Qubes style workloads where you are probably not very
> >>>>> overprovisioned, and you probably don't have long running IO and CPU
> >>>>> bound tasks in the VMs.
> >>>> As another data point, I think it was MSCI who said they always disabled
> >>>> hyperthreading, because they also found that their workloads ran slower
> >>>> with HT than without.  Presumably they were doing massive number
> >>>> crunching, such that each thread was waiting on the ALU a significant
> >>>> portion of the time anyway; at which point the superscalar scheduling
> >>>> and/or reduction in cache efficiency would have brought performance from
> >>>> "no benefit" down to "negative benefit".
> >>>>
> >>> Thanks for the insights. Indeed, we are primarily concerned with
> >>> performance of Qubes-style workloads which may range from
> >>> no-oversubscription to heavily oversubscribed. It's not a workload we
> >>> can predict or optimize before-hand, so we are looking for a default
> >>> that would be 1) safe and 2) performant in the most general case
> >>> possible.
> >> So long as you've got the XSA-273 patches, you should be able to park
> >> and reactivate hyperthreads using
> >> `xen-hptool cpu-{online,offline} $CPU`.
> >>
> >> You should be able to effectively change hyperthreading configuration at
> >> runtime.  It's not quite the same as changing it in the BIOS, but in
> >> terms of competition for pipeline resources, it should be good enough.
> >>
> > Thanks, indeed that is a handy tool to have. We often can't disable
> > hyperthreading in the BIOS anyway because most BIOSes don't allow you
> > to do that when TXT is used.
>
> Hmm - that's an odd restriction.  I don't immediately see why such a
> restriction would be necessary.
>
> > That said, with this tool we still
> > require some way to determine when to do parking/reactivation of
> > hyperthreads. We could certainly park hyperthreads when we see the
> > system is being oversubscribed in terms of number of vCPUs being
> > active, but for real optimization we would have to understand the
> > workloads running within the VMs if I understand correctly?
>
> TBH, I'd perhaps start with an admin control which lets them switch
> between the two modes, and some instructions on how/why they might want
> to try switching.
>
> Trying to second-guess the best HT setting automatically is most likely
> going to be a lost cause.  It will be system specific as to whether the
> same workload is better with or without HT.

In the end this may simply not be practical, as the system
administrator may have no idea what workload will be running on any
given system. The workload may also vary from one user to the next on
the same system, without the users being allowed to tune such details
of the system. If we can show that deploying core-scheduling improves
performance by x % for most workloads, it may be a safe option. But if
every system needs to be tuned and evaluated against its eventual
workload, that task becomes problematic. I appreciate the insights
though!
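
For reference, the admin control suggested above could be sketched
roughly as follows. This is only an illustration, not something tested
on a real host: it assumes the caller already has the CPU topology as
(cpu, core, socket) tuples (a hypothetical input format; how it is
gathered is left open), and the function name plan_ht_commands is made
up for this example. It only builds the `xen-hptool cpu-online` /
`xen-hptool cpu-offline` command lines mentioned earlier in the thread;
it does not run them.

```python
from collections import defaultdict


def plan_ht_commands(topology, enable_ht):
    """Build xen-hptool command lines to park or reactivate sibling threads.

    topology: iterable of (cpu_id, core_id, socket_id) tuples. This input
    format is an assumption made for the sketch.
    enable_ht: True to bring sibling threads online, False to park them.
    """
    # Group logical CPUs by the physical core they belong to.
    threads_per_core = defaultdict(list)
    for cpu, core, socket in topology:
        threads_per_core[(socket, core)].append(cpu)

    action = "cpu-online" if enable_ht else "cpu-offline"
    commands = []
    for key in sorted(threads_per_core):
        # The lowest-numbered thread of each core is never touched, so
        # every core keeps one thread to schedule on.
        for cpu in sorted(threads_per_core[key])[1:]:
            commands.append(["xen-hptool", action, str(cpu)])
    return commands


if __name__ == "__main__":
    # Hypothetical 2-core, 4-thread machine.
    example = [(0, 0, 0), (1, 0, 0), (2, 1, 0), (3, 1, 0)]
    for cmd in plan_ht_commands(example, enable_ht=False):
        print(" ".join(cmd))
```

Keeping the planning step separate from execution lets the admin review
the list before anything is parked. Whether to switch at all would
still be the admin's call, as discussed above.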

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
