
Re: [Xen-devel] Ongoing/future speculative mitigation work

  • To: Tamas K Lengyel <tamas.k.lengyel@xxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Thu, 25 Oct 2018 19:39:52 +0100
  • Cc: mpohlack@xxxxxxxxx, Julien Grall <julien.grall@xxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, joao.m.martins@xxxxxxxxxx, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Daniel Kiper <daniel.kiper@xxxxxxxxxx>, Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, aliguori@xxxxxxxxxx, uwed@xxxxxxxxx, Lars Kurth <lars.kurth@xxxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, ross.philipson@xxxxxxxxxx, George Dunlap <george.dunlap@xxxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, Matt Wilson <msw@xxxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, JGross@xxxxxxxx, sergey.dyasli@xxxxxxxxxx, Wei Liu <wei.liu2@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxx>, mdontu <mdontu@xxxxxxxxxxxxxxx>, dwmw@xxxxxxxxxxxx, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Thu, 25 Oct 2018 18:40:11 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 25/10/18 19:35, Tamas K Lengyel wrote:
> On Thu, Oct 25, 2018 at 12:13 PM Andrew Cooper
> <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 25/10/18 18:58, Tamas K Lengyel wrote:
>>> On Thu, Oct 25, 2018 at 11:43 AM Andrew Cooper
>>> <andrew.cooper3@xxxxxxxxxx> wrote:
>>>> On 25/10/18 18:35, Tamas K Lengyel wrote:
>>>>> On Thu, Oct 25, 2018 at 11:02 AM George Dunlap <george.dunlap@xxxxxxxxxx> 
>>>>> wrote:
>>>>>> On 10/25/2018 05:55 PM, Andrew Cooper wrote:
>>>>>>> On 24/10/18 16:24, Tamas K Lengyel wrote:
>>>>>>>>> A solution to this issue was proposed, whereby Xen synchronises
>>>>>>>>> siblings on vmexit/entry, so we are never executing code in two
>>>>>>>>> different privilege levels.  Getting this working would make it safe
>>>>>>>>> to continue using hyperthreading even in the presence of L1TF.
>>>>>>>>> Obviously, it's going to come with a perf hit, but compared to
>>>>>>>>> disabling hyperthreading, all it's got to do is beat a 60% perf hit
>>>>>>>>> to make it the preferable option for making your system L1TF-proof.
>>>>>>>> Could you shed some light on what tests were done where that 60%
>>>>>>>> performance hit was observed? We have performed intensive stress-tests
>>>>>>>> trying to confirm this, but according to our findings, turning off
>>>>>>>> hyper-threading actually improves performance on all machines we have
>>>>>>>> tested thus far.
>>>>>>> Aggregate inter- and intra-host disk and network throughput, which is a
>>>>>>> reasonable approximation of a load of webserver VMs on a single
>>>>>>> physical server.  Small-packet IO was hit worst, as it has a very high
>>>>>>> vcpu context switch rate between dom0 and domU.  Disabling HT means you
>>>>>>> have half the number of logical cores to schedule on, which doubles the
>>>>>>> mean time to next timeslice.
>>>>>>> In principle, for a fully optimised workload, HT gets you ~30% extra due
>>>>>>> to increased utilisation of the pipeline functional units.  Some
>>>>>>> resources are statically partitioned, while some are competitively
>>>>>>> shared, and it's now been well proven that actions on one thread can
>>>>>>> have a large effect on others.
>>>>>>> Two arbitrary vcpus are not an optimised workload.  If the perf
>>>>>>> improvement you get from not competing in the pipeline is greater than
>>>>>>> the perf loss from Xen's reduced capability to schedule, then disabling
>>>>>>> HT would be an improvement.  I can certainly believe that this might be
>>>>>>> the case for Qubes-style workloads, where you are probably not very
>>>>>>> overprovisioned, and you probably don't have long-running IO- and
>>>>>>> CPU-bound tasks in the VMs.
>>>>>> As another data point, I think it was MSCI who said they always disabled
>>>>>> hyperthreading, because they also found that their workloads ran slower
>>>>>> with HT than without.  Presumably they were doing massive number
>>>>>> crunching, such that each thread was waiting on the ALU a significant
>>>>>> portion of the time anyway; at which point the superscalar scheduling
>>>>>> and/or reduction in cache efficiency would have brought performance from
>>>>>> "no benefit" down to "negative benefit".
>>>>> Thanks for the insights. Indeed, we are primarily concerned with
>>>>> performance of Qubes-style workloads which may range from
>>>>> no-oversubscription to heavily oversubscribed. It's not a workload we
>>>>> can predict or optimize before-hand, so we are looking for a default
>>>>> that would be 1) safe and 2) performant in the most general case
>>>>> possible.
>>>> So long as you've got the XSA-273 patches, you should be able to park
>>>> and re-activate hyperthreads using `xen-hptool cpu-{online,offline}
>>>> $CPU`.
>>>> You should be able to effectively change hyperthreading configuration at
>>>> runtime.  It's not quite the same as changing it in the BIOS, but in
>>>> terms of competition for pipeline resources, it should be good enough.
>>> Thanks, indeed that is a handy tool to have. We often can't disable
>>> hyperthreading in the BIOS anyway, because most BIOSes don't allow you
>>> to do that when TXT is used.
>> Hmm - that's an odd restriction.  I don't immediately see why such a
>> restriction would be necessary.
>>> That said, with this tool we still
>>> require some way to determine when to do parking/reactivation of
>>> hyperthreads. We could certainly park hyperthreads when we see the
>>> system is being oversubscribed in terms of the number of active vCPUs,
>>> but for real optimization we would have to understand the workloads
>>> running within the VMs, if I understand correctly?
>> TBH, I'd perhaps start with an admin control which lets them switch
>> between the two modes, and some instructions on how/why they might want
>> to try switching.
>> Trying to second-guess the best HT setting automatically is most likely
>> going to be a lost cause.  It will be system-specific whether the
>> same workload is better with or without HT.
> This may just not be practically possible in the end, as the system
> administrator may have no idea what workload will be running on any
> given system. It may also vary from one user to the next on the
> same system, without the users being allowed to tune such details of
> the system. If we can show that with core-scheduling deployed,
> performance for most workloads improves by x %, it may be a safe
> option. But if every system needs to be tuned and evaluated in terms
> of its eventual workload, that task becomes problematic. I appreciate
> the insights though!

To a first approximation, a superuser knob to "switch between single- and
dual-threaded mode" can be used by people to experiment as to which is
faster overall.

If it really is the case that disabling HT makes things faster, then
you've suddenly gained (almost-)core scheduling "for free" alongside
that perf improvement.
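For what it's worth, the `xen-hptool cpu-{online,offline}` parking flow
discussed above can be sketched as a small dry-run script. Only the
`xen-hptool` subcommands themselves come from this thread; the
`xenpm get-cpu-topology` column layout is stubbed out below with sample data
and is an assumption, so it can be read (and tested) without a Xen host:

```shell
#!/bin/sh
# Sketch (not validated on a real host): emit one xen-hptool command per
# sibling thread, parking ("offline") or re-activating ("online") the
# second logical CPU of every core.  The "CPU<n> core socket node" layout
# assumed here mimics `xenpm get-cpu-topology` output.

topology() {
    # On a real dom0, replace this stub with:  xenpm get-cpu-topology
    printf '%s\n' \
        'CPU0 0 0 0' \
        'CPU1 0 0 0' \
        'CPU2 1 0 0' \
        'CPU3 1 0 0'
}

# park_plan offline|online -> prints the xen-hptool commands, one per line
park_plan() {
    topology | awk -v mode="$1" '
        /^CPU/ {
            cpu = substr($1, 4)     # numeric CPU id after the "CPU" prefix
            key = $3 ":" $2         # socket:core pair
            if (seen[key]++)        # not the first thread seen on this core
                printf "xen-hptool cpu-%s %s\n", mode, cpu
        }'
}

park_plan offline    # pipe through `sh` to actually apply the plan
```

Since it only prints the commands, an admin can inspect the plan before
piping it to a shell, and run `park_plan online` later to undo it.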


Xen-devel mailing list