[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] x86/AMD: correct certain Fam17 checks


  • To: Jan Beulich <JBeulich@xxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Fri, 22 Mar 2019 19:56:23 +0000
  • Autocrypt: addr=andrew.cooper3@xxxxxxxxxx; prefer-encrypt=mutual; keydata= mQINBFLhNn8BEADVhE+Hb8i0GV6mihnnr/uiQQdPF8kUoFzCOPXkf7jQ5sLYeJa0cQi6Penp VtiFYznTairnVsN5J+ujSTIb+OlMSJUWV4opS7WVNnxHbFTPYZVQ3erv7NKc2iVizCRZ2Kxn srM1oPXWRic8BIAdYOKOloF2300SL/bIpeD+x7h3w9B/qez7nOin5NzkxgFoaUeIal12pXSR Q354FKFoy6Vh96gc4VRqte3jw8mPuJQpfws+Pb+swvSf/i1q1+1I4jsRQQh2m6OTADHIqg2E ofTYAEh7R5HfPx0EXoEDMdRjOeKn8+vvkAwhviWXTHlG3R1QkbE5M/oywnZ83udJmi+lxjJ5 YhQ5IzomvJ16H0Bq+TLyVLO/VRksp1VR9HxCzItLNCS8PdpYYz5TC204ViycobYU65WMpzWe LFAGn8jSS25XIpqv0Y9k87dLbctKKA14Ifw2kq5OIVu2FuX+3i446JOa2vpCI9GcjCzi3oHV e00bzYiHMIl0FICrNJU0Kjho8pdo0m2uxkn6SYEpogAy9pnatUlO+erL4LqFUO7GXSdBRbw5 gNt25XTLdSFuZtMxkY3tq8MFss5QnjhehCVPEpE6y9ZjI4XB8ad1G4oBHVGK5LMsvg22PfMJ ISWFSHoF/B5+lHkCKWkFxZ0gZn33ju5n6/FOdEx4B8cMJt+cWwARAQABtClBbmRyZXcgQ29v cGVyIDxhbmRyZXcuY29vcGVyM0BjaXRyaXguY29tPokCOgQTAQgAJAIbAwULCQgHAwUVCgkI CwUWAgMBAAIeAQIXgAUCWKD95wIZAQAKCRBlw/kGpdefoHbdD/9AIoR3k6fKl+RFiFpyAhvO 59ttDFI7nIAnlYngev2XUR3acFElJATHSDO0ju+hqWqAb8kVijXLops0gOfqt3VPZq9cuHlh IMDquatGLzAadfFx2eQYIYT+FYuMoPZy/aTUazmJIDVxP7L383grjIkn+7tAv+qeDfE+txL4 SAm1UHNvmdfgL2/lcmL3xRh7sub3nJilM93RWX1Pe5LBSDXO45uzCGEdst6uSlzYR/MEr+5Z JQQ32JV64zwvf/aKaagSQSQMYNX9JFgfZ3TKWC1KJQbX5ssoX/5hNLqxMcZV3TN7kU8I3kjK mPec9+1nECOjjJSO/h4P0sBZyIUGfguwzhEeGf4sMCuSEM4xjCnwiBwftR17sr0spYcOpqET ZGcAmyYcNjy6CYadNCnfR40vhhWuCfNCBzWnUW0lFoo12wb0YnzoOLjvfD6OL3JjIUJNOmJy RCsJ5IA/Iz33RhSVRmROu+TztwuThClw63g7+hoyewv7BemKyuU6FTVhjjW+XUWmS/FzknSi dAG+insr0746cTPpSkGl3KAXeWDGJzve7/SBBfyznWCMGaf8E2P1oOdIZRxHgWj0zNr1+ooF /PzgLPiCI4OMUttTlEKChgbUTQ+5o0P080JojqfXwbPAyumbaYcQNiH1/xYbJdOFSiBv9rpt TQTBLzDKXok86LkCDQRS4TZ/ARAAkgqudHsp+hd82UVkvgnlqZjzz2vyrYfz7bkPtXaGb9H4 Rfo7mQsEQavEBdWWjbga6eMnDqtu+FC+qeTGYebToxEyp2lKDSoAsvt8w82tIlP/EbmRbDVn 7bhjBlfRcFjVYw8uVDPptT0TV47vpoCVkTwcyb6OltJrvg/QzV9f07DJswuda1JH3/qvYu0p vjPnYvCq4NsqY2XSdAJ02HrdYPFtNyPEntu1n1KK+gJrstjtw7KsZ4ygXYrsm/oCBiVW/OgU g/XIlGErkrxe4vQvJyVwg6YH653YTX5hLLUEL1NS4TCo47RP+wi6y+TnuAL36UtK/uFyEuPy wwrDVcC4cIFhYSfsO0BumEI65yu7a8aHbGfq2lW251UcoU48Z27ZUUZd2Dr6O/n8poQHbaTd 6bJJSjzGGHZVbRP9UQ3lkmkmc0+XCHmj5WhwNNYjgbbmML7y0fsJT5RgvefAIFfHBg7fTY/i kBEimoUsTEQz+N4hbKwo1hULfVxDJStE4sbPhjbsPCrlXf6W9CxSyQ0qmZ2bXsLQYRj2xqd1 bpA+1o1j2N4/au1R/uSiUFjewJdT/LX1EklKDcQwpk06Af/N7VZtSfEJeRV04unbsKVXWZAk uAJyDDKN99ziC0Wz5kcPyVD1HNf8bgaqGDzrv3TfYjwqayRFcMf7xJaL9xXedMcAEQEAAYkC HwQYAQgACQUCUuE2fwIbDAAKCRBlw/kGpdefoG4XEACD1Qf/er8EA7g23HMxYWd3FXHThrVQ HgiGdk5Yh632vjOm9L4sd/GCEACVQKjsu98e8o3ysitFlznEns5EAAXEbITrgKWXDDUWGYxd pnjj2u+GkVdsOAGk0kxczX6s+VRBhpbBI2PWnOsRJgU2n10PZ3mZD4Xu9kU2IXYmuW+e5KCA vTArRUdCrAtIa1k01sPipPPw6dfxx2e5asy21YOytzxuWFfJTGnVxZZSCyLUO83sh6OZhJkk b9rxL9wPmpN/t2IPaEKoAc0FTQZS36wAMOXkBh24PQ9gaLJvfPKpNzGD8XWR5HHF0NLIJhgg 4ZlEXQ2fVp3XrtocHqhu4UZR4koCijgB8sB7Tb0GCpwK+C4UePdFLfhKyRdSXuvY3AHJd4CP 4JzW0Bzq/WXY3XMOzUTYApGQpnUpdOmuQSfpV9MQO+/jo7r6yPbxT7CwRS5dcQPzUiuHLK9i nvjREdh84qycnx0/6dDroYhp0DFv4udxuAvt1h4wGwTPRQZerSm4xaYegEFusyhbZrI0U9tJ B8WrhBLXDiYlyJT6zOV2yZFuW47VrLsjYnHwn27hmxTC/7tvG3euCklmkn9Sl9IAKFu29RSo d5bD8kMSCYsTqtTfT6W4A3qHGvIDta3ptLYpIAOD2sY3GYq2nf3Bbzx81wZK14JdDDHUX2Rs 6+ahAA==
  • Cc: Wei Liu <wei.liu2@xxxxxxxxxx>, Brian Woods <brian.woods@xxxxxxx>, Pu Wen <puwen@xxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • Delivery-date: Fri, 22 Mar 2019 19:56:35 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: preference=signencrypt

On 19/03/2019 09:16, Jan Beulich wrote:
> Commit 3157bb4e13 ("Add MSR support for various feature AMD processor
> families") converted certain checks for Fam11 to include families all
> the way up to Fam17. The commit having no description, it is hard to
> tell whether this was a mechanical dec->hex conversion mistake, or
> indeed intended. In any event the NB_CFG handling needs to be restricted
> to Fam16 and below: Fam17 doesn't have such an MSR anymore.
>
> A non-MMCFG extended config space access mechanism still appears to
> exist, but code to deal with it will need to be written down the road,
> when it can actually be tested.
>
> Reported-by: Pu Wen <puwen@xxxxxxxx>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>

Having looked through various spec documents, this is a chronic mess.

First, to NB_CTL MSR itself.  It certainly is documented in Fam15, and
is absence in the documentation of Fam17.

In Fam15, it is explicitly documented as an alias of
00:18.3[NB_CFG_LOW/HIGH] which are registers at offset 0x88 and 0x8c in
config space.

Fam17 documents that the extended cfc/cf8 mechanism does still exist,
and the new controls for that found in 00:18.4[CoreMasterAccessCtrl]
with a different bit layout.

Experimentation on a Rome system indicates that NB_CTL is fully
read0/write discard, so this patch is probably an improvement.

Therefore, in principle, Acked-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

However, the actual code touched is completely insane.

HVM guests have it automatically read0/write discard, even for Intel
(citing cross vendor migration).

PV guest handling is complicated.  For CPUs without the MSR, it is read
#GP, write discard.  For CPUs which do have the MSR, it is read0/write
discard for domU or nonpinned dom0, which is 100% of usecases.  The PV
vs HVM differences cause an asymetry for the hardware domain.

A pinned PV dom0 is permitted to change just the IO_ECS bit, eliciting a
warning but no #GP for modifying other bits.  As NB_CTL is a per-node
control (not a per-core control), unless dom0 vcpus == host pcpus, this
creates an asymmetry across the system as to whether IO_ECS is enabled
or not.

The HVM IOREQ path, and PV cfg_ok() path, when seeing an IO_ECS access
from the guest, reads the MSR every time, which is results in behaviour
which doesn't match the settings a guest can see, and comes with a
massive perf hit.  It also means the behaviour of the guest IO depends
on which node it is currently scheduled on.


Moving on to Linux...

Linux tries to enable IO_ECS on all AMD and HYGON CPUs >= Fam10.

One path tries to enable things using PCI config space, and the PCI
device list includes Fam17h and HYGON northbridges, but the code only
operates on the Fam15h information, meaning that it clobbers what
appears to be a reserved register on Fam17/Hygon hardware.

Linux also uses NB_CTL in all circumstances, but given hardware's
read0/write discard behaviour, Linux fails to notice that it doesn't
enable IO_ECS at all.

Nothing ever reads back settings to check whether IO_ECS has been
correctly enabled, which means that it really isn't enabled on
Fam17h/Hygon, and really isn't when running under Xen, but Linux is
under the expectation that it is enabled.

In practice, this means that the use of raw_pci_ext_ops between being
set up in pci_direct_init() and possibly overridden in pci_mmcfg_init().


Back to Xen...

The IO_ECS setting should be chosen once at boot, made consistent across
the entire system, and never touched at runtime.

In all cases for guests, we can offer MMCFG even on a system which
doesn't have IO_ECS, and they will prefer that.  The behaviour of the
extra 4 bits is reserved, so we could have IO_ECS working in practice
with no signal.  However, we could equally drop IO_ECS entirely.  Guests
can't find its setting to begin with, so can't reliably use it, and also
wouldn't in the presence of MMCFG anyway.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.