Re: RMRRs and Phantom Functions


  • To: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 27 Apr 2022 12:18:35 +0200
  • Cc: Jan Beulich <jbeulich@xxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Edwin Torok <edvin.torok@xxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 27 Apr 2022 10:18:48 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Apr 27, 2022 at 10:05:54AM +0000, Andrew Cooper wrote:
> On 27/04/2022 07:59, Jan Beulich wrote:
> > On 26.04.2022 19:51, Andrew Cooper wrote:
> >> Hello,
> >>
> >> Edvin has found a machine with some very weird properties.  It is an HP
> >> ProLiant BL460c Gen8 with:
> >>
> >>  \-[0000:00]-+-00.0  Intel Corporation Xeon E5/Core i7 DMI2
> >>              +-01.0-[11]--
> >>              +-01.1-[02]--
> >>              +-02.0-[04]--+-00.0  Emulex Corporation OneConnect 10Gb NIC (be3)
> >>              |            +-00.1  Emulex Corporation OneConnect 10Gb NIC (be3)
> >>              |            +-00.2  Emulex Corporation OneConnect 10Gb iSCSI Initiator (be3)
> >>              |            \-00.3  Emulex Corporation OneConnect 10Gb iSCSI Initiator (be3)
> >>
> >> yet all 4 other functions on the device periodically hit IOMMU faults
> >> (~once every 5 mins, so definitely stats).
> >>
> >> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.4] fault addr bdf80000
> >> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.5] fault addr bdf80000
> >> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.6] fault addr bdf80000
> >> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.7] fault addr bdf80000
> >>
> >> There are several RMRRs covering these devices, with:
> >>
> >> (XEN) [VT-D]found ACPI_DMAR_RMRR:
> >> (XEN) [VT-D] endpoint: 0000:03:00.0
> >> (XEN) [VT-D] endpoint: 0000:01:00.0
> >> (XEN) [VT-D] endpoint: 0000:01:00.2
> >> (XEN) [VT-D] endpoint: 0000:04:00.0
> >> (XEN) [VT-D] endpoint: 0000:04:00.1
> >> (XEN) [VT-D] endpoint: 0000:04:00.2
> >> (XEN) [VT-D] endpoint: 0000:04:00.3
> >> (XEN) [VT-D]dmar.c:608:   RMRR region: base_addr bdf8f000 end_addr bdf92fff
> >>
> >> being the one relevant to these faults.  I've not manually decoded the
> >> DMAR table because device paths are horrible to follow but there are at
> >> least the correct number of endpoints.  The functions all have SR-IOV
> >> (disabled) and ARI (enabled).  None have any Phantom functions described.
> >>
> >> Specifying pci-phantom=04:00,1 does appear to work around the faults,
> >> but it's not right, because functions 1 thru 3 aren't actually phantom.
> > Indeed, and I think you really mean "pci-phantom=04:00,4".
> 
> As a quick tangent, the cmdline docs for pci-phantom= are in desperate
> need of an example and a description of how stride works.  I've got some
> ideas and notes jotted down.
> 
> Do we really mean ,4 here?  What happens for function 1?
> 
> > I guess we
> > should actually refuse "pci-phantom=04:00,1" in a case like this one.
> > The problem is that at the point we set pdev->phantom_stride we may
> > not know of the other devices, yet. But I guess we could attempt a
> > config space read of the supposed phantom function's device/vendor
> > and do <whatever> if these aren't both 0xffff.
> 
> At a minimum, we ought to warn when it looks like something is wonky,
> but I wouldn't go as far as rejecting.
> 
> All of these options to work around firmware/system screwups are applied
> to an already-non-working system, and there is absolutely no guarantee
> that necessary fixes make any kind of logical sense.

AFAICT with stride = 1 Xen will treat functions 1-7 as phantom
functions of function 0, which means the pdev struct won't
get updated when those phantom functions are assigned to a domain as
part of assigning function 0.  That would imply that functions 1 to 3
will be considered phantom but would also have a matching pdev that
allows them to be independently assigned to a domain.  Nothing good
will come of that.
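
To make the stride arithmetic concrete, here is a minimal standalone
sketch of which function numbers a given stride reaches from a base
function.  It is a simplified model of my reading, not the actual Xen
code: phantom_funcs() is a hypothetical helper, and the assumption is
that phantom functions are found by repeatedly adding the stride to
devfn until the result leaves the slot.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard devfn encoding: 5 bits of slot, 3 bits of function. */
#define PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)       ((devfn) & 0x07)
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/*
 * Hypothetical helper (not a Xen function): collect the function
 * numbers reached by stepping 'stride' from 'devfn' while staying in
 * the same slot.  Returns the number of entries written to out[].
 */
static size_t phantom_funcs(uint8_t devfn, uint8_t stride, uint8_t out[8])
{
    size_t n = 0;
    unsigned int next = devfn;

    if ( stride == 0 )
        return 0;

    for ( ; ; )
    {
        next += stride;
        /* Stop once the step leaves the slot (or overflows devfn). */
        if ( next > 0xff || PCI_SLOT(next) != PCI_SLOT(devfn) )
            break;
        out[n++] = PCI_FUNC(next);
    }

    return n;
}
```

Under that model, starting from function 0, stride 1 reaches functions
1 through 7 and stride 4 reaches only function 4.  If the stride were
applied to each of functions 0-3 in turn, stride 4 would map them to
functions 4-7 respectively, which matches the faulting requester IDs.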

I agree with Jan that we need to explicitly reject strides that cover
functions that would otherwise be considered devices (i.e. have valid
config space entries).  Alternatively, we need to remove the pdevs
for those functions now considered phantom.
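
A rough sketch of the kind of check I have in mind is below.  It is
standalone illustration code, not a patch: stride_covers_real_function()
and the read_id_fn callback are hypothetical names, the callback stands
in for a real config space read of the vendor/device dword, and
bl460c_id() just models the machine in this thread (functions 0-3
present, 4-7 absent) with a placeholder ID value.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a config space read of the combined
 * vendor/device ID of a function in the slot.  0xffffffff (both
 * halves 0xffff) means nothing responds at that function.
 */
typedef uint32_t (*read_id_fn)(unsigned int func);

/*
 * Return true if stepping 'stride' from 'func' would land on a
 * function that answers config space reads, i.e. a real device
 * rather than a possible phantom function.
 */
static bool stride_covers_real_function(unsigned int func,
                                        unsigned int stride,
                                        read_id_fn read_id)
{
    if ( stride == 0 )
        return false;

    for ( func += stride; func < 8; func += stride )
        if ( read_id(func) != 0xffffffffu )
            return true;

    return false;
}

/* Model of the box in this thread: functions 0-3 exist, 4-7 do not. */
static uint32_t bl460c_id(unsigned int func)
{
    /* 0x12345678 is a placeholder, not the real Emulex IDs. */
    return func < 4 ? 0x12345678u : 0xffffffffu;
}
```

With this model, stride 1 from function 0 is flagged because it lands
on functions 1-3, while stride 4 from any of functions 0-3 is not.
Whether a flagged stride gets rejected or only warned about is a
separate policy decision.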

Thanks, Roger.



 

