
Re: [PATCH v3 1/2] IOMMU/x86: disallow device assignment to PoD guests


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Thu, 3 Feb 2022 10:04:07 +0100
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Tamas K Lengyel <tamas@xxxxxxxxxxxxx>, Petre Pircalabu <ppircalabu@xxxxxxxxxxxxxxx>, Alexandru Isaila <aisaila@xxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 03 Feb 2022 09:04:39 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Feb 03, 2022 at 09:31:03AM +0100, Jan Beulich wrote:
> On 02.02.2022 17:13, Roger Pau Monné wrote:
> > On Tue, Jan 04, 2022 at 10:41:32AM +0100, Jan Beulich wrote:
> >> @@ -359,7 +360,10 @@ p2m_pod_set_mem_target(struct domain *d,
> >>  
> >>      ASSERT( pod_target >= p2m->pod.count );
> >>  
> >> -    ret = p2m_pod_set_cache_target(p2m, pod_target, 1/*preemptible*/);
> >> +    if ( has_arch_pdevs(d) || cache_flush_permitted(d) )
> > 
> > Is it possible to have cache flush allowed without any PCI device
> > assigned? AFAICT the iomem/ioport_caps would only get setup when there
> > are device passed through?
> 
> One can assign MMIO or ports to a guest the raw way. That's not
> secure, but functionally explicitly permitted.
> 
> > TBH I would be fine if we just say that PoD cannot be used in
> > conjunction with an IOMMU, and just check for is_iommu_enabled(d) here.
> > 
> > I understand it's technically possible for PoD to be used together
> > with a domain that will later get a device passed through once PoD is
> > no longer in use, but I doubt there's much value in supporting that
> > use case, and I fear we might be introducing corner cases that could
> > create issues in the future. Overall I think it would be safer to just
> > disable PoD in conjunction with an IOMMU.
> 
> I consider it wrong to put in place such a restriction, but I could
> perhaps accept you and Andrew thinking this way if this was the only
> aspect playing into here. However, this would then want an equivalent
> tools side check, and while hunting down where to make the change as
> done here, I wasn't able to figure out where that alternative
> adjustment would need doing. Hence I would possibly(!) buy into this
> only if someone else took care of doing so properly in the tool stack
> (including the emission of a sensible error message).

What about the (completely untested) chunk below:

diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
index d7a40d7550..e585ef4c5c 100644
--- a/tools/libs/light/libxl_create.c
+++ b/tools/libs/light/libxl_create.c
@@ -1160,17 +1160,16 @@ int libxl__domain_config_setdefault(libxl__gc *gc,
     pod_enabled = (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV) &&
         (d_config->b_info.target_memkb < d_config->b_info.max_memkb);
 
-    /* We cannot have PoD and PCI device assignment at the same time
+    /* We cannot have PoD and an active IOMMU at the same time
      * for HVM guest. It was reported that IOMMU cannot work with PoD
      * enabled because it needs to populated entire page table for
-     * guest. To stay on the safe side, we disable PCI device
-     * assignment when PoD is enabled.
+     * guest.
      */
     if (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV &&
-        d_config->num_pcidevs && pod_enabled) {
+        d_config->c_info.passthrough != LIBXL_PASSTHROUGH_DISABLED &&
+        pod_enabled) {
         ret = ERROR_INVAL;
-        LOGD(ERROR, domid,
-             "PCI device assignment for HVM guest failed due to PoD enabled");
+        LOGD(ERROR, domid, "IOMMU cannot be enabled together with PoD");
         goto error_out;
     }
 


> Finally this still leaves out the "raw MMIO / ports" case mentioned
> above.

But the raw MMIO 'mode' doesn't care much about PoD, because if
there's no PCI device assigned there's no IOMMU setup, and thus such
raw MMIO regions (could?) belong to a device that's not constrained by
the guest p2m anyway?

> >> --- a/xen/common/vm_event.c
> >> +++ b/xen/common/vm_event.c
> >> @@ -639,7 +639,7 @@ int vm_event_domctl(struct domain *d, st
> >>  
> >>              rc = -EXDEV;
> >>              /* Disallow paging in a PoD guest */
> >> -            if ( p2m_pod_entry_count(p2m_get_hostp2m(d)) )
> >> +            if ( p2m_pod_active(d) )
> > 
> > Isn't it fine to just check for entry_count like you suggest in the
> > change to libxl?
> 
> I didn't think it would be, but I'm not entirely sure: If paging was
> enabled before a guest actually starts, it wouldn't have any entries
> but still be a PoD guest if it has a non-empty cache. The VM event
> folks may be able to clarify this either way. But ...
> 
> > This is what p2m_pod_entry_count actually does (rather than entry_count | 
> > count).
> 
> ... you really mean "did" here, as I'm removing p2m_pod_entry_count()
> in this patch. Of course locking could be added to it instead (or
> p2m_pod_get_mem_target() be used in its place), but I'd prefer if we
> could go with just the check which precisely matches what the comment
> says (IOW otherwise I'd need to additionally know what exactly the
> comment is to say).

Could you briefly mention this in the commit message? Ie: VM event
code is also adjusted to make sure PoD is not in use and cannot be
used during the guest lifetime?

Thanks, Roger.



 

