
Re: [PATCH v3 1/2] IOMMU/x86: disallow device assignment to PoD guests


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Thu, 3 Feb 2022 10:52:34 +0100
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Tamas K Lengyel <tamas@xxxxxxxxxxxxx>, Petre Pircalabu <ppircalabu@xxxxxxxxxxxxxxx>, Alexandru Isaila <aisaila@xxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 03 Feb 2022 09:52:52 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Feb 03, 2022 at 10:21:54AM +0100, Jan Beulich wrote:
> On 03.02.2022 10:04, Roger Pau Monné wrote:
> > On Thu, Feb 03, 2022 at 09:31:03AM +0100, Jan Beulich wrote:
> >> On 02.02.2022 17:13, Roger Pau Monné wrote:
> >>> On Tue, Jan 04, 2022 at 10:41:32AM +0100, Jan Beulich wrote:
> >>>> @@ -359,7 +360,10 @@ p2m_pod_set_mem_target(struct domain *d,
> >>>>  
> >>>>      ASSERT( pod_target >= p2m->pod.count );
> >>>>  
> >>>> -    ret = p2m_pod_set_cache_target(p2m, pod_target, 1/*preemptible*/);
> >>>> +    if ( has_arch_pdevs(d) || cache_flush_permitted(d) )
> >>>
> >>> Is it possible to have cache flush allowed without any PCI device
> >>> assigned? AFAICT the iomem/ioport_caps would only get setup when there
> >>> are device passed through?
> >>
> >> One can assign MMIO or ports to a guest the raw way. That's not
> >> secure, but functionally explicitly permitted.
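
(As an aside for the archives: such raw assignment is what the iomem= /
ioports= xl.cfg options provide. A hypothetical fragment, with made-up
addresses, would look like:

```
# Grant the guest 0x10 pages of raw MMIO starting at machine frame
# 0xf0000, plus direct access to the COM2 I/O port range.  Addresses
# here are illustrative only.
iomem   = [ "f0000,10" ]
ioports = [ "2f8-2ff" ]
```

with the usual caveat that this is explicitly not security-supported.)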
> >>
> >>> TBH I would be fine if we just say that PoD cannot be used in
> >>> conjunction with an IOMMU, and just check for is_iommu_enable(d) here.
> >>>
> >>> I understand it's technically possible for PoD to be used together
> >>> with a domain that will later get a device passed through once PoD is
> >>> no longer in use, but I doubt there's much value in supporting that
> >>> use case, and I fear we might be introducing corner cases that could
> >>> create issues in the future. Overall I think it would be safer to just
> >>> disable PoD in conjunction with an IOMMU.
> >>
> >> I consider it wrong to put in place such a restriction, but I could
> >> perhaps accept you and Andrew thinking this way if this was the only
> >> aspect playing into here. However, this would then want an equivalent
> >> tools side check, and while hunting down where to make the change as
> >> done here, I wasn't able to figure out where that alternative
> >> adjustment would need doing. Hence I would possibly(!) buy into this
> >> only if someone else took care of doing so properly in the tool stack
> >> (including the emission of a sensible error message).
> > 
> > What about the (completely untested) chunk below:
> > 
> > diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
> > index d7a40d7550..e585ef4c5c 100644
> > --- a/tools/libs/light/libxl_create.c
> > +++ b/tools/libs/light/libxl_create.c
> > @@ -1160,17 +1160,16 @@ int libxl__domain_config_setdefault(libxl__gc *gc,
> >      pod_enabled = (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV) &&
> >          (d_config->b_info.target_memkb < d_config->b_info.max_memkb);
> >  
> > -    /* We cannot have PoD and PCI device assignment at the same time
> > +    /* We cannot have PoD and an active IOMMU at the same time
> >       * for HVM guest. It was reported that IOMMU cannot work with PoD
> >       * enabled because it needs to populated entire page table for
> > -     * guest. To stay on the safe side, we disable PCI device
> > -     * assignment when PoD is enabled.
> > +     * guest.
> >       */
> >      if (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV &&
> > -        d_config->num_pcidevs && pod_enabled) {
> > +        d_config->c_info.passthrough != LIBXL_PASSTHROUGH_DISABLED &&
> > +        pod_enabled) {
> >          ret = ERROR_INVAL;
> > -        LOGD(ERROR, domid,
> > -             "PCI device assignment for HVM guest failed due to PoD enabled");
> > +        LOGD(ERROR, domid, "IOMMU cannot be enabled together with PoD");
> >          goto error_out;
> >      }
> 
> Perhaps. Seeing this I actually recall coming across this check during
> my investigation. Not changing it along the lines of what you do was
> then really more because of me not being convinced of the extra
> restriction; I clearly misremembered when writing the earlier reply.
> If we were to do what you suggest, I'd like to ask that the comment be
> changed differently, though: "We cannot ..." then isn't really true
> anymore. We choose not to permit this mode; "cannot" only applies to
> actual device assignment (and of course only as long as there aren't
> restartable IOMMU faults).

I'm fine with an adjusted wording here. This was mostly a placement
suggestion; I didn't give much thought to the error message.

> >> Finally this still leaves out the "raw MMIO / ports" case mentioned
> >> above.
> > 
> > But the raw MMIO 'mode' doesn't care much about PoD, because if
> > there's no PCI device assigned there's no IOMMU setup, and thus such
> > raw MMIO regions (could?) belong to a device that's not constrained by
> > the guest p2m anyway?
> 
> Hmm, yes, true.
> 
> >>>> --- a/xen/common/vm_event.c
> >>>> +++ b/xen/common/vm_event.c
> >>>> @@ -639,7 +639,7 @@ int vm_event_domctl(struct domain *d, st
> >>>>  
> >>>>              rc = -EXDEV;
> >>>>              /* Disallow paging in a PoD guest */
> >>>> -            if ( p2m_pod_entry_count(p2m_get_hostp2m(d)) )
> >>>> +            if ( p2m_pod_active(d) )
> >>>
> >>> Isn't it fine to just check for entry_count like you suggest in the
> >>> change to libxl?
> >>
> >> I didn't think it would be, but I'm not entirely sure: If paging was
> >> enabled before a guest actually starts, it wouldn't have any entries
> >> but still be a PoD guest if it has a non-empty cache. The VM event
> >> folks may be able to clarify this either way. But ...
> >>
> >>> This is what p2m_pod_entry_count actually does (rather than entry_count | count).
> >>
> >> ... you really mean "did" here, as I'm removing p2m_pod_entry_count()
> >> in this patch. Of course locking could be added to it instead (or
> >> p2m_pod_get_mem_target() be used in its place), but I'd prefer if we
> >> could go with just the check which precisely matches what the comment
> >> says (IOW otherwise I'd need to additionally know what exactly the
> >> comment is to say).
> > 
> > Could you briefly mention this in the commit message? Ie: VM event
> > code is also adjusted to make sure PoD is not in use and cannot be
> > used during the guest lifetime?
> 
> I've added
> 
> "Nor was the use of that function in line with the immediately preceding
>  comment: A PoD guest isn't just one with a non-zero entry count, but
>  also one with a non-empty cache (e.g. prior to actually launching the
>  guest)."
> 
> to the already existing paragraph about the removal of that function.
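
For the archives, the semantic difference being described can be modelled
roughly like this (hypothetical, simplified field and function names, not
the real struct p2m_domain or its locking):

```c
#include <stdbool.h>

/* Illustrative stand-in for the PoD bookkeeping kept in the p2m. */
struct pod_counts {
    long entry_count;  /* outstanding PoD entries in the p2m */
    long count;        /* pages held in the PoD cache */
};

/* Old-style check: the guest only counts as a PoD guest once it has
 * outstanding PoD entries. */
static bool pod_has_entries(const struct pod_counts *p)
{
    return p->entry_count != 0;
}

/* New-style check: additionally true for a guest whose PoD cache is
 * populated but which has no entries yet, e.g. prior to launch. */
static bool pod_active(const struct pod_counts *p)
{
    return p->entry_count != 0 || p->count != 0;
}
```

A pre-launch guest with a filled cache but no entries is exactly the case
where the two predicates disagree.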

Thanks, Roger.
