[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3] IOMMU: make DMA containment of quarantined devices optional



> From: Jan Beulich <jbeulich@xxxxxxxx>
> Sent: Monday, March 9, 2020 7:09 PM
> 
> Containing still in flight DMA was introduced to work around certain
> devices / systems hanging hard upon hitting a "not-present" IOMMU fault.
> Passing through (such) devices (on such systems) is inherently insecure
> (as guests could easily arrange for IOMMU faults of any kind to occur).
> Defaulting to a mode where admins may not even become aware of issues
> with devices can be considered undesirable. Therefore convert this mode
> of operation to an optional one, not one enabled by default.
> 
> This involves resurrecting code commit ea38867831da ("x86 / iommu: set
> up a scratch page in the quarantine domain") did remove, in a slightly
> extended and abstracted fashion. Here, instead of reintroducing a pretty
> pointless use of "goto" in domain_context_unmap(), and instead of making
> the function (at least temporarily) inconsistent, take the opportunity
> and replace the other similarly pointless "goto" as well.
> 
> In order to key the re-instated bypasses off of there (not) being a root
> page table this further requires moving the allocate_domain_resources()
> invocation from reassign_device() to amd_iommu_setup_domain_device()
> (or
> else reassign_device() would allocate a root page table anyway); this is
> benign to the second caller of the latter function.
> 
> Take the opportunity and also limit the control to builds supporting
> PCI.
> 
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> I'm happy to take better suggestions to replace the "full" command line
> option and Kconfig prompt tokens. I don't think though that "fault" and
> "write-fault" are really suitable there.

I think we may just allow both r/w access to scratch page for such bogus
device, which may make 'full' more reasonable since we now fully
contain in-fly DMAs. I'm not sure about the value of keeping write-fault
alone for such devices (just because one observed his specific device only 
has problem with read-fault).

alternatively I also thought about whether whitelisting the problematic 
devices through another option (e.g. nofault=b:d:f) could provide more
value. In concept any IOMMU page table (dom0, dom_io or domU) 
for such bogus device should not include invalid entry, even when 
quarantine is not specified. However I'm not sure whether it's worthy of 
going so far...

> ---
> This patch contextually depends on "[PATCH v2 0/5] IOMMU: restrict
> visibility/scope if certain variables".
> ---
> v3: IOMMU_quarantine_basic -> IOMMU_quarantine_fault,
>     IOMMU_quarantine_full -> IOMMU_quarantine_write_fault. Kconfig
>     option (choice) to select default. Limit to HAS_PCI.
> v2: Don't use true/false. Introduce QUARANTINE_SKIP() (albeit I'm not
>     really convinced this is an improvement). Add comment.
> 
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1238,7 +1238,7 @@ detection of systems known to misbehave
>  > Default: `new` unless directed-EOI is supported
> 
>  ### iommu
> -    = List of [ <bool>, verbose, debug, force, required, quarantine,
> +    = List of [ <bool>, verbose, debug, force, required, quarantine[=full],
>                  sharept, intremap, intpost, crash-disable,
>                  snoop, qinval, igfx, amd-iommu-perdev-intremap,
>                  dom0-{passthrough,strict} ]
> @@ -1276,11 +1276,15 @@ boolean (e.g. `iommu=no`) can override t
>      will prevent Xen from booting if IOMMUs aren't discovered and enabled
>      successfully.
> 
> -*   The `quarantine` boolean can be used to control Xen's behavior when
> -    de-assigning devices from guests.  If enabled (the default), Xen always
> -    quarantines such devices; they must be explicitly assigned back to Dom0
> -    before they can be used there again.  If disabled, Xen will only
> -    quarantine devices the toolstack hass arranged for getting quarantined.
> +*   The `quarantine` option can be used to control Xen's behavior when
> +    de-assigning devices from guests.  If set to true (the default), Xen
> +    always quarantines such devices; they must be explicitly assigned back
> +    to Dom0 before they can be used there again.  If set to "full", still
> +    active DMA will additionally be directed to a "sink" page.  If set to
> +    false, Xen will only quarantine devices the toolstack has arranged for
> +    getting quarantined.
> +
> +    This option is only valid on builds supporting PCI.
> 
>  *   The `sharept` boolean controls whether the IOMMU pagetables are
> shared
>      with the CPU-side HAP pagetables, or allocated separately.  Sharing
> --- a/xen/drivers/passthrough/Kconfig
> +++ b/xen/drivers/passthrough/Kconfig
> @@ -28,3 +28,31 @@ endif
> 
>  config IOMMU_FORCE_PT_SHARE
>       bool
> +
> +choice
> +     prompt "IOMMU device quarantining default behavior"
> +     depends on HAS_PCI
> +     default IOMMU_QUARANTINE_BASIC
> +     ---help---
> +       When a PCI device is assigned to an untrusted domain, it is possible
> +       for that domain to program the device to DMA to an arbitrary
> address.
> +       The IOMMU is used to protect the host from malicious DMA by
> making
> +       sure that the device addresses can only target memory assigned to
> the
> +       guest.  However, when the guest domain is torn down, assigning the
> +       device back to the hardware domain would allow any in-flight DMA
> to
> +       potentially target critical host data.  To avoid this, quarantining
> +       shold be enabled.  Quarantining can be done in two ways: In its
> basic
> +       form, all in-flight DMA will simply be forced to encounter IOMMU
> +       faults.  Since there are systems where doing so can cause host
> +       lockup, an alternative form is available where writes to memory will
> +       be made fault, but reads will be directed to a dummy page.  The
> +       implication here is that such reads will go unnoticed, i.e. an admin
> +       may not become aware of the underlying problem.
> +
> +     config IOMMU_QUARANTINE_NONE
> +             bool "none"
> +     config IOMMU_QUARANTINE_BASIC
> +             bool "basic"
> +     config IOMMU_QUARANTINE_FULL
> +             bool "full"
> +endchoice
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -25,6 +25,9 @@
>  #include "iommu.h"
>  #include "../ats.h"
> 
> +/* dom_io is used as a sentinel for quarantined devices */
> +#define QUARANTINE_SKIP(d) ((d) == dom_io && !dom_iommu(d)-
> >arch.root_table)
> +
>  static bool_t __read_mostly init_done;
> 
>  static const struct iommu_init_ops _iommu_init_ops;
> @@ -82,18 +85,35 @@ int get_dma_requestor_id(uint16_t seg, u
>      return req_id;
>  }
> 
> -static void amd_iommu_setup_domain_device(
> +static int __must_check allocate_domain_resources(struct domain_iommu
> *hd)
> +{
> +    int rc;
> +
> +    spin_lock(&hd->arch.mapping_lock);
> +    rc = amd_iommu_alloc_root(hd);
> +    spin_unlock(&hd->arch.mapping_lock);
> +
> +    return rc;
> +}
> +
> +static int __must_check amd_iommu_setup_domain_device(
>      struct domain *domain, struct amd_iommu *iommu,
>      uint8_t devfn, struct pci_dev *pdev)
>  {
>      struct amd_iommu_dte *table, *dte;
>      unsigned long flags;
> -    int req_id, valid = 1;
> +    int req_id, valid = 1, rc;
>      u8 bus = pdev->bus;
> -    const struct domain_iommu *hd = dom_iommu(domain);
> +    struct domain_iommu *hd = dom_iommu(domain);
> +
> +    if ( QUARANTINE_SKIP(domain) )
> +        return 0;
> +
> +    BUG_ON(!hd->arch.paging_mode || !iommu->dev_table.buffer);
> 
> -    BUG_ON( !hd->arch.root_table || !hd->arch.paging_mode ||
> -            !iommu->dev_table.buffer );
> +    rc = allocate_domain_resources(hd);
> +    if ( rc )
> +        return rc;
> 
>      if ( iommu_hwdom_passthrough && is_hardware_domain(domain) )
>          valid = 0;
> @@ -148,6 +168,8 @@ static void amd_iommu_setup_domain_devic
> 
>          amd_iommu_flush_iotlb(devfn, pdev,
> INV_IOMMU_ALL_PAGES_ADDRESS, 0);
>      }
> +
> +    return 0;
>  }
> 
>  int __init acpi_ivrs_init(void)
> @@ -217,17 +239,6 @@ int amd_iommu_alloc_root(struct domain_i
>      return 0;
>  }
> 
> -static int __must_check allocate_domain_resources(struct domain_iommu
> *hd)
> -{
> -    int rc;
> -
> -    spin_lock(&hd->arch.mapping_lock);
> -    rc = amd_iommu_alloc_root(hd);
> -    spin_unlock(&hd->arch.mapping_lock);
> -
> -    return rc;
> -}
> -
>  int amd_iommu_get_paging_mode(unsigned long entries)
>  {
>      int level = 1;
> @@ -291,6 +302,9 @@ static void amd_iommu_disable_domain_dev
>      int req_id;
>      u8 bus = pdev->bus;
> 
> +    if ( QUARANTINE_SKIP(domain) )
> +        return;
> +
>      BUG_ON ( iommu->dev_table.buffer == NULL );
>      req_id = get_dma_requestor_id(iommu->seg, PCI_BDF2(bus, devfn));
>      table = iommu->dev_table.buffer;
> @@ -337,7 +351,6 @@ static int reassign_device(struct domain
>  {
>      struct amd_iommu *iommu;
>      int bdf, rc;
> -    struct domain_iommu *t = dom_iommu(target);
> 
>      bdf = PCI_BDF2(pdev->bus, pdev->devfn);
>      iommu = find_iommu_for_device(pdev->seg, bdf);
> @@ -358,11 +371,10 @@ static int reassign_device(struct domain
>          pdev->domain = target;
>      }
> 
> -    rc = allocate_domain_resources(t);
> +    rc = amd_iommu_setup_domain_device(target, iommu, devfn, pdev);
>      if ( rc )
>          return rc;
> 
> -    amd_iommu_setup_domain_device(target, iommu, devfn, pdev);
>      AMD_IOMMU_DEBUG("Re-assign %04x:%02x:%02x.%u from dom%d to
> dom%d\n",
>                      pdev->seg, pdev->bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
>                      source->domain_id, target->domain_id);
> @@ -519,8 +531,7 @@ static int amd_iommu_add_device(u8 devfn
>          spin_unlock_irqrestore(&iommu->lock, flags);
>      }
> 
> -    amd_iommu_setup_domain_device(pdev->domain, iommu, devfn, pdev);
> -    return 0;
> +    return amd_iommu_setup_domain_device(pdev->domain, iommu, devfn,
> pdev);
>  }
> 
>  static int amd_iommu_remove_device(u8 devfn, struct pci_dev *pdev)
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -31,9 +31,24 @@ bool_t __initdata iommu_enable = 1;
>  bool_t __read_mostly iommu_enabled;
>  bool_t __read_mostly force_iommu;
>  bool_t __read_mostly iommu_verbose;
> -bool __read_mostly iommu_quarantine = true;
>  bool_t __read_mostly iommu_crash_disable;
> 
> +#define IOMMU_quarantine_none        0 /* aka false */
> +#define IOMMU_quarantine_fault       1 /* aka true */
> +#define IOMMU_quarantine_write_fault 2
> +#ifdef CONFIG_HAS_PCI
> +uint8_t __read_mostly iommu_quarantine =
> +# if defined(CONFIG_IOMMU_QUARANTINE_NONE)
> +    IOMMU_quarantine_none;
> +# elif defined(CONFIG_IOMMU_QUARANTINE_BASIC)
> +    IOMMU_quarantine_fault;
> +# elif defined(CONFIG_IOMMU_QUARANTINE_FULL)
> +    IOMMU_quarantine_write_fault;
> +# endif
> +#else
> +# define iommu_quarantine IOMMU_quarantine_none
> +#endif /* CONFIG_HAS_PCI */
> +
>  static bool __hwdom_initdata iommu_hwdom_none;
>  bool __hwdom_initdata iommu_hwdom_strict;
>  bool __read_mostly iommu_hwdom_passthrough;
> @@ -68,8 +83,12 @@ static int __init parse_iommu_param(cons
>          else if ( (val = parse_boolean("force", s, ss)) >= 0 ||
>                    (val = parse_boolean("required", s, ss)) >= 0 )
>              force_iommu = val;
> +#ifdef CONFIG_HAS_PCI
>          else if ( (val = parse_boolean("quarantine", s, ss)) >= 0 )
>              iommu_quarantine = val;
> +        else if ( ss == s + 15 && !strncmp(s, "quarantine=full", 15) )
> +            iommu_quarantine = IOMMU_quarantine_write_fault;
> +#endif
>  #ifdef CONFIG_X86
>          else if ( (val = parse_boolean("igfx", s, ss)) >= 0 )
>              iommu_igfx = val;
> @@ -448,7 +467,7 @@ static int __init iommu_quarantine_init(
>      dom_io->options |= XEN_DOMCTL_CDF_iommu;
> 
>      rc = iommu_domain_init(dom_io, 0);
> -    if ( rc )
> +    if ( rc || iommu_quarantine < IOMMU_quarantine_write_fault )
>          return rc;
> 
>      if ( !hd->platform_ops->quarantine_init )
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -41,6 +41,9 @@
>  #include "vtd.h"
>  #include "../ats.h"
> 
> +/* dom_io is used as a sentinel for quarantined devices */
> +#define QUARANTINE_SKIP(d) ((d) == dom_io && !dom_iommu(d)-
> >arch.pgd_maddr)
> +
>  struct mapped_rmrr {
>      struct list_head list;
>      u64 base, end;
> @@ -1294,6 +1297,9 @@ int domain_context_mapping_one(
>      int agaw, rc, ret;
>      bool_t flush_dev_iotlb;
> 
> +    if ( QUARANTINE_SKIP(domain) )
> +        return 0;
> +
>      ASSERT(pcidevs_locked());
>      spin_lock(&iommu->lock);
>      maddr = bus_to_context_maddr(iommu, bus);
> @@ -1548,6 +1554,9 @@ int domain_context_unmap_one(
>      int iommu_domid, rc, ret;
>      bool_t flush_dev_iotlb;
> 
> +    if ( QUARANTINE_SKIP(domain) )
> +        return 0;
> +
>      ASSERT(pcidevs_locked());
>      spin_lock(&iommu->lock);
> 
> @@ -1609,7 +1618,7 @@ static int domain_context_unmap(struct d
>  {
>      struct acpi_drhd_unit *drhd;
>      struct vtd_iommu *iommu;
> -    int ret = 0;
> +    int ret;
>      u8 seg = pdev->seg, bus = pdev->bus, tmp_bus, tmp_devfn, secbus;
>      int found = 0;
> 
> @@ -1625,14 +1634,12 @@ static int domain_context_unmap(struct d
>              printk(VTDPREFIX "d%d:Hostbridge: skip %04x:%02x:%02x.%u
> unmap\n",
>                     domain->domain_id, seg, bus,
>                     PCI_SLOT(devfn), PCI_FUNC(devfn));
> -        if ( !is_hardware_domain(domain) )
> -            return -EPERM;
> -        goto out;
> +        return is_hardware_domain(domain) ? 0 : -EPERM;
> 
>      case DEV_TYPE_PCIe_BRIDGE:
>      case DEV_TYPE_PCIe2PCI_BRIDGE:
>      case DEV_TYPE_LEGACY_PCI_BRIDGE:
> -        goto out;
> +        return 0;
> 
>      case DEV_TYPE_PCIe_ENDPOINT:
>          if ( iommu_debug )
> @@ -1676,10 +1683,12 @@ static int domain_context_unmap(struct d
>          dprintk(XENLOG_ERR VTDPREFIX,
> "d%d:unknown(%u): %04x:%02x:%02x.%u\n",
>                  domain->domain_id, pdev->type,
>                  seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
> -        ret = -EINVAL;
> -        goto out;
> +        return -EINVAL;
>      }
> 
> +    if ( QUARANTINE_SKIP(domain) )
> +        return ret;
> +
>      /*
>       * if no other devices under the same iommu owned by this domain,
>       * clear iommu in iommu_bitmap and clear domain_id in domid_bitmp
> @@ -1705,16 +1714,12 @@ static int domain_context_unmap(struct d
> 
>          iommu_domid = domain_iommu_domid(domain, iommu);
>          if ( iommu_domid == -1 )
> -        {
> -            ret = -EINVAL;
> -            goto out;
> -        }
> +            return -EINVAL;
> 
>          clear_bit(iommu_domid, iommu->domid_bitmap);
>          iommu->domid_map[iommu_domid] = 0;
>      }
> 
> -out:
>      return ret;
>  }
> 
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -53,7 +53,9 @@ static inline bool_t dfn_eq(dfn_t x, dfn
>  }
> 
>  extern bool_t iommu_enable, iommu_enabled;
> -extern bool force_iommu, iommu_quarantine, iommu_verbose;
> +extern bool force_iommu, iommu_verbose;
> +/* Boolean except for the specific purposes of
> drivers/passthrough/iommu.c. */
> +extern uint8_t iommu_quarantine;
> 
>  #ifdef CONFIG_X86
>  extern enum __packed iommu_intremap {
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.