[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/1] x86/vtd: Mask DMAR faults for IGD devices



While it's described as errata for gen8/9, the previous reporting was from 2015 which predates those generations. I tested it on a gen 7 box (which was causing me the grief of consuming my Xen console buffer). It could be the case that the FPD code isn't implemented (which wouldn't matter for gen8/9 because it is broken), and the original problem of faulty firmware reporting bad ranges is the ultimate culprit. As far as the last two questions, I was testing on an older version of Xen (4.9.x) and ported it to master. Happy to hear a better approach to solving the original problem.

-Brendan

On Sun, Apr 19, 2020 at 11:28 PM Tian, Kevin <kevin.tian@xxxxxxxxx> wrote:
> From: Brendan Kerrigan <brendank310@xxxxxxxxx>
> Sent: Friday, April 17, 2020 9:36 PM
>
> From: Brendan Kerrigan <kerriganb@xxxxxxxxxxxx>
>
> The Intel graphics device records DMAR faults regardless
> of the Fault Process Disable bit being set. When this fault

Since this is an errata for specific generations, let's describe
this way instead of stating it as a generic problem.

> occurs, enable the Interrupt Mask (IM) bit in the Fault
> Event Control (FECTL) register to prevent the continued
> recording of the fault.
>
> Signed-off-by: Brendan Kerrigan <kerriganb@xxxxxxxxxxxx>
> ---
>  xen/drivers/passthrough/vtd/iommu.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index 07d40b37fe..288399d816 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -41,6 +41,8 @@
>  #include "vtd.h"
>  #include "../ats.h"
>
> +#define IS_IGD(seg, id) (0 == seg && 0 == PCI_BUS(id) && 2 == PCI_SLOT(id)
> && 0 == PCI_FUNC(id))
> +
>  struct mapped_rmrr {
>      struct list_head list;
>      u64 base, end;
> @@ -872,6 +874,13 @@ static int iommu_page_fault_do_one(struct
> vtd_iommu *iommu, int type,
>      printk(XENLOG_G_WARNING VTDPREFIX "%s: reason %02x - %s\n",
>             kind, fault_reason, reason);
>
> +    if ( DMA_REMAP == fault_type && type && IS_IGD(seg, source_id) ) {
> +        u32 fectl = dmar_readl(iommu->reg, DMAR_FECTL_REG);
> +        fectl |= DMA_FECTL_IM;
> +        dmar_writel(iommu->reg, DMAR_FECTL_REG, fectl);
> +        printk(XENLOG_G_WARNING VTDPREFIX "Disabling DMAR faults for
> IGD\n");
> +    }
> +

Several questions. First, I just note that FPD is not touched by any code
today. then how is this errata being caught? Second, we already have
pci_check_disable_device in place which stops DMAs from the problematic
device if it triggers excessive fault reports. why doesn't it work for your
case? Last, dma_msi_end just forces clearing the mask bit at end of handling
the fault interrupt, making above fix meaningless. Those questions just
make me wonder how you actually test this patch and whether it's necessary...

Thanks
Kevin

>      if ( iommu_verbose && fault_type == DMA_REMAP )
>          print_vtd_entries(iommu, PCI_BUS(source_id), PCI_DEVFN2(source_id),
>                            addr >> PAGE_SHIFT);
> --
> 2.17.1


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.