[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown



On Wed, 25 Mar 2015, Jan Beulich wrote:
> When a device gets detached from a guest, pciback will clear its
> command register, thus disabling both memory and I/O decoding. The
> disabled memory decoding, however, has an effect on the MSI-X table
> accesses the hypervisor does: These won't have the intended effect
> anymore. Even worse, for PCIe devices (but not SR-IOV virtual
> functions) such accesses may (will?) be treated as Unsupported
> Requests, causing respective errors to be surfaced, potentially in the
> form of NMIs that may be fatal to the hypervisor or Dom0 is different
> ways. Hence rather than carrying out these accesses, we should avoid
> them where we can, and use alternative (e.g. PCI config space based)
> mechanisms to achieve at least the same effect.

I don't think that it is a good idea for both Xen and Linux to access
the command register simultaneously.  Working around Linux in Xen
doesn't sound like an optimal solution.   Maybe we could just fix the
pciback and that would be enough.

In any case we should make it clear somewhere who is supposed to write
to the command register (and other PCI reigsters) at any given time,
otherwise it would be very easy for a new kernel update to break the
hypervisor and we wouldn't even know why it happened.

 
> @@ -369,24 +401,52 @@ static void msi_set_mask_bit(struct irq_
>          }
>          break;
>      case PCI_CAP_ID_MSIX:
> -    {
> -        int offset = PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
> -        writel(flag, entry->mask_base + offset);
> -        readl(entry->mask_base + offset);
> -        break;
> -    }
> +        if ( likely(memory_decoded(pdev)) )
> +        {
> +            writel(flag, entry->mask_base + 
> PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
> +            readl(entry->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
> +            break;
> +        }
> +        if ( flag )
> +        {
> +            u16 control;
> +            domid_t domid = pdev->domain->domain_id;
> +
> +            control = pci_conf_read16(seg, bus, slot, func,
> +                                      
> msix_control_reg(entry->msi_attrib.pos));
> +            if ( control & PCI_MSIX_FLAGS_MASKALL )
> +                break;
> +            pci_conf_write16(seg, bus, slot, func,
> +                             msix_control_reg(entry->msi_attrib.pos),
> +                             control | PCI_MSIX_FLAGS_MASKALL);

How is that going to interact with Linux (Dom0) writing to the command
register? Moreover QEMU writes to the PCI_MSIX_FLAGS_MASKALL bit for
devices assigned to HVM guests. Could this cause any conflicts?

One might argue that QEMU should not touch PCI_MSIX_FLAGS_MASKALL, but
as a matter of fact QEMU has done that for years and we cannot break
that behaviour without introducing regressions.  In fact as it stands
QEMU is the owner of PCI_MSIX_FLAGS_MASKALL for devices assigned to HVM
guests, not Xen.

Can we avoid messing with PCI_MSIX_FLAGS_MASKALL in Xen for passed
through devices to running domains?  I think that might be a good enough
separation of responsibilities between Xen and QEMU.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.