[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown



On Mon, 13 Apr 2015, Jan Beulich wrote:
> >>> On 02.04.15 at 18:49, <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> > On Wed, 25 Mar 2015, Jan Beulich wrote:
> >> When a device gets detached from a guest, pciback will clear its
> >> command register, thus disabling both memory and I/O decoding. The
> >> disabled memory decoding, however, has an effect on the MSI-X table
> >> accesses the hypervisor does: These won't have the intended effect
> >> anymore. Even worse, for PCIe devices (but not SR-IOV virtual
> >> functions) such accesses may (will?) be treated as Unsupported
> >> Requests, causing respective errors to be surfaced, potentially in the
> >> form of NMIs that may be fatal to the hypervisor or Dom0 is different
> >> ways. Hence rather than carrying out these accesses, we should avoid
> >> them where we can, and use alternative (e.g. PCI config space based)
> >> mechanisms to achieve at least the same effect.
> > 
> > I don't think that it is a good idea for both Xen and Linux to access
> > the command register simultaneously.  Working around Linux in Xen
> > doesn't sound like an optimal solution.   Maybe we could just fix the
> > pciback and that would be enough.
> 
> I'm afraid that would just eliminate the specific case, but not the
> general issue.

If we trust Dom0 to do the right thing, then I don't think there is a
general issue to be solved. Dom0 can break the system at any time, I
don't see any differences here, unless we have a plan to actually be
able to handle a misbehaving dom0, in that case I am all for it.


> While we trust Dom0 to not do outright bad things,
> the hypervisor should still avoid doing things that can go wrong
> due to the state a device is put (or left) in by Dom0.

Xen should also avoid doing things that can go wrong because of the
state a device is put in by QEMU or other components in the system.
There isn't much room for Xen to play with.


> > In any case we should make it clear somewhere who is supposed to write
> > to the command register (and other PCI reigsters) at any given time,
> > otherwise it would be very easy for a new kernel update to break the
> > hypervisor and we wouldn't even know why it happened.
> 
> We should, but at this point in time this is going to be rather
> problematic. Such a separation of responsibilities should have been
> done before all the pass-through code got written.

That is true.


> >> @@ -369,24 +401,52 @@ static void msi_set_mask_bit(struct irq_
> >>          }
> >>          break;
> >>      case PCI_CAP_ID_MSIX:
> >> -    {
> >> -        int offset = PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
> >> -        writel(flag, entry->mask_base + offset);
> >> -        readl(entry->mask_base + offset);
> >> -        break;
> >> -    }
> >> +        if ( likely(memory_decoded(pdev)) )
> >> +        {
> >> +            writel(flag, entry->mask_base + 
> >> PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
> >> +            readl(entry->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
> >> +            break;
> >> +        }
> >> +        if ( flag )
> >> +        {
> >> +            u16 control;
> >> +            domid_t domid = pdev->domain->domain_id;
> >> +
> >> +            control = pci_conf_read16(seg, bus, slot, func,
> >> +                                      
> >> msix_control_reg(entry->msi_attrib.pos));
> >> +            if ( control & PCI_MSIX_FLAGS_MASKALL )
> >> +                break;
> >> +            pci_conf_write16(seg, bus, slot, func,
> >> +                             msix_control_reg(entry->msi_attrib.pos),
> >> +                             control | PCI_MSIX_FLAGS_MASKALL);
> > 
> > How is that going to interact with Linux (Dom0) writing to the command
> > register? Moreover QEMU writes to the PCI_MSIX_FLAGS_MASKALL bit for
> > devices assigned to HVM guests. Could this cause any conflicts?
> 
> I don't see the relation to the comment register here - all quoted code
> only deals with the MSI-X control register.

I meant control register, sorry for the confusion.


> > One might argue that QEMU should not touch PCI_MSIX_FLAGS_MASKALL, but
> > as a matter of fact QEMU has done that for years and we cannot break
> > that behaviour without introducing regressions.  In fact as it stands
> > QEMU is the owner of PCI_MSIX_FLAGS_MASKALL for devices assigned to HVM
> > guests, not Xen.
> > 
> > Can we avoid messing with PCI_MSIX_FLAGS_MASKALL in Xen for passed
> > through devices to running domains?  I think that might be a good enough
> > separation of responsibilities between Xen and QEMU.
> 
> No, the bits has to be under hypervisor control (just like any other
> hardware bits controlling interrupts).

Maybe they should be under hypervisor control, but keep in mind that
they are not and they haven't been for years now.  We cannot go ahead
and break existing code paths and use cases. As it stands, QEMU is in
charge of PCI_MSIX_FLAGS_MASKALL for passthrough devices and this is a
regression.


> Dealing with this is the subject of patches I have ready, but didn't post yet.

OK, that is very good. But if we really want to make this change,
then I think that those patches would need to be part of this series so
that they can be committed at the same time to minimize the breakage.

And how are we going to deal with older "unfixed" QEMUs?
So far we have been using the same policy for QEMU and the Dom0 kernel:
Xen doesn't break them -- old Linux kernels and QEMUs are supposed to
just work.

However this change would subtly break QEMU, so we would need to
introduce a feature flag in QEMU to advertise that is new enough and
capable of leaving the control register alone. If the flag is missing,
Xen would refuse to do PCI passthrough?  It seems awful.

Alternatively we would have to introduce a minimum supported QEMU
version for Xen, that is unpleasant.  Do we want to do that? I think it
is worth a thought before going ahead.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.