Xen project Mailing List

Re: [Xen-devel] [PATCH for Xen 4.13] x86/msi: Don't panic if msix capability is missing

To: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Mon, 30 Sep 2019 22:21:40 +0800

Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Jan Beulich <jbeulich@xxxxxxxx>, Wei Liu <wl@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Delivery-date: Mon, 30 Sep 2019 14:17:51 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, Sep 30, 2019 at 11:09:58AM +0200, Roger Pau Monné wrote: >On Mon, Sep 30, 2019 at 05:24:31AM +0800, Chao Gao wrote: >> Current, Xen isn't aware of device reset (initiated by dom0). Xen may >> access the device while device cannot respond to config requests >> normally (e.g. after device reset, device may respond to config >> requests with CRS completions to indicate it needs more time to >> complete a reset, refer to pci_dev_wait() in linux kernel for more >> detail). Here, don't assume msix capability is always visible and >> return -EAGAIN to the caller. >> >> Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx> >> --- >> I didn't find a way to trigger the assertion in normal usages. >> It is found by an internal test: echo 1 to /sys/bus/pci/<sbdf>/reset >> when the device is being used by a guest. Although the test is a >> little insane, it is better to avoid crashing Xen even for this case. > >The hardware domain doing such things behind Xen's back is quite >likely to end badly, either hitting an ASSERT somewhere or with a >malfunctioning device. Xen should be signaled of when such reset is >happening, so it can also tear down the internal state of the >device. > >Xen could trap accesses to the FLR bit in order to detect device >resets, but that's only a way of performing a device reset, other >methods are likely more complicated to detect, and hence this would >only be a partial solution. > >Have you considered whether it's feasible to signal Xen that a device >reset is happening, so it can torn down the internal device state? I think it is feasible. But I am not sure whether it is necessary. As you said to me before, after detaching the device from a domain, the internal device state in Xen should have be reset. That's why hardware domain or other domainU can use the device again. So Xen has provided hypercalls to tear down the internal state. (IMO, the internal state includes interrupt binding and mapping, MMIO mapping. But I am not sure if I miss something). The question then becomes: should Xen tolerate hardware domain's misbehavior (resetting a device without tearing down internal state) or just panic? > >> --- >> xen/arch/x86/msi.c | 8 +++++++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c >> index 76d4034..e2f3c6c 100644 >> --- a/xen/arch/x86/msi.c >> +++ b/xen/arch/x86/msi.c >> @@ -1265,7 +1265,13 @@ int pci_msi_conf_write_intercept(struct pci_dev >> *pdev, unsigned int reg, >> pos = entry ? entry->msi_attrib.pos >> : pci_find_cap_offset(seg, bus, slot, func, >> PCI_CAP_ID_MSIX); >> - ASSERT(pos); > >I think at least a comment should be added here describing why a >capability might suddenly disappear. Will do. > >> + if ( unlikely(!pos) ) >> + { >> + printk_once(XENLOG_WARNING > >I'm not sure if printk_once is the best option, the message would be >printed only once, and for the first device that hits this. Ideally I >think it should be printed at least once for each device that hits >this condition. > >Alternatively you can turn this into a gprintk which would be good >enough IMO. Will do. Thanks Chao _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.