Re: [Xen-devel] Xen-unstable: pci-passthrough regression bisected to: x86/smp: use APIC ALLBUT destination shorthand when possible
On 03/02/2020 14:21, Roger Pau Monné wrote:
> On Mon, Feb 03, 2020 at 01:44:06PM +0100, Sander Eikelenboom wrote:
>> On 03/02/2020 13:41, Roger Pau Monné wrote:
>>> On Mon, Feb 03, 2020 at 01:30:55PM +0100, Sander Eikelenboom wrote:
>>>> On 03/02/2020 13:23, Roger Pau Monné wrote:
>>>>> On Mon, Feb 03, 2020 at 09:33:51AM +0100, Sander Eikelenboom wrote:
>>>>>> Hi Roger,
>>>>>>
>>>>>> Last week I encountered an issue with the PCI-passthrough of a USB
>>>>>> controller.
>>>>>> In the guest I get:
>>>>>> [ 1143.313756] xhci_hcd 0000:00:05.0: xHCI host not responding to
>>>>>> stop endpoint command.
>>>>>> [ 1143.334825] xhci_hcd 0000:00:05.0: xHCI host controller not
>>>>>> responding, assume dead
>>>>>> [ 1143.347364] xhci_hcd 0000:00:05.0: HC died; cleaning up
>>>>>> [ 1143.356407] usb 1-2: USB disconnect, device number 2
>>>>>>
>>>>>> Bisection turned up as the culprit:
>>>>>> commit 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
>>>>>> x86/smp: use APIC ALLBUT destination shorthand when possible
>>>>>
>>>>> Sorry to hear that, let see if we can figure out what's wrong.
>>>>
>>>> No problem, that is why I test stuff :)
>>>>
>>>>>> I verified by reverting that commit and now it works fine again.
>>>>>
>>>>> Does the same controller work fine when used in dom0?
>>>>
>>>> Will test that, but as all other pci devices in dom0 work fine,
>>>> I assume this controller would also work fine in dom0 (as it has also
>>>> worked fine for ages with PCI-passthrough to that guest and still works
>>>> fine when reverting the referenced commit).
>>>
>>> Is this the only device that fails to work when doing pci-passthrough,
>>> or other devices also don't work with the mentioned change applied?
>>>
>>> Have you tested on other boxes?
>>>
>>>> I don't know if your change can somehow have a side effect
>>>> on latency around the processing of pci-passthrough ?
>>>
>>> Hm, the mentioned commit should speed up broadcast IPIs, but I don't
>>> see how it could slow down other interrupts. Also I would think the
>>> domain is not receiving interrupts from the device, rather than
>>> interrupts being slow.
>>>
>>> Can you also paste the output of lspci -v for that xHCI device from
>>> dom0?
>>>
>>> Thanks, Roger.
>>
>> Will do this evening including the testing in dom0 etc.
>> Will also see if there is any pattern when observing /proc/interrupts in
>> the guest.
>
> Thanks! I also have some trivial patch that I would like you to try,
> just to discard send_IPI_mask clearing the scratch_cpumask under
> another function feet.
>
> Roger.

Hi Roger,

Took a while, but I was able to run some tests now.

I also forgot a detail in the first report (probably still a bit tired from FOSDEM), namely:
the device passedthrough works OK for a while before I get the kernel message.

I tested the patch and it looks like it makes the issue go away,
I tested for a day, while without the patch (or revert of the commit) the device
will give problems within a few hours.

lspci output from dom0 for this device is below.

--
Sander

lspci -vvvknn -s 08:00.0
08:00.0 USB controller [0c03]: NEC Corporation uPD720200 USB 3.0 Host Controller [1033:0194] (rev 03) (prog-if 30 [XHCI])
    Subsystem: ASUSTeK Computer Inc. P8P67 Deluxe Motherboard [1043:8413]
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 37
    NUMA node: 0
    Region 0: Memory at f9afe000 (64-bit, non-prefetchable) [size=8K]
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
        Vector table: BAR=0 offset=00001000
        PBA: BAR=0 offset=00001080
    Capabilities: [a0] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
            ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
    Capabilities: [140 v1] Device Serial Number ff-ff-ff-ff-ff-ff-ff-ff
    Capabilities: [150 v1] Latency Tolerance Reporting
        Max snoop latency: 0ns
        Max no snoop latency: 0ns
    Kernel driver in use: pciback

> ---
> diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
> index 65eb7cbda8..aeeb506155 100644
> --- a/xen/arch/x86/smp.c
> +++ b/xen/arch/x86/smp.c
> @@ -66,7 +66,8 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
>  void send_IPI_mask(const cpumask_t *mask, int vector)
>  {
>      bool cpus_locked = false;
> -    cpumask_t *scratch = this_cpu(scratch_cpumask);
> +    static DEFINE_PER_CPU(cpumask_t, send_ipi_cpumask);
> +    cpumask_t *scratch = &this_cpu(send_ipi_cpumask);
>
>      /*
>       * This can only be safely used when no CPU hotplug or unplug operations

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
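
For readers following the thread: the test patch above swaps the shared per-CPU scratch mask for a mask private to send_IPI_mask. The sketch below is not Xen code; it only illustrates the kind of hazard the patch is meant to rule out, namely a helper overwriting a scratch buffer that its caller is still using. All names (mask_t, scratch_mask, send_ipi_mask, send_ipi_sketch, caller_sketch) are hypothetical, a single 64-bit word stands in for cpumask_t, and plain statics stand in for the per-CPU variables of one CPU. Which Xen caller, if any, is actually affected is not established in this message.

```c
#include <stdint.h>

/* Hypothetical stand-in for cpumask_t: one bit per CPU. */
typedef uint64_t mask_t;

/* Stand-ins for per-CPU variables as seen by a single CPU. */
static mask_t scratch_mask;   /* shared scratch buffer (cf. scratch_cpumask) */
static mask_t send_ipi_mask;  /* dedicated buffer (cf. send_ipi_cpumask in the patch) */

/*
 * Models an IPI helper that needs temporary mask storage: it computes
 * "mask minus the sending CPU" into either the shared scratch buffer or
 * its own dedicated buffer, and returns the resulting target set.
 */
static mask_t send_ipi_sketch(mask_t mask, unsigned int self, int use_private)
{
    mask_t *tmp = use_private ? &send_ipi_mask : &scratch_mask;

    *tmp = mask & ~((mask_t)1 << self);
    return *tmp;
}

/*
 * Models a caller that keeps intermediate state in the shared scratch
 * buffer, sends an IPI part-way through, and then reads the scratch
 * buffer again. Returns what the caller sees after the IPI helper ran.
 */
static mask_t caller_sketch(mask_t targets, int use_private)
{
    scratch_mask = targets;                          /* caller's working set */
    (void)send_ipi_sketch(~(mask_t)0, 0, use_private); /* broadcast from CPU 0 */
    return scratch_mask;                             /* clobbered if shared */
}
```

With the dedicated buffer, caller_sketch(0x6, 1) returns the caller's original mask 0x6. With the shared buffer, caller_sketch(0x6, 0) returns the helper's broadcast mask instead, so the caller would go on to act on the wrong set of CPUs.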