Re: [Xen-devel] Xen-unstable: pci-passthrough regression bisected to: x86/smp: use APIC ALLBUT destination shorthand when possible
On 03/02/2020 14:21, Roger Pau Monné wrote:
> On Mon, Feb 03, 2020 at 01:44:06PM +0100, Sander Eikelenboom wrote:
>> On 03/02/2020 13:41, Roger Pau Monné wrote:
>>> On Mon, Feb 03, 2020 at 01:30:55PM +0100, Sander Eikelenboom wrote:
>>>> On 03/02/2020 13:23, Roger Pau Monné wrote:
>>>>> On Mon, Feb 03, 2020 at 09:33:51AM +0100, Sander Eikelenboom wrote:
>>>>>> Hi Roger,
>>>>>>
>>>>>> Last week I encountered an issue with the PCI-passthrough of a USB
>>>>>> controller.
>>>>>> In the guest I get:
>>>>>> [ 1143.313756] xhci_hcd 0000:00:05.0: xHCI host not responding to
>>>>>> stop endpoint command.
>>>>>> [ 1143.334825] xhci_hcd 0000:00:05.0: xHCI host controller not
>>>>>> responding, assume dead
>>>>>> [ 1143.347364] xhci_hcd 0000:00:05.0: HC died; cleaning up
>>>>>> [ 1143.356407] usb 1-2: USB disconnect, device number 2
>>>>>>
>>>>>> Bisection turned up as the culprit:
>>>>>> commit 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
>>>>>> x86/smp: use APIC ALLBUT destination shorthand when possible
>>>>>
>>>>> Sorry to hear that, let's see if we can figure out what's wrong.
>>>>
>>>> No problem, that is why I test stuff :)
>>>>
>>>>>> I verified by reverting that commit and now it works fine again.
>>>>>
>>>>> Does the same controller work fine when used in dom0?
>>>>
>>>> Will test that, but as all other pci devices in dom0 work fine,
>>>> I assume this controller would also work fine in dom0 (as it has also
>>>> worked fine for ages with PCI-passthrough to that guest and still works
>>>> fine when reverting the referenced commit).
>>>
>>> Is this the only device that fails to work when doing pci-passthrough,
>>> or do other devices also fail with the mentioned change applied?
>>>
>>> Have you tested on other boxes?
>>>
>>>> I don't know if your change can somehow have a side effect
>>>> on latency around the processing of pci-passthrough ?
>>>
>>> Hm, the mentioned commit should speed up broadcast IPIs, but I don't
>>> see how it could slow down other interrupts. Also I would think the
>>> domain is not receiving interrupts from the device, rather than
>>> interrupts being slow.
>>>
>>> Can you also paste the output of lspci -v for that xHCI device from
>>> dom0?
>>>
>>> Thanks, Roger.
>>
>> Will do this evening including the testing in dom0 etc.
>> Will also see if there is any pattern when observing /proc/interrupts in
>> the guest.
>
> Thanks! I also have a trivial patch that I would like you to try,
> just to rule out send_IPI_mask clobbering the scratch_cpumask under
> another function's feet.
>
> Roger.
Hi Roger,

It took a while, but I have been able to run some tests now.

I also forgot a detail in the first report (probably still a bit tired from
FOSDEM): the passed-through device works OK for a while before I get the
kernel message.

I tested the patch and it looks like it makes the issue go away. I tested
for a day with the patch applied; without the patch (and without reverting
the commit) the device gives problems within a few hours.

The lspci output from dom0 for this device is below.

--
Sander

lspci -vvvknn -s 08:00.0
08:00.0 USB controller [0c03]: NEC Corporation uPD720200 USB 3.0 Host Controller [1033:0194] (rev 03) (prog-if 30 [XHCI])
	Subsystem: ASUSTeK Computer Inc. P8P67 Deluxe Motherboard [1043:8413]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 37
	NUMA node: 0
	Region 0: Memory at f9afe000 (64-bit, non-prefetchable) [size=8K]
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
		Address: 0000000000000000 Data: 0000
	Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
		Vector table: BAR=0 offset=00001000
		PBA: BAR=0 offset=00001080
	Capabilities: [a0] Express (v2) Endpoint, MSI 00
		DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
		DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [140 v1] Device Serial Number ff-ff-ff-ff-ff-ff-ff-ff
	Capabilities: [150 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Kernel driver in use: pciback

> ---
> diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
> index 65eb7cbda8..aeeb506155 100644
> --- a/xen/arch/x86/smp.c
> +++ b/xen/arch/x86/smp.c
> @@ -66,7 +66,8 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
>  void send_IPI_mask(const cpumask_t *mask, int vector)
>  {
>      bool cpus_locked = false;
> -    cpumask_t *scratch = this_cpu(scratch_cpumask);
> +    static DEFINE_PER_CPU(cpumask_t, send_ipi_cpumask);
> +    cpumask_t *scratch = &this_cpu(send_ipi_cpumask);
>
>      /*
>       * This can only be safely used when no CPU hotplug or unplug operations
>
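
For readers following along, the hazard this test patch is meant to rule out
can be modelled in a few lines of plain C. This is only a simplified sketch,
not Xen code: mask_t, shared_scratch, ipi_scratch, model_send_ipi and
model_caller are hypothetical stand-ins, and the "interrupt" is modelled as an
ordinary nested call. The thread only shows that the symptoms went away with
the patch applied; it does not confirm which caller (if any) gets its mask
clobbered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cpumask_t: one bit per CPU. */
typedef uint64_t mask_t;

/* Stand-in for the shared per-CPU scratch_cpumask. */
static mask_t shared_scratch;
/* Stand-in for the dedicated send_ipi_cpumask added by the test patch. */
static mask_t ipi_scratch;

/* Models send_IPI_mask(): builds a temporary mask in the given buffer. */
static void model_send_ipi(mask_t *scratch, mask_t targets)
{
    assert(scratch != NULL);
    *scratch = targets & ~(mask_t)1;   /* "all targets except CPU 0" */
}

/*
 * Models a caller that keeps an intermediate result in the shared scratch
 * mask and gets interrupted by an IPI send before it reads the mask back.
 * ipi_buf selects which buffer the IPI path uses as its scratch space.
 */
static mask_t model_caller(mask_t *ipi_buf)
{
    shared_scratch = 0x0c;             /* caller's intermediate result */
    model_send_ipi(ipi_buf, 0xff);     /* the "interrupt" arrives here */
    return shared_scratch;             /* caller resumes and reads its mask */
}
```

In this model, model_caller(&shared_scratch) returns 0xfe (the caller's mask
was overwritten by the IPI path), while model_caller(&ipi_scratch) returns the
caller's own 0x0c. Giving the IPI path its own per-CPU buffer removes the
sharing, at the cost of one extra cpumask_t per CPU.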
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel