WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

Re: [Xen-devel] Re: pci passthrough xhci host controller

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, Jeremy Fitzhardinge <jeremy@xxxxxxxx>
Subject: Re: [Xen-devel] Re: pci passthrough xhci host controller
From: Sander Eikelenboom <linux@xxxxxxxxxxxxxx>
Date: Thu, 30 Sep 2010 21:24:48 +0200
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Thu, 30 Sep 2010 12:26:52 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20100927155952.GA4741@xxxxxxxxxxxx>
Organization: Eikelenboom IT services
References: <1262837074.20100915230935@xxxxxxxxxxxxxx> <20100920203344.GA26201@xxxxxxxxxxxx> <1227438201.20100921220310@xxxxxxxxxxxxxx> <20100927155952.GA4741@xxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Hello Konrad,

I have done some more tests. The results:

- boot Xen with mem=4G: > 2 days uptime with passthrough and video grabbing
- boot Xen without mem=4G: freeze in < 1 day with passthrough and video grabbing
- on both, no problems as long as you don't grab video (so the controller doesn't do much)
- on both, no problems when grabbing video over USB 2, so it's xhci specific

I haven't changed anything else: same number of VMs running, etc.
Video grabbing works on both (until the freeze, or until I ended the test).
I have been reading some messages on xen-devel about MSI(-X) interrupt problems
with Xen, and suggestions to try noirqbalance, so I use noirqbalance on both.

So it seems to be related to the amount of memory available.
I do see one difference on the domU: with mem=4G I see some occasional warnings
in syslog:
Sep 28 17:55:02 security kernel: [81744.078288] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
Sep 28 17:55:02 security kernel: [81744.092653] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
Sep 28 17:55:02 security kernel: [81744.093647] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
Sep 28 17:55:02 security kernel: [81744.093647] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
Sep 28 17:55:02 security kernel: [81744.093647] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint

I don't see these warnings in the syslog when mem=4G is not used, so my hunch
is that it goes wrong at that point, while the xhci code tries to clean something up.
It could be doing something "strange" that happens to work on bare metal and on
Xen with mem=4G, but that freezes everything with more than 4G and leaves no
time to write the warning to syslog / disk.

In the dom0 syslog I do see some occasional memleak reports going by, and one
set could be related:
Sep 28 17:55:19 localhost kernel: [81962.053321] kmemleak: 22 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

I will add a script that cats the contents of /sys/kernel/debug/kmemleak to
syslog whenever kmemleak reports new suspected leaks.
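
A minimal sketch of what such a script could look like, in Python. This is only an illustration under some assumptions: the function names (new_leak_count, dump_report, watch) and the "kmemleak-dump:" prefix are made up here, the report file needs root and a mounted debugfs to be readable, and the syslog lines are assumed to arrive one per line (for example piped in from tail -F on the syslog file).

```python
import re
import syslog

# Path taken from the kmemleak message itself.
KMEMLEAK_REPORT = "/sys/kernel/debug/kmemleak"

# Matches e.g. "kmemleak: 22 new suspected memory leaks (see ...)".
NEW_LEAKS = re.compile(r"kmemleak: (\d+) new suspected memory leaks")


def new_leak_count(line):
    """Return the number of newly reported leaks in a syslog line, or 0."""
    match = NEW_LEAKS.search(line)
    return int(match.group(1)) if match else 0


def dump_report(path=KMEMLEAK_REPORT, emit=None):
    """Copy every line of the kmemleak report to syslog (or to `emit`)."""
    if emit is None:
        def emit(text):
            syslog.syslog(syslog.LOG_WARNING, text)
    count = 0
    with open(path, "r", errors="replace") as report:
        for raw in report:
            # Hypothetical prefix so the dumped lines are easy to grep for.
            emit("kmemleak-dump: " + raw.rstrip("\n"))
            count += 1
    return count


def watch(lines, path=KMEMLEAK_REPORT, emit=None):
    """Dump the report each time a line announces new suspected leaks."""
    dumps = 0
    for line in lines:
        if new_leak_count(line) > 0:
            dump_report(path, emit)
            dumps += 1
    return dumps
```

To run it against a live log, one would call watch(sys.stdin) and feed it the syslog stream. The dumped lines carry a different prefix than the kernel's own "kmemleak:" message, so they should not trigger another dump.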

Any suggestions on how to debug this further?

I boot with:

title           xen-4.1-unstable.gz / Debian GNU/Linux, 2.6.32.21-xen-stable-2.6.32.x-20100914
root            (hd0,0)
kernel          /xen-4.1-unstable.gz mem=4G dom0_mem=768M loglvl=all loglvl_guest=all com1=115200,8n1 sync_console console_to_ring console_timestamps console=com1,vga iommu=soft noirqbalance irqbalance=off
module          /vmlinuz-2.6.32.21-xen-stable-2.6.32.x-20100914 root=/dev/mapper/serveerstertje-root ro earlyprintk=xen max_loop=255 loop_max_part=63 libata.noacpi=1 iommu=soft xen-pciback.hide=(03:06.0)(07:00.0)(09:01.0)(09:01.1)(09:01.2) pci=resource_alignment=03:06.0;07:00.0;09:01.0;09:01.1;09:01.2;
module          /initrd.img-2.6.32.21-xen-stable-2.6.32.x-20100914


--
Sander



Monday, September 27, 2010, 5:59:52 PM, you wrote:

> On Tue, Sep 21, 2010 at 10:03:10PM +0200, Sander Eikelenboom wrote:
>> Hi Konrad,
>> 
>> I indeed have the feeling the memleaks aren't huge, and that adding the various
>> kernel hacking debug options ended up doing more harm than good.
>> I have turned off the options I added and reinstated "swiotlb=force" in
>> the domU config to see if it goes from a working to a freezing config, but I
>> have the feeling it will not make a difference.
>> 
>> That leaves four differences:
>> 
>> - Different dom0 kernel since the tests that resulted in continuous freezes of my server
>> - Different domU kernel since the tests that resulted in continuous freezes of my server
>> - Different workload (server is running more VMs)
>> - Different physical hardware
>>         - server is an AMD Phenom X6, current config is an Intel quad core
>>         - Both have their IOMMU disabled
>>         - Both are 64-bit capable CPUs running 64-bit Xen, dom0 and domU
>> 
>>         - But most notably perhaps, the Intel has only 2GB RAM, the server 8GB
>> 
>> Could the available physical RAM be an issue here ?
>> I limit the ram for dom0 with dom0_mem=

> OK, but that would not limit where the guests get their memory from. I think
> you might need this in conjunction with maxmem, say: maxmem=4GB dom0_mem=max:512MB

> This way your 8GB machine has 4GB of memory available for both dom0 and the guest.

>> 
>> After this test succeeds on the Intel machine, I will retry the same
>> Xen, dom0 kernel and domU kernel on the AMD config.
>> Is there anything in particular I can log/configure/debug to get more detail
>> and see if the 8GB could be the problem?

> I think we have concluded that the device in question (3.0 PCIe USB host controller) can do
> 64-bit DMA. In which case the SWIOTLB is only used as an address translation system
> (pfn -> mfn, and vice versa). If it was 32-bit it would also be utilized for bouncing
> the DMA buffers - there are sometimes cases where the driver does not sync after the bounce
> (perfect examples are the existing radeon/nouveau drivers), ending up with a corrupted/hung
> device. But those show up early in development, and this is the new USB controller that
> can do 64-bit instead of the dreaded 32-bit limit that all other USB controllers are stuck
> with.
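
To make the bounce condition concrete, here is a simplified sketch in Python of the check described above: a buffer only needs a bounce buffer when part of it lies above what the device's DMA mask can address. The function names and the example addresses are hypothetical, and the real kernel code checks more than this (swiotlb=force, for instance, bounces regardless), so take it as an illustration of the 32-bit versus 64-bit point only.

```python
GIB = 1 << 30


def dma_mask(bits):
    """Highest machine address a device with `bits`-bit DMA can reach."""
    return (1 << bits) - 1


def needs_bounce(machine_addr, size, mask_bits):
    """True if any byte of the buffer lies above the device's DMA mask."""
    last_byte = machine_addr + size - 1
    return last_byte > dma_mask(mask_bits)


# A 4 KiB buffer whose machine address is at 6 GiB (possible on an 8GB box).
buf = 6 * GIB
print(needs_bounce(buf, 4096, 64))  # False: a 64-bit capable device reaches it
print(needs_bounce(buf, 4096, 32))  # True: a 32-bit device would need a bounce
```

With mem=4G on the Xen command line, buffers would mostly sit at lower machine addresses, so even a 32-bit limited device would rarely hit the second case.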

> The memory difference might be a red herring. It could be the workload - more VMs
> and a latency issue (say we are waiting for an IRQ and it comes just a bit too late)?
> I think the idea of narrowing down the amount of memory on the AMD machine could help.

> What is the exact model of your USB capture device and the USB PCI device?



-- 
Best regards,
 Sander                            mailto:linux@xxxxxxxxxxxxxx


