Hello Konrad,
I have done some more tests. The results:
- Xen booted with mem=4G: > 2 days uptime with passthrough and video grabbing
- Xen booted without mem=4G: freeze in < 1 day with passthrough and video
  grabbing
- in both cases no problems as long as you don't grab video (so the controller
  doesn't do much)
- in both cases no problems when grabbing video over USB 2, so it's xhci specific
I haven't changed anything else: same number of VMs running, etc. Video
grabbing works in both cases (until the freeze, or until I ended the test).
I have been reading messages on xen-devel about MSI(-X) interrupt problems with
Xen, with suggestions to try noirqbalance, so I use noirqbalance in both cases.
So it seems to be related to the amount of memory available.
I do see one difference in the domU: with mem=4G I see occasional warnings
in syslog:
Sep 28 17:55:02 security kernel: [81744.078288] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
Sep 28 17:55:02 security kernel: [81744.092653] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
Sep 28 17:55:02 security kernel: [81744.093647] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
Sep 28 17:55:02 security kernel: [81744.093647] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
Sep 28 17:55:02 security kernel: [81744.093647] xhci_hcd 0000:07:00.0: WARN: transfer error on endpoint
I don't see these warnings in the syslog when mem=4G is not used, so my hunch is
that this is where it goes wrong, while the xhci code tries to clean something
up. It could be doing something "strange" that happens to work on bare metal and
on Xen with mem=4G, but freezes everything with more than 4G of memory, leaving
no time to write the warning to the syslog / disk.
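To compare runs, the warnings can be counted per device and per second of uptime. The following is only a minimal sketch: the function name count_xhci_warnings and the sample list are made up for illustration, and it assumes the log format shown above. It matches only up to "WARN:" so that it also works on log text that has been wrapped by a mail client.

```python
import re
from collections import Counter

# Matches the start of a kernel log entry such as
#   "... kernel: [81744.078288] xhci_hcd 0000:07:00.0: WARN: transfer error ..."
# Only the part up to "WARN:" is matched, so a message whose text was wrapped
# onto the next line is still counted once.
XHCI_WARN = re.compile(r"kernel: \[\s*(\d+)\.\d+\] xhci_hcd (\S+): WARN:")


def count_xhci_warnings(lines):
    """Return a Counter mapping (pci_device, uptime_second) -> warning count.

    Hypothetical helper, not part of any Xen or kernel tooling.
    """
    counts = Counter()
    for line in lines:
        match = XHCI_WARN.search(line)
        if match:
            second = int(match.group(1))
            device = match.group(2)
            counts[(device, second)] += 1
    return counts


# Sample input taken from the domU syslog excerpt above (wrapped as in the mail).
sample = [
    "Sep 28 17:55:02 security kernel: [81744.078288] xhci_hcd 0000:07:00.0: WARN:",
    "transfer error on endpoint",
    "Sep 28 17:55:02 security kernel: [81744.092653] xhci_hcd 0000:07:00.0: WARN:",
    "transfer error on endpoint",
    "Sep 28 17:55:02 security kernel: [81744.093647] xhci_hcd 0000:07:00.0: WARN:",
    "transfer error on endpoint",
    "Sep 28 17:55:02 security kernel: [81744.093647] xhci_hcd 0000:07:00.0: WARN:",
    "transfer error on endpoint",
    "Sep 28 17:55:02 security kernel: [81744.093647] xhci_hcd 0000:07:00.0: WARN:",
    "transfer error on endpoint",
]

if __name__ == "__main__":
    for (device, second), count in sorted(count_xhci_warnings(sample).items()):
        print(device, second, count)
```

On a real system the same function could be fed an open log file instead of the sample list; the path of that file depends on the syslog setup in the domU.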
In the dom0 syslog I do see occasional memleak reports going by, and one set
could be related:
Sep 28 17:55:19 localhost kernel: [81962.053321] kmemleak: 22 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
I will add a script that cats the contents of /sys/kernel/debug/kmemleak to
syslog whenever kmemleak reports new suspected leaks.
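A rough sketch of what such a script could look like, in Python. The function names (new_leak_count, dump_kmemleak, watch) and the "kmemleak-dump:" prefix are made up for illustration. It assumes debugfs is mounted at /sys/kernel/debug, that the script runs as root in dom0, and that the caller supplies the kernel log lines to watch (for example by following the syslog file).

```python
import re
import syslog

# Path from the kmemleak message above; requires debugfs and root access.
KMEMLEAK_PATH = "/sys/kernel/debug/kmemleak"

# Matches "kmemleak: 22 new suspected memory leaks (see ...)".
NEW_LEAKS = re.compile(r"kmemleak: (\d+) new suspected memory leaks")


def new_leak_count(log_line):
    """Return the number of newly reported leaks in a log line, or 0."""
    match = NEW_LEAKS.search(log_line)
    return int(match.group(1)) if match else 0


def dump_kmemleak(path=KMEMLEAK_PATH, emit=None):
    """Copy every line of the kmemleak report to syslog (or to `emit`).

    Returns the number of lines copied. The prefix differs from the kernel's
    own "kmemleak:" prefix so the dump does not trigger itself again.
    """
    if emit is None:
        emit = lambda text: syslog.syslog(syslog.LOG_WARNING, text)
    copied = 0
    with open(path, errors="replace") as report:
        for line in report:
            emit("kmemleak-dump: " + line.rstrip("\n"))
            copied += 1
    return copied


def watch(log_lines, path=KMEMLEAK_PATH, emit=None):
    """Dump the kmemleak report each time a 'new suspected leaks' line appears.

    Returns how many dumps were made.
    """
    dumps = 0
    for line in log_lines:
        if new_leak_count(line) > 0:
            dump_kmemleak(path, emit)
            dumps += 1
    return dumps
```

Passing emit explicitly (for example a list's append method) makes it possible to try the script against a saved copy of the report without writing to the real syslog.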
Any suggestions for debugging this further?
I boot with:
title xen-4.1-unstable.gz / Debian GNU/Linux, 2.6.32.21-xen-stable-2.6.32.x-20100914
root (hd0,0)
kernel /xen-4.1-unstable.gz mem=4G dom0_mem=768M loglvl=all loglvl_guest=all com1=115200,8n1 sync_console console_to_ring console_timestamps console=com1,vga iommu=soft noirqbalance irqbalance=off
module /vmlinuz-2.6.32.21-xen-stable-2.6.32.x-20100914 root=/dev/mapper/serveerstertje-root ro earlyprintk=xen max_loop=255 loop_max_part=63 libata.noacpi=1 iommu=soft xen-pciback.hide=(03:06.0)(07:00.0)(09:01.0)(09:01.1)(09:01.2) pci=resource_alignment=03:06.0;07:00.0;09:01.0;09:01.1;09:01.2;
module /initrd.img-2.6.32.21-xen-stable-2.6.32.x-20100914
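Since the same five devices have to appear in both xen-pciback.hide and pci=resource_alignment, a small consistency check can rule out a typo in one of the two lists. This is only an illustrative sketch: the helper names are invented, and the parsing assumes the two option formats exactly as they appear in the dom0 kernel line above (an optional "order@" prefix on resource_alignment entries is stripped).

```python
import re

# dom0 kernel arguments, copied from the boot entry above.
CMDLINE = (
    "root=/dev/mapper/serveerstertje-root ro earlyprintk=xen max_loop=255 "
    "loop_max_part=63 libata.noacpi=1 iommu=soft "
    "xen-pciback.hide=(03:06.0)(07:00.0)(09:01.0)(09:01.1)(09:01.2) "
    "pci=resource_alignment=03:06.0;07:00.0;09:01.0;09:01.1;09:01.2;"
)


def hidden_devices(cmdline):
    """Return the set of devices listed as xen-pciback.hide=(a)(b)..."""
    match = re.search(r"xen-pciback\.hide=((?:\([^)]*\))+)", cmdline)
    if not match:
        return set()
    return set(re.findall(r"\(([^)]*)\)", match.group(1)))


def aligned_devices(cmdline):
    """Return the set of devices listed in resource_alignment=a;b;..."""
    match = re.search(r"resource_alignment=([^\s,]+)", cmdline)
    if not match:
        return set()
    # Drop empty entries (trailing ';') and any "order@" prefix.
    return {entry.split("@")[-1] for entry in match.group(1).split(";") if entry}


def mismatched_devices(cmdline):
    """Return devices that appear in only one of the two lists."""
    return hidden_devices(cmdline) ^ aligned_devices(cmdline)


if __name__ == "__main__":
    print(sorted(mismatched_devices(CMDLINE)))
```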
--
Sander
Monday, September 27, 2010, 5:59:52 PM, you wrote:
> On Tue, Sep 21, 2010 at 10:03:10PM +0200, Sander Eikelenboom wrote:
>> Hi Konrad,
>>
>> I indeed have the feeling the memleaks aren't huge, and that adding the
>> various kernel hacking debug options ended up doing more harm than good.
>> I have turned off the options I added and reinstated "swiotlb=force" in
>> the domU config to see if it goes from a working to a freezing config, but I
>> have the feeling it will not make a difference.
>>
>> Then I have 4 differences left:
>>
>> - Other dom0 kernel since the tests that resulted in continuous freezes of my
>>   server
>> - Other domU kernel since the tests that resulted in continuous freezes of my
>>   server
>> - Other workload (server is running more VMs)
>> - Other physical hardware
>>   - server is AMD Phenom X6, current config is Intel quad core
>>   - Both have their IOMMU disabled
>>   - Both are 64-bit capable CPUs with 64-bit Xen, dom0 and domU
>>
>>   - But most notably perhaps, the Intel has only 2GB RAM, the server 8GB
>>
>> Could the available physical RAM be an issue here?
>> I limit the RAM for dom0 with dom0_mem=
> OK, but that would not limit where the guests get their memory from. I think
> you might need this in conjunction with maxmem, say:
> maxmem=4GB dom0_mem=max:512MB
> This way your 8GB machine has 4GB of memory available for both dom0 and the
> guests.
>>
>> After this test succeeds on the Intel machine, I will retry the same
>> Xen, dom0 kernel and domU kernel on the AMD config.
>> Is there anything in particular I can log/configure/debug to get more detail
>> and see if the 8GB could be the problem?
> I think we have concluded that the device in question (3.0 PCIe USB host
> controller) can do 64-bit DMA. In which case the SWIOTLB is only used as an
> address translation system (pfn -> mfn, and vice-versa). If it was 32-bit it
> would also be utilized for bouncing the DMA buffers - there are sometimes
> cases where the driver does not sync after the bounce (perfect examples are
> the existing radeon/nouveau drivers), ending up with corruption or a hung
> device. But those show up early in development, and this is the new USB
> controller that can do 64-bit instead of the dreaded 32-bit limit that all
> other USB controllers are stuck with.
> The memory difference might be a red herring. It could be the workload - more
> VMs and a latency issue (say we are waiting for an IRQ and it comes just a bit
> too late)?
> I think the idea of narrowing down the amount of memory on the AMD machine
> could help.
> What is the exact model of your USB capture device and the USB PCI device?
--
Best regards,
Sander mailto:linux@xxxxxxxxxxxxxx
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel