Hi Konrad,
It seems i have a working setup now, passing through a xhci controller to a
guest, and keep grabbing video, it has ran for 2 days now.
But i have changed a lot of parameters, so I will try to find out which one
seems to be the problem, or that everything is miraculously fixed :-)
(it's on other hardware (intel platform) now, as the only VM, newer dom0
kernel, newer domU kernel, no swiotlb=force anymore, the boot option pasi
suggested, no irq balance etc.)
Will post again when i know something more !
--
Sander
Thursday, August 26, 2010, 11:43:21 PM, you wrote:
>> > Looking at the output, the physical addresses that DMA-ed are:
>>
>> > 0x1f2962dc0
>> > 0x1f24f2e68
>>
>> > and they look to be called quite often. In fact, there looks to be a
>> > loop that does something like this:
>>
>> > again:
>> > p = kmalloc(..)
>>
>> > dma = pci_map_single(p)
>> > pci_dma_mapping_error(dma);
>> > /* get some data.. */
>> > /* parse the: (pipe 0x80000280): IN: c0 00 00 00 0c 00 01 00 */
>> > pci_unmap_sg(dma);
>> > goto again;
>>
>> > As the virtual address sent to pci_map_single looks to be sequentially
>> > increasing.
> I am not sure what virtual address I was thinking off. Looking again
> at the dmesg output it looks actually correct. SWIOTLB is giving out
> chunks, increasing its own index, and then unmaping those pages right
> away.
> This stack-trace in the middle of the log:
>
> Aug 15 23:14:02 security kernel: [ 103.646497] em28xx #0 em28xx_isoc_copy
> :Video frame 1, length=960, odd
> Aug 15 23:14:02 security kernel: [ 103.650177] xhci_hcd 0000:08:00.0: Miss
> service interval error, set skip flag
> Aug 15 23:14:02 security kernel: [ 103.651371] em28xx #0 em28xx_irq_callback
> :urb completition error -18.
> Aug 15 23:14:02 security kernel: [ 103.651406] em28xx #0 print_err_status
> :URB status -18 [Unknown].
> Aug 15 23:14:02 security kernel: [ 103.651432] em28xx #0 em28xx_isoc_copy
> :Video frame 2, length=2888, even
> Aug 15 23:14:02 security kernel: [ 103.651454] em28xx #0 get_next_buf :No
> active queue to serve
> Aug 15 23:14:02 security kernel: [ 103.651477] em28xx #0 print_err_status
> :URB packet 63, status -18 [Unknown].
> Aug 15 23:14:02 security kernel: [ 103.653143] motion: page allocation
> failure. order:1, mode:0x20
> Aug 15 23:14:02 security kernel: [ 103.653177] Pid: 1418, comm: motion Not
> tainted 2.6.352.6.36pre+pci-front-xhci-20100813-xhcidebug+ #14
> Aug 15 23:14:02 security kernel: [ 103.653205] Call Trace:
> Aug 15 23:14:02 security kernel: [ 103.653219] <IRQ> [<ffffffff810a7ebf>]
> __alloc_pages_nodemask+0x667/0x6c7
> Aug 15 23:14:02 security kernel: [ 103.653272] [<ffffffff810071ef>] ?
> xen_restore_fl_direct_end+0x0/0x1
> Aug 15 23:14:02 security kernel: [ 103.653300] [<ffffffff810ca32f>]
> cache_alloc_refill+0x3b3/0x851
> Aug 15 23:14:02 security kernel: [ 103.653326] [<ffffffff81006b4d>] ?
> xen_force_evtchn_callback+0xd/0xf
> Aug 15 23:14:02 security kernel: [ 103.653352] [<ffffffff81007202>] ?
> check_events+0x12/0x20
> Aug 15 23:14:02 security kernel: [ 103.653375] [<ffffffff810071ef>] ?
> xen_restore_fl_direct_end+0x0/0x1
> Aug 15 23:14:02 security kernel: [ 103.653401] [<ffffffff810ce9bd>] ?
> create_object+0x2e/0x2a1
> Aug 15 23:14:02 security kernel: [ 103.653425] [<ffffffff810c9e3f>]
> kmem_cache_alloc+0xf7/0x1c4
> Aug 15 23:14:02 security kernel: [ 103.653451] [<ffffffff8135e085>] ?
> xhci_urb_enqueue+0x14c/0x3f5
> Aug 15 23:14:02 security kernel: [ 103.653475] [<ffffffff810ce9bd>]
> create_object+0x2e/0x2a1
> Aug 15 23:14:02 security kernel: [ 103.653499] [<ffffffff8135e085>] ?
> xhci_urb_enqueue+0x14c/0x3f5
> Aug 15 23:14:02 security kernel: [ 103.653523] [<ffffffff8135e085>] ?
> xhci_urb_enqueue+0x14c/0x3f5
> Aug 15 23:14:02 security kernel: [ 103.653548] [<ffffffff810c8f68>] ?
> cache_alloc_debugcheck_after+0x1bf/0x239
> Aug 15 23:14:02 security kernel: [ 103.653575] [<ffffffff8135e085>] ?
> xhci_urb_enqueue+0x14c/0x3f5
> Aug 15 23:14:02 security kernel: [ 103.653602] [<ffffffff8143e7d7>]
> kmemleak_alloc+0x21/0x3e
> Aug 15 23:14:02 security kernel: [ 103.653625] [<ffffffff810c9edf>]
> kmem_cache_alloc+0x197/0x1c4
> Aug 15 23:14:02 security kernel: [ 103.653649] [<ffffffff8135e085>]
> xhci_urb_enqueue+0x14c/0x3f5
> Aug 15 23:14:02 security kernel: [ 103.653675] [<ffffffff81340450>]
> usb_hcd_submit_urb+0xb1f/0xbf5
> Aug 15 23:14:02 security kernel: [ 103.653701] [<ffffffff810071ef>] ?
> xen_restore_fl_direct_end+0x0/0x1
> Aug 15 23:14:02 security kernel: [ 103.653726] [<ffffffff81006b4d>] ?
> xen_force_evtchn_callback+0xd/0xf
> Aug 15 23:14:02 security kernel: [ 103.653751] [<ffffffff81007202>] ?
> check_events+0x12/0x20
> Aug 15 23:14:02 security kernel: [ 103.653777] [<ffffffff812b6267>] ?
> vt_console_print+0x43/0x346
> Aug 15 23:14:02 security kernel: [ 103.653802] [<ffffffff810071ef>] ?
> xen_restore_fl_direct_end+0x0/0x1
> Aug 15 23:14:02 security kernel: [ 103.653830] [<ffffffff810071ef>] ?
> xen_restore_fl_direct_end+0x0/0x1
> Aug 15 23:14:02 security kernel: [ 103.653857] [<ffffffff8106d5b0>] ?
> lock_release+0x1e0/0x1ed
> Aug 15 23:14:03 security kernel: [ 103.653882] [<ffffffff81340d45>]
> usb_submit_urb+0x249/0x260
> Aug 15 23:14:03 security kernel: [ 103.653919] [<ffffffffa0060615>]
> em28xx_irq_callback+0xc9/0xfd [em28xx]
> Aug 15 23:14:03 security kernel: [ 103.653946] [<ffffffff8133f16a>]
> usb_hcd_giveback_urb+0x84/0xb8
> Aug 15 23:14:03 security kernel: [ 103.653972] [<ffffffff8136480a>]
> xhci_handle_event+0xced/0xdf7
> Aug 15 23:14:03 security kernel: [ 103.653998] [<ffffffff813648a8>]
> xhci_handle_event+0xd8b/0xdf7
> Aug 15 23:14:03 security kernel: [ 103.654024] [<ffffffff813648a8>]
> xhci_handle_event+0xd8b/0xdf7
> Aug 15 23:14:03 security kernel: [ 103.654050] [<ffffffff81364a31>]
> xhci_irq+0x11d/0x1af
> Aug 15 23:14:03 security kernel: [ 103.654074] [<ffffffff81364ade>]
> xhci_msi_irq+0x1b/0x1d
> Aug 15 23:14:03 security kernel: [ 103.654103] [<ffffffff8108e4aa>]
> handle_IRQ_event+0x20/0xc7
> Aug 15 23:14:03 security kernel: [ 103.654128] [<ffffffff810902a4>]
> handle_level_irq+0x98/0xf3
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff812997a8>]
> __xen_evtchn_do_upcall+0xe1/0x183
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8129a423>]
> xen_evtchn_do_upcall+0x2a/0x3c
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8100a9ee>]
> xen_do_hypervisor_callback+0x1e/0x30
> Aug 15 23:14:03 security kernel: [ 103.654151] <EOI> [<ffffffff810012eb>]
> ? hypercall_page+0x2eb/0x1006
> Aug 15 23:14:03 security kernel: [ 103.654151] Mem-Info:
> Aug 15 23:14:03 security kernel: [ 103.654151] DMA per-cpu:
> Aug 15 23:14:03 security kernel: [ 103.654151] CPU 0: hi: 0, btch: 1
> usd: 0
> Aug 15 23:14:03 security kernel: [ 103.654151] DMA32 per-cpu:
> Aug 15 23:14:03 security kernel: [ 103.654151] CPU 0: hi: 186, btch: 31
> usd: 168
> Aug 15 23:14:03 security kernel: [ 103.654151] active_anon:4503
> inactive_anon:4767 isolated_anon:0
> Aug 15 23:14:03 security kernel: [ 103.654151] active_file:4071
> inactive_file:9087 isolated_file:0
> Aug 15 23:14:03 security kernel: [ 103.654151] unevictable:0 dirty:1081
> writeback:0 unstable:0
> Aug 15 23:14:03 security kernel: [ 103.654151] free:1584
> slab_reclaimable:5991 slab_unreclaimable:153956
> Aug 15 23:14:03 security kernel: [ 103.654151] mapped:3401 shmem:48
> pagetables:1452 bounce:0
> Aug 15 23:14:03 security kernel: [ 103.654151] DMA free:3992kB min:60kB
> low:72kB high:88kB active_anon:84kB inactive_anon:436kB active_file:24kB
> inactive_file:148kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:15712kB mlocked:0kB dirty:20kB writeback:0kB mapped:16kB shmem:0kB
> slab_reclaimable:264kB slab_unreclaimable:8932kB kernel_stack:0kB
> pagetables:384kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> Aug 15 23:14:03 security kernel: [ 103.654151] lowmem_reserve[]: 0 994 994
> 994
> Aug 15 23:14:03 security kernel: [ 103.654151] DMA32 free:2344kB min:4000kB
> low:5000kB high:6000kB active_anon:17928kB inactive_anon:18632kB
> active_file:16260kB inactive_file:36200kB unevictable:0kB isolated(anon):0kB
> isolated(file):0kB present:1018080kB mlocked:0kB dirty:4304kB writeback:0kB
> mapped:13588kB shmem:192kB slab_reclaimable:23700kB
> slab_unreclaimable:606892kB kernel_stack:720kB pagetables:5424kB unstable:0kB
> bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> Aug 15 23:14:03 security kernel: [ 103.654151] lowmem_reserve[]: 0 0 0 0
> Aug 15 23:14:03 security kernel: [ 103.654151] DMA: 88*4kB 1*8kB 1*16kB
> 1*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3992kB
> Aug 15 23:14:03 security kernel: [ 103.654151] DMA32: 406*4kB 0*8kB 1*16kB
> 0*32kB 1*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2344kB
> Aug 15 23:14:03 security kernel: [ 103.654151] 13236 total pagecache pages
> Aug 15 23:14:03 security kernel: [ 103.654151] 21 pages in swap cache
> Aug 15 23:14:03 security kernel: [ 103.654151] Swap cache stats: add 32,
> delete 11, find 27/27
> Aug 15 23:14:03 security kernel: [ 103.654151] Free swap = 524200kB
> Aug 15 23:14:03 security kernel: [ 103.654151] Total swap = 524284kB
> Aug 15 23:14:03 security kernel: [ 103.654151] 262128 pages RAM
> Aug 15 23:14:03 security kernel: [ 103.654151] 73203 pages reserved
> Aug 15 23:14:03 security kernel: [ 103.654151] 21103 pages shared
> Aug 15 23:14:03 security kernel: [ 103.654151] 105944 pages non-shared
> Aug 15 23:14:03 security kernel: [ 103.654151] kmemleak: Cannot allocate a
> kmemleak_object structure
> Aug 15 23:14:03 security kernel: [ 103.654151] Pid: 1418, comm: motion Not
> tainted 2.6.352.6.36pre+pci-front-xhci-20100813-xhcidebug+ #14
> Aug 15 23:14:03 security kernel: [ 103.654151] Call Trace:
> Aug 15 23:14:03 security kernel: [ 103.654151] <IRQ> [<ffffffff810ce9d8>]
> create_object+0x49/0x2a1
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8135e085>] ?
> xhci_urb_enqueue+0x14c/0x3f5
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8135e085>] ?
> xhci_urb_enqueue+0x14c/0x3f5
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810c8f68>] ?
> cache_alloc_debugcheck_after+0x1bf/0x239
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8135e085>] ?
> xhci_urb_enqueue+0x14c/0x3f5
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8143e7d7>]
> kmemleak_alloc+0x21/0x3e
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810c9edf>]
> kmem_cache_alloc+0x197/0x1c4
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8135e085>]
> xhci_urb_enqueue+0x14c/0x3f5
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81340450>]
> usb_hcd_submit_urb+0xb1f/0xbf5
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810071ef>] ?
> xen_restore_fl_direct_end+0x0/0x1
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81006b4d>] ?
> xen_force_evtchn_callback+0xd/0xf
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81007202>] ?
> check_events+0x12/0x20
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff812b6267>] ?
> vt_console_print+0x43/0x346
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810071ef>] ?
> xen_restore_fl_direct_end+0x0/0x1
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810071ef>] ?
> xen_restore_fl_direct_end+0x0/0x1
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8106d5b0>] ?
> lock_release+0x1e0/0x1ed
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81340d45>]
> usb_submit_urb+0x249/0x260
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffffa0060615>]
> em28xx_irq_callback+0xc9/0xfd [em28xx]
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8133f16a>]
> usb_hcd_giveback_urb+0x84/0xb8
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8136480a>]
> xhci_handle_event+0xced/0xdf7
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff813648a8>]
> xhci_handle_event+0xd8b/0xdf7
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff813648a8>]
> xhci_handle_event+0xd8b/0xdf7
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81364a31>]
> xhci_irq+0x11d/0x1af
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81364ade>]
> xhci_msi_irq+0x1b/0x1d
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8108e4aa>]
> handle_IRQ_event+0x20/0xc7
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810902a4>]
> handle_level_irq+0x98/0xf3
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff812997a8>]
> __xen_evtchn_do_upcall+0xe1/0x183
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8129a423>]
> xen_evtchn_do_upcall+0x2a/0x3c
> Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8100a9ee>]
> xen_do_hypervisor_callback+0x1e/0x30
> Aug 15 23:14:03 security kernel: [ 103.654151] <EOI> [<ffffffff810012eb>]
> ? hypercall_page+0x2eb/0x1006
> Aug 15 23:14:03 security kernel: [ 103.654151] kmemleak: Kernel memory leak
> detector disabled
> Aug 15 23:14:03 security kernel: [ 103.654151] xhci_hcd 0000:08:00.0: Found
> td. Clear skip flag.
> Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status
> :URB packet 0, status -18 [Unknown].
> Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status
> :URB packet 1, status -18 [Unknown].
> Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status
> :URB packet 2, status -18 [Unknown].
> Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status
> :URB packet 3, status -18 [Unknown].
> Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status
> :URB packet 4, status -18 [Unknown].
> Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status
> :URB packet 5, status -18 [Unknown].
> Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status
> :URB packet 6, status -18 [Unknown].
> Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status
> :URB packet 7, status -18 [Unknown].
> Aug 15 23:14:03 security kernel: [ 103.659299] xhci_hcd 0000:08:00.0: Miss
> service interval error, set skip flag
> Aug 15 23:14:03 security kernel: [ 103.659307] xhci_hcd 0000:08:00.0: Found
> td. Clear skip flag.
> Aug 15 23:14:03 security kernel: [ 103.661748] kmemleak: Automatic memory
> scanning thread ended
> Aug 15 23:14:03 security kernel: [ 103.755620] em28xx #0 print_err_status
> :URB packet 6, status -18 [Unknown].
> Aug 15 23:14:03 security kernel: [ 103.755630] em28xx #0 print_err_status
> :URB packet 7, status -18 [Unknown].
> The first fault is kmemleaker not being able to allocate its structure,
> and then it throws an error too.
> Then the em28xx gets horribly confused that it could not send an URB
> anymore and from there one things get worst.
> There is something causing this memory problem,and I think we can narrow
> it down if you output the kmemleaker output. You might have to
> unload/reload the em28xx driver to get an idea of where the leak is.
>> So what i have as domU kernel is:
>> - from Linus tree 2.6.36-pre with latest commit
>> 8357422d4bf33bc2c35884d4016c3fc9efbbc1d2
>> - pulled your devel/xen-pcifront-0.5 tree, and fixed the merge conflicts due
>> to the pv on hvm patches, in the same way Jeremy did for his 2.6.36 branch.
>> Conflicts:
>> drivers/xen/events.c
>> include/xen/events.h
>>
>> - Added a patch for xhci isoc that fixes another bug. (attached
>> 0001-xHCI-update-ring-dequeue-pointer-when-process-missed.patch)
>> - Added a patch that shows extra debug info for xhci from the author of the
>> xhci-isoc pathes (attached isoc_length5.patch)
>>
>> - Changed some debug lines from dbg to warn level, (just enabling xhci-debug
>> in kernel config floods the logs too fast, so i just enabled them in
>> xhci-mem.c)
>> (Attached a patch with all my changes to xhci*, including the 2 patches
>> above applied.)
>>
>> - Added some printk's to swiotlb-xen.c to see which functions were used, and
>> let some of them print the address as well, in the hope i could find some
>> debug info there.(attached)
> So the other thing I realized is that xhci can actually do 64bit
> transfers. Try booting without the 'swiotlb=force' (but still have
> iommu=soft).
>>
>> Apart from fixing the xhci in the end, is there a way to prevent xen from
>> freezing altogether without leaving a trace ?
> I am still puzzled as why Xen hypervisor would freeze. It _should_ work
> just fine.
>> Even an Oops is much easier to debug than a freeze. Due to the nature of DMA
>> that could perhaps be difficult, although there is an DMA API ...
>>
>> Had a fruitful LinuxCon ?
> Quite so!
>>
>> --
>> Sander
--
Best regards,
Sander mailto:linux@xxxxxxxxxxxxxx
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|