| > > Looking at the output, the physical addresses that DMA-ed are:
> 
> > 0x1f2962dc0
> > 0x1f24f2e68
> 
> > and they look to be called quite often. In fact, there looks to be a
> > loop that does something like this:
> 
> > again:
> >   p = kmalloc(..)
> 
> >   dma = pci_map_single(p)
> >   pci_dma_mapping_error(dma);
> >   /* get some data.. */
> >   /* parse the: (pipe 0x80000280): IN:  c0 00 00 00 0c 00 01 00 */
> >   pci_unmap_sg(dma);
> >   goto again;
> 
> > As the virtual address sent to pci_map_single looks to be sequentially
> > increasing.
I am not sure what virtual address I was thinking off. Looking again
at the dmesg output it looks actually correct. SWIOTLB is giving out
chunks, increasing its own index, and then unmaping those pages right
away.
This stack-trace in the middle of the log:
 
Aug 15 23:14:02 security kernel: [  103.646497] em28xx #0 em28xx_isoc_copy 
:Video frame 1, length=960, odd
Aug 15 23:14:02 security kernel: [  103.650177] xhci_hcd 0000:08:00.0: Miss 
service interval error, set skip flag
Aug 15 23:14:02 security kernel: [  103.651371] em28xx #0 em28xx_irq_callback 
:urb completition error -18.
Aug 15 23:14:02 security kernel: [  103.651406] em28xx #0 print_err_status :URB 
status -18 [Unknown].
Aug 15 23:14:02 security kernel: [  103.651432] em28xx #0 em28xx_isoc_copy 
:Video frame 2, length=2888, even
Aug 15 23:14:02 security kernel: [  103.651454] em28xx #0 get_next_buf :No 
active queue to serve
Aug 15 23:14:02 security kernel: [  103.651477] em28xx #0 print_err_status :URB 
packet 63, status -18 [Unknown].
Aug 15 23:14:02 security kernel: [  103.653143] motion: page allocation 
failure. order:1, mode:0x20
Aug 15 23:14:02 security kernel: [  103.653177] Pid: 1418, comm: motion Not 
tainted 2.6.352.6.36pre+pci-front-xhci-20100813-xhcidebug+ #14
Aug 15 23:14:02 security kernel: [  103.653205] Call Trace:
Aug 15 23:14:02 security kernel: [  103.653219]  <IRQ>  [<ffffffff810a7ebf>] 
__alloc_pages_nodemask+0x667/0x6c7
Aug 15 23:14:02 security kernel: [  103.653272]  [<ffffffff810071ef>] ? 
xen_restore_fl_direct_end+0x0/0x1
Aug 15 23:14:02 security kernel: [  103.653300]  [<ffffffff810ca32f>] 
cache_alloc_refill+0x3b3/0x851
Aug 15 23:14:02 security kernel: [  103.653326]  [<ffffffff81006b4d>] ? 
xen_force_evtchn_callback+0xd/0xf
Aug 15 23:14:02 security kernel: [  103.653352]  [<ffffffff81007202>] ? 
check_events+0x12/0x20
Aug 15 23:14:02 security kernel: [  103.653375]  [<ffffffff810071ef>] ? 
xen_restore_fl_direct_end+0x0/0x1
Aug 15 23:14:02 security kernel: [  103.653401]  [<ffffffff810ce9bd>] ? 
create_object+0x2e/0x2a1
Aug 15 23:14:02 security kernel: [  103.653425]  [<ffffffff810c9e3f>] 
kmem_cache_alloc+0xf7/0x1c4
Aug 15 23:14:02 security kernel: [  103.653451]  [<ffffffff8135e085>] ? 
xhci_urb_enqueue+0x14c/0x3f5
Aug 15 23:14:02 security kernel: [  103.653475]  [<ffffffff810ce9bd>] 
create_object+0x2e/0x2a1
Aug 15 23:14:02 security kernel: [  103.653499]  [<ffffffff8135e085>] ? 
xhci_urb_enqueue+0x14c/0x3f5
Aug 15 23:14:02 security kernel: [  103.653523]  [<ffffffff8135e085>] ? 
xhci_urb_enqueue+0x14c/0x3f5
Aug 15 23:14:02 security kernel: [  103.653548]  [<ffffffff810c8f68>] ? 
cache_alloc_debugcheck_after+0x1bf/0x239
Aug 15 23:14:02 security kernel: [  103.653575]  [<ffffffff8135e085>] ? 
xhci_urb_enqueue+0x14c/0x3f5
Aug 15 23:14:02 security kernel: [  103.653602]  [<ffffffff8143e7d7>] 
kmemleak_alloc+0x21/0x3e
Aug 15 23:14:02 security kernel: [  103.653625]  [<ffffffff810c9edf>] 
kmem_cache_alloc+0x197/0x1c4
Aug 15 23:14:02 security kernel: [  103.653649]  [<ffffffff8135e085>] 
xhci_urb_enqueue+0x14c/0x3f5
Aug 15 23:14:02 security kernel: [  103.653675]  [<ffffffff81340450>] 
usb_hcd_submit_urb+0xb1f/0xbf5
Aug 15 23:14:02 security kernel: [  103.653701]  [<ffffffff810071ef>] ? 
xen_restore_fl_direct_end+0x0/0x1
Aug 15 23:14:02 security kernel: [  103.653726]  [<ffffffff81006b4d>] ? 
xen_force_evtchn_callback+0xd/0xf
Aug 15 23:14:02 security kernel: [  103.653751]  [<ffffffff81007202>] ? 
check_events+0x12/0x20
Aug 15 23:14:02 security kernel: [  103.653777]  [<ffffffff812b6267>] ? 
vt_console_print+0x43/0x346
Aug 15 23:14:02 security kernel: [  103.653802]  [<ffffffff810071ef>] ? 
xen_restore_fl_direct_end+0x0/0x1
Aug 15 23:14:02 security kernel: [  103.653830]  [<ffffffff810071ef>] ? 
xen_restore_fl_direct_end+0x0/0x1
Aug 15 23:14:02 security kernel: [  103.653857]  [<ffffffff8106d5b0>] ? 
lock_release+0x1e0/0x1ed
Aug 15 23:14:03 security kernel: [  103.653882]  [<ffffffff81340d45>] 
usb_submit_urb+0x249/0x260
Aug 15 23:14:03 security kernel: [  103.653919]  [<ffffffffa0060615>] 
em28xx_irq_callback+0xc9/0xfd [em28xx]
Aug 15 23:14:03 security kernel: [  103.653946]  [<ffffffff8133f16a>] 
usb_hcd_giveback_urb+0x84/0xb8
Aug 15 23:14:03 security kernel: [  103.653972]  [<ffffffff8136480a>] 
xhci_handle_event+0xced/0xdf7
Aug 15 23:14:03 security kernel: [  103.653998]  [<ffffffff813648a8>] 
xhci_handle_event+0xd8b/0xdf7
Aug 15 23:14:03 security kernel: [  103.654024]  [<ffffffff813648a8>] 
xhci_handle_event+0xd8b/0xdf7
Aug 15 23:14:03 security kernel: [  103.654050]  [<ffffffff81364a31>] 
xhci_irq+0x11d/0x1af
Aug 15 23:14:03 security kernel: [  103.654074]  [<ffffffff81364ade>] 
xhci_msi_irq+0x1b/0x1d
Aug 15 23:14:03 security kernel: [  103.654103]  [<ffffffff8108e4aa>] 
handle_IRQ_event+0x20/0xc7
Aug 15 23:14:03 security kernel: [  103.654128]  [<ffffffff810902a4>] 
handle_level_irq+0x98/0xf3
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff812997a8>] 
__xen_evtchn_do_upcall+0xe1/0x183
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8129a423>] 
xen_evtchn_do_upcall+0x2a/0x3c
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8100a9ee>] 
xen_do_hypervisor_callback+0x1e/0x30
Aug 15 23:14:03 security kernel: [  103.654151]  <EOI>  [<ffffffff810012eb>] ? 
hypercall_page+0x2eb/0x1006
Aug 15 23:14:03 security kernel: [  103.654151] Mem-Info:
Aug 15 23:14:03 security kernel: [  103.654151] DMA per-cpu:
Aug 15 23:14:03 security kernel: [  103.654151] CPU    0: hi:    0, btch:   1 
usd:   0
Aug 15 23:14:03 security kernel: [  103.654151] DMA32 per-cpu:
Aug 15 23:14:03 security kernel: [  103.654151] CPU    0: hi:  186, btch:  31 
usd: 168
Aug 15 23:14:03 security kernel: [  103.654151] active_anon:4503 
inactive_anon:4767 isolated_anon:0
Aug 15 23:14:03 security kernel: [  103.654151]  active_file:4071 
inactive_file:9087 isolated_file:0
Aug 15 23:14:03 security kernel: [  103.654151]  unevictable:0 dirty:1081 
writeback:0 unstable:0
Aug 15 23:14:03 security kernel: [  103.654151]  free:1584 
slab_reclaimable:5991 slab_unreclaimable:153956
Aug 15 23:14:03 security kernel: [  103.654151]  mapped:3401 shmem:48 
pagetables:1452 bounce:0
Aug 15 23:14:03 security kernel: [  103.654151] DMA free:3992kB min:60kB 
low:72kB high:88kB active_anon:84kB inactive_anon:436kB active_file:24kB 
inactive_file:148kB unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:15712kB mlocked:0kB dirty:20kB writeback:0kB mapped:16kB shmem:0kB 
slab_reclaimable:264kB slab_unreclaimable:8932kB kernel_stack:0kB 
pagetables:384kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? no
Aug 15 23:14:03 security kernel: [  103.654151] lowmem_reserve[]: 0 994 994 994
Aug 15 23:14:03 security kernel: [  103.654151] DMA32 free:2344kB min:4000kB 
low:5000kB high:6000kB active_anon:17928kB inactive_anon:18632kB 
active_file:16260kB inactive_file:36200kB unevictable:0kB isolated(anon):0kB 
isolated(file):0kB present:1018080kB mlocked:0kB dirty:4304kB writeback:0kB 
mapped:13588kB shmem:192kB slab_reclaimable:23700kB slab_unreclaimable:606892kB 
kernel_stack:720kB pagetables:5424kB unstable:0kB bounce:0kB writeback_tmp:0kB 
pages_scanned:0 all_unreclaimable? no
Aug 15 23:14:03 security kernel: [  103.654151] lowmem_reserve[]: 0 0 0 0
Aug 15 23:14:03 security kernel: [  103.654151] DMA: 88*4kB 1*8kB 1*16kB 1*32kB 
0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3992kB
Aug 15 23:14:03 security kernel: [  103.654151] DMA32: 406*4kB 0*8kB 1*16kB 
0*32kB 1*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2344kB
Aug 15 23:14:03 security kernel: [  103.654151] 13236 total pagecache pages
Aug 15 23:14:03 security kernel: [  103.654151] 21 pages in swap cache
Aug 15 23:14:03 security kernel: [  103.654151] Swap cache stats: add 32, 
delete 11, find 27/27
Aug 15 23:14:03 security kernel: [  103.654151] Free swap  = 524200kB
Aug 15 23:14:03 security kernel: [  103.654151] Total swap = 524284kB
Aug 15 23:14:03 security kernel: [  103.654151] 262128 pages RAM
Aug 15 23:14:03 security kernel: [  103.654151] 73203 pages reserved
Aug 15 23:14:03 security kernel: [  103.654151] 21103 pages shared
Aug 15 23:14:03 security kernel: [  103.654151] 105944 pages non-shared
Aug 15 23:14:03 security kernel: [  103.654151] kmemleak: Cannot allocate a 
kmemleak_object structure
Aug 15 23:14:03 security kernel: [  103.654151] Pid: 1418, comm: motion Not 
tainted 2.6.352.6.36pre+pci-front-xhci-20100813-xhcidebug+ #14
Aug 15 23:14:03 security kernel: [  103.654151] Call Trace:
Aug 15 23:14:03 security kernel: [  103.654151]  <IRQ>  [<ffffffff810ce9d8>] 
create_object+0x49/0x2a1
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8135e085>] ? 
xhci_urb_enqueue+0x14c/0x3f5
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8135e085>] ? 
xhci_urb_enqueue+0x14c/0x3f5
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff810c8f68>] ? 
cache_alloc_debugcheck_after+0x1bf/0x239
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8135e085>] ? 
xhci_urb_enqueue+0x14c/0x3f5
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8143e7d7>] 
kmemleak_alloc+0x21/0x3e
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff810c9edf>] 
kmem_cache_alloc+0x197/0x1c4
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8135e085>] 
xhci_urb_enqueue+0x14c/0x3f5
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff81340450>] 
usb_hcd_submit_urb+0xb1f/0xbf5
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff810071ef>] ? 
xen_restore_fl_direct_end+0x0/0x1
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff81006b4d>] ? 
xen_force_evtchn_callback+0xd/0xf
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff81007202>] ? 
check_events+0x12/0x20
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff812b6267>] ? 
vt_console_print+0x43/0x346
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff810071ef>] ? 
xen_restore_fl_direct_end+0x0/0x1
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff810071ef>] ? 
xen_restore_fl_direct_end+0x0/0x1
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8106d5b0>] ? 
lock_release+0x1e0/0x1ed
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff81340d45>] 
usb_submit_urb+0x249/0x260
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffffa0060615>] 
em28xx_irq_callback+0xc9/0xfd [em28xx]
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8133f16a>] 
usb_hcd_giveback_urb+0x84/0xb8
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8136480a>] 
xhci_handle_event+0xced/0xdf7
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff813648a8>] 
xhci_handle_event+0xd8b/0xdf7
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff813648a8>] 
xhci_handle_event+0xd8b/0xdf7
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff81364a31>] 
xhci_irq+0x11d/0x1af
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff81364ade>] 
xhci_msi_irq+0x1b/0x1d
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8108e4aa>] 
handle_IRQ_event+0x20/0xc7
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff810902a4>] 
handle_level_irq+0x98/0xf3
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff812997a8>] 
__xen_evtchn_do_upcall+0xe1/0x183
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8129a423>] 
xen_evtchn_do_upcall+0x2a/0x3c
Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8100a9ee>] 
xen_do_hypervisor_callback+0x1e/0x30
Aug 15 23:14:03 security kernel: [  103.654151]  <EOI>  [<ffffffff810012eb>] ? 
hypercall_page+0x2eb/0x1006
Aug 15 23:14:03 security kernel: [  103.654151] kmemleak: Kernel memory leak 
detector disabled
Aug 15 23:14:03 security kernel: [  103.654151] xhci_hcd 0000:08:00.0: Found 
td. Clear skip flag.
Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status :URB 
packet 0, status -18 [Unknown].
Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status :URB 
packet 1, status -18 [Unknown].
Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status :URB 
packet 2, status -18 [Unknown].
Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status :URB 
packet 3, status -18 [Unknown].
Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status :URB 
packet 4, status -18 [Unknown].
Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status :URB 
packet 5, status -18 [Unknown].
Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status :URB 
packet 6, status -18 [Unknown].
Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status :URB 
packet 7, status -18 [Unknown].
Aug 15 23:14:03 security kernel: [  103.659299] xhci_hcd 0000:08:00.0: Miss 
service interval error, set skip flag
Aug 15 23:14:03 security kernel: [  103.659307] xhci_hcd 0000:08:00.0: Found 
td. Clear skip flag.
Aug 15 23:14:03 security kernel: [  103.661748] kmemleak: Automatic memory 
scanning thread ended
Aug 15 23:14:03 security kernel: [  103.755620] em28xx #0 print_err_status :URB 
packet 6, status -18 [Unknown].
Aug 15 23:14:03 security kernel: [  103.755630] em28xx #0 print_err_status :URB 
packet 7, status -18 [Unknown].
The first fault is kmemleaker not being able to allocate its structure,
and then it throws an error too.
Then the em28xx gets horribly confused that it could not send an URB
anymore and from there one things get worst.
There is something causing this memory problem,and I think we can narrow
it down if you output the kmemleaker output. You might have to
unload/reload the em28xx driver to get an idea of where the leak is.
> So what i have as domU kernel is:
> - from Linus tree 2.6.36-pre with latest commit 
> 8357422d4bf33bc2c35884d4016c3fc9efbbc1d2
> - pulled your devel/xen-pcifront-0.5 tree, and fixed the merge conflicts due 
> to the pv on hvm patches, in the same way Jeremy did for his 2.6.36 branch.
>     Conflicts:
>         drivers/xen/events.c
>         include/xen/events.h
> 
> - Added a patch for xhci isoc that fixes another bug. (attached 
> 0001-xHCI-update-ring-dequeue-pointer-when-process-missed.patch)
> - Added a patch that shows extra debug info for xhci from the author of the 
> xhci-isoc pathes (attached isoc_length5.patch)
> 
> - Changed some debug lines from dbg to warn level, (just enabling xhci-debug 
> in kernel config floods the logs too fast, so i just enabled them in 
> xhci-mem.c)
>   (Attached a patch with all my changes to xhci*, including the 2 patches 
> above applied.)
> 
> - Added some printk's to swiotlb-xen.c to see which functions were used, and 
> let some of them print the address as well, in the hope i could find some 
> debug info there.(attached)
So the other thing I realized is that xhci can actually do 64bit
transfers. Try booting without the 'swiotlb=force' (but still have
iommu=soft).
> 
> Apart from fixing the xhci in the end, is there a way to prevent xen from 
> freezing altogether without leaving a trace ?
I am still puzzled as why Xen hypervisor would freeze. It _should_ work
just fine.
> Even an Oops is much easier to debug than a freeze. Due to the nature of DMA 
> that could perhaps be difficult, although there is an DMA API ...
> 
> Had a fruitful LinuxCon ?
Quite so!
> 
> --
> Sander
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
 |