[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] Re: xen-swiotlb



Hi Konrad,

Thx for your reply, I have been hiking in the Swiss mountains for about 2 
weeks, so i had more Zen than Xen, which is nice from time to time, but delayed 
my reply :-)
I will try everything again in the next week some time, i also saw from the 
corner of my eye someone else on the usb-devel mailing list with some problems 
with XHCI-ISOC which could be related so i will also search for patches there.
I will keep you posted with my new experiences.


Thursday, August 26, 2010, 11:43:21 PM, you wrote:

>> > Looking at the output, the physical addresses that DMA-ed are:
>> 
>> > 0x1f2962dc0
>> > 0x1f24f2e68
>> 
>> > and they look to be called quite often. In fact, there looks to be a
>> > loop that does something like this:
>> 
>> > again:
>> >   p = kmalloc(..)
>> 
>> >   dma = pci_map_single(p)
>> >   pci_dma_mapping_error(dma);
>> >   /* get some data.. */
>> >   /* parse the: (pipe 0x80000280): IN:  c0 00 00 00 0c 00 01 00 */
>> >   pci_unmap_sg(dma);
>> >   goto again;
>> 
>> > As the virtual address sent to pci_map_single looks to be sequentially
>> > increasing.

> I am not sure what virtual address I was thinking off. Looking again
> at the dmesg output it looks actually correct. SWIOTLB is giving out
> chunks, increasing its own index, and then unmaping those pages right
> away.


> This stack-trace in the middle of the log:
>  
> Aug 15 23:14:02 security kernel: [  103.646497] em28xx #0 em28xx_isoc_copy 
> :Video frame 1, length=960, odd
> Aug 15 23:14:02 security kernel: [  103.650177] xhci_hcd 0000:08:00.0: Miss 
> service interval error, set skip flag
> Aug 15 23:14:02 security kernel: [  103.651371] em28xx #0 em28xx_irq_callback 
> :urb completition error -18.
> Aug 15 23:14:02 security kernel: [  103.651406] em28xx #0 print_err_status 
> :URB status -18 [Unknown].
> Aug 15 23:14:02 security kernel: [  103.651432] em28xx #0 em28xx_isoc_copy 
> :Video frame 2, length=2888, even
> Aug 15 23:14:02 security kernel: [  103.651454] em28xx #0 get_next_buf :No 
> active queue to serve
> Aug 15 23:14:02 security kernel: [  103.651477] em28xx #0 print_err_status 
> :URB packet 63, status -18 [Unknown].
> Aug 15 23:14:02 security kernel: [  103.653143] motion: page allocation 
> failure. order:1, mode:0x20
> Aug 15 23:14:02 security kernel: [  103.653177] Pid: 1418, comm: motion Not 
> tainted 2.6.352.6.36pre+pci-front-xhci-20100813-xhcidebug+ #14
> Aug 15 23:14:02 security kernel: [  103.653205] Call Trace:
> Aug 15 23:14:02 security kernel: [  103.653219]  <IRQ>  [<ffffffff810a7ebf>] 
> __alloc_pages_nodemask+0x667/0x6c7
> Aug 15 23:14:02 security kernel: [  103.653272]  [<ffffffff810071ef>] ? 
> xen_restore_fl_direct_end+0x0/0x1
> Aug 15 23:14:02 security kernel: [  103.653300]  [<ffffffff810ca32f>] 
> cache_alloc_refill+0x3b3/0x851
> Aug 15 23:14:02 security kernel: [  103.653326]  [<ffffffff81006b4d>] ? 
> xen_force_evtchn_callback+0xd/0xf
> Aug 15 23:14:02 security kernel: [  103.653352]  [<ffffffff81007202>] ? 
> check_events+0x12/0x20
> Aug 15 23:14:02 security kernel: [  103.653375]  [<ffffffff810071ef>] ? 
> xen_restore_fl_direct_end+0x0/0x1
> Aug 15 23:14:02 security kernel: [  103.653401]  [<ffffffff810ce9bd>] ? 
> create_object+0x2e/0x2a1
> Aug 15 23:14:02 security kernel: [  103.653425]  [<ffffffff810c9e3f>] 
> kmem_cache_alloc+0xf7/0x1c4
> Aug 15 23:14:02 security kernel: [  103.653451]  [<ffffffff8135e085>] ? 
> xhci_urb_enqueue+0x14c/0x3f5
> Aug 15 23:14:02 security kernel: [  103.653475]  [<ffffffff810ce9bd>] 
> create_object+0x2e/0x2a1
> Aug 15 23:14:02 security kernel: [  103.653499]  [<ffffffff8135e085>] ? 
> xhci_urb_enqueue+0x14c/0x3f5
> Aug 15 23:14:02 security kernel: [  103.653523]  [<ffffffff8135e085>] ? 
> xhci_urb_enqueue+0x14c/0x3f5
> Aug 15 23:14:02 security kernel: [  103.653548]  [<ffffffff810c8f68>] ? 
> cache_alloc_debugcheck_after+0x1bf/0x239
> Aug 15 23:14:02 security kernel: [  103.653575]  [<ffffffff8135e085>] ? 
> xhci_urb_enqueue+0x14c/0x3f5
> Aug 15 23:14:02 security kernel: [  103.653602]  [<ffffffff8143e7d7>] 
> kmemleak_alloc+0x21/0x3e
> Aug 15 23:14:02 security kernel: [  103.653625]  [<ffffffff810c9edf>] 
> kmem_cache_alloc+0x197/0x1c4
> Aug 15 23:14:02 security kernel: [  103.653649]  [<ffffffff8135e085>] 
> xhci_urb_enqueue+0x14c/0x3f5
> Aug 15 23:14:02 security kernel: [  103.653675]  [<ffffffff81340450>] 
> usb_hcd_submit_urb+0xb1f/0xbf5
> Aug 15 23:14:02 security kernel: [  103.653701]  [<ffffffff810071ef>] ? 
> xen_restore_fl_direct_end+0x0/0x1
> Aug 15 23:14:02 security kernel: [  103.653726]  [<ffffffff81006b4d>] ? 
> xen_force_evtchn_callback+0xd/0xf
> Aug 15 23:14:02 security kernel: [  103.653751]  [<ffffffff81007202>] ? 
> check_events+0x12/0x20
> Aug 15 23:14:02 security kernel: [  103.653777]  [<ffffffff812b6267>] ? 
> vt_console_print+0x43/0x346
> Aug 15 23:14:02 security kernel: [  103.653802]  [<ffffffff810071ef>] ? 
> xen_restore_fl_direct_end+0x0/0x1
> Aug 15 23:14:02 security kernel: [  103.653830]  [<ffffffff810071ef>] ? 
> xen_restore_fl_direct_end+0x0/0x1
> Aug 15 23:14:02 security kernel: [  103.653857]  [<ffffffff8106d5b0>] ? 
> lock_release+0x1e0/0x1ed
> Aug 15 23:14:03 security kernel: [  103.653882]  [<ffffffff81340d45>] 
> usb_submit_urb+0x249/0x260
> Aug 15 23:14:03 security kernel: [  103.653919]  [<ffffffffa0060615>] 
> em28xx_irq_callback+0xc9/0xfd [em28xx]
> Aug 15 23:14:03 security kernel: [  103.653946]  [<ffffffff8133f16a>] 
> usb_hcd_giveback_urb+0x84/0xb8
> Aug 15 23:14:03 security kernel: [  103.653972]  [<ffffffff8136480a>] 
> xhci_handle_event+0xced/0xdf7
> Aug 15 23:14:03 security kernel: [  103.653998]  [<ffffffff813648a8>] 
> xhci_handle_event+0xd8b/0xdf7
> Aug 15 23:14:03 security kernel: [  103.654024]  [<ffffffff813648a8>] 
> xhci_handle_event+0xd8b/0xdf7
> Aug 15 23:14:03 security kernel: [  103.654050]  [<ffffffff81364a31>] 
> xhci_irq+0x11d/0x1af
> Aug 15 23:14:03 security kernel: [  103.654074]  [<ffffffff81364ade>] 
> xhci_msi_irq+0x1b/0x1d
> Aug 15 23:14:03 security kernel: [  103.654103]  [<ffffffff8108e4aa>] 
> handle_IRQ_event+0x20/0xc7
> Aug 15 23:14:03 security kernel: [  103.654128]  [<ffffffff810902a4>] 
> handle_level_irq+0x98/0xf3
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff812997a8>] 
> __xen_evtchn_do_upcall+0xe1/0x183
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8129a423>] 
> xen_evtchn_do_upcall+0x2a/0x3c
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8100a9ee>] 
> xen_do_hypervisor_callback+0x1e/0x30
> Aug 15 23:14:03 security kernel: [  103.654151]  <EOI>  [<ffffffff810012eb>] 
> ? hypercall_page+0x2eb/0x1006
> Aug 15 23:14:03 security kernel: [  103.654151] Mem-Info:
> Aug 15 23:14:03 security kernel: [  103.654151] DMA per-cpu:
> Aug 15 23:14:03 security kernel: [  103.654151] CPU    0: hi:    0, btch:   1 
> usd:   0
> Aug 15 23:14:03 security kernel: [  103.654151] DMA32 per-cpu:
> Aug 15 23:14:03 security kernel: [  103.654151] CPU    0: hi:  186, btch:  31 
> usd: 168
> Aug 15 23:14:03 security kernel: [  103.654151] active_anon:4503 
> inactive_anon:4767 isolated_anon:0
> Aug 15 23:14:03 security kernel: [  103.654151]  active_file:4071 
> inactive_file:9087 isolated_file:0
> Aug 15 23:14:03 security kernel: [  103.654151]  unevictable:0 dirty:1081 
> writeback:0 unstable:0
> Aug 15 23:14:03 security kernel: [  103.654151]  free:1584 
> slab_reclaimable:5991 slab_unreclaimable:153956
> Aug 15 23:14:03 security kernel: [  103.654151]  mapped:3401 shmem:48 
> pagetables:1452 bounce:0
> Aug 15 23:14:03 security kernel: [  103.654151] DMA free:3992kB min:60kB 
> low:72kB high:88kB active_anon:84kB inactive_anon:436kB active_file:24kB 
> inactive_file:148kB unevictable:0kB isolated(anon):0kB isolated(file):0kB 
> present:15712kB mlocked:0kB dirty:20kB writeback:0kB mapped:16kB shmem:0kB 
> slab_reclaimable:264kB slab_unreclaimable:8932kB kernel_stack:0kB 
> pagetables:384kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 
> all_unreclaimable? no
> Aug 15 23:14:03 security kernel: [  103.654151] lowmem_reserve[]: 0 994 994 
> 994
> Aug 15 23:14:03 security kernel: [  103.654151] DMA32 free:2344kB min:4000kB 
> low:5000kB high:6000kB active_anon:17928kB inactive_anon:18632kB 
> active_file:16260kB inactive_file:36200kB unevictable:0kB isolated(anon):0kB 
> isolated(file):0kB present:1018080kB mlocked:0kB dirty:4304kB writeback:0kB 
> mapped:13588kB shmem:192kB slab_reclaimable:23700kB 
> slab_unreclaimable:606892kB kernel_stack:720kB pagetables:5424kB unstable:0kB 
> bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> Aug 15 23:14:03 security kernel: [  103.654151] lowmem_reserve[]: 0 0 0 0
> Aug 15 23:14:03 security kernel: [  103.654151] DMA: 88*4kB 1*8kB 1*16kB 
> 1*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3992kB
> Aug 15 23:14:03 security kernel: [  103.654151] DMA32: 406*4kB 0*8kB 1*16kB 
> 0*32kB 1*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2344kB
> Aug 15 23:14:03 security kernel: [  103.654151] 13236 total pagecache pages
> Aug 15 23:14:03 security kernel: [  103.654151] 21 pages in swap cache
> Aug 15 23:14:03 security kernel: [  103.654151] Swap cache stats: add 32, 
> delete 11, find 27/27
> Aug 15 23:14:03 security kernel: [  103.654151] Free swap  = 524200kB
> Aug 15 23:14:03 security kernel: [  103.654151] Total swap = 524284kB
> Aug 15 23:14:03 security kernel: [  103.654151] 262128 pages RAM
> Aug 15 23:14:03 security kernel: [  103.654151] 73203 pages reserved
> Aug 15 23:14:03 security kernel: [  103.654151] 21103 pages shared
> Aug 15 23:14:03 security kernel: [  103.654151] 105944 pages non-shared
> Aug 15 23:14:03 security kernel: [  103.654151] kmemleak: Cannot allocate a 
> kmemleak_object structure
> Aug 15 23:14:03 security kernel: [  103.654151] Pid: 1418, comm: motion Not 
> tainted 2.6.352.6.36pre+pci-front-xhci-20100813-xhcidebug+ #14
> Aug 15 23:14:03 security kernel: [  103.654151] Call Trace:
> Aug 15 23:14:03 security kernel: [  103.654151]  <IRQ>  [<ffffffff810ce9d8>] 
> create_object+0x49/0x2a1
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8135e085>] ? 
> xhci_urb_enqueue+0x14c/0x3f5
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8135e085>] ? 
> xhci_urb_enqueue+0x14c/0x3f5
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff810c8f68>] ? 
> cache_alloc_debugcheck_after+0x1bf/0x239
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8135e085>] ? 
> xhci_urb_enqueue+0x14c/0x3f5
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8143e7d7>] 
> kmemleak_alloc+0x21/0x3e
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff810c9edf>] 
> kmem_cache_alloc+0x197/0x1c4
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8135e085>] 
> xhci_urb_enqueue+0x14c/0x3f5
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff81340450>] 
> usb_hcd_submit_urb+0xb1f/0xbf5
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff810071ef>] ? 
> xen_restore_fl_direct_end+0x0/0x1
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff81006b4d>] ? 
> xen_force_evtchn_callback+0xd/0xf
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff81007202>] ? 
> check_events+0x12/0x20
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff812b6267>] ? 
> vt_console_print+0x43/0x346
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff810071ef>] ? 
> xen_restore_fl_direct_end+0x0/0x1
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff810071ef>] ? 
> xen_restore_fl_direct_end+0x0/0x1
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8106d5b0>] ? 
> lock_release+0x1e0/0x1ed
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff81340d45>] 
> usb_submit_urb+0x249/0x260
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffffa0060615>] 
> em28xx_irq_callback+0xc9/0xfd [em28xx]
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8133f16a>] 
> usb_hcd_giveback_urb+0x84/0xb8
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8136480a>] 
> xhci_handle_event+0xced/0xdf7
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff813648a8>] 
> xhci_handle_event+0xd8b/0xdf7
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff813648a8>] 
> xhci_handle_event+0xd8b/0xdf7
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff81364a31>] 
> xhci_irq+0x11d/0x1af
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff81364ade>] 
> xhci_msi_irq+0x1b/0x1d
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8108e4aa>] 
> handle_IRQ_event+0x20/0xc7
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff810902a4>] 
> handle_level_irq+0x98/0xf3
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff812997a8>] 
> __xen_evtchn_do_upcall+0xe1/0x183
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8129a423>] 
> xen_evtchn_do_upcall+0x2a/0x3c
> Aug 15 23:14:03 security kernel: [  103.654151]  [<ffffffff8100a9ee>] 
> xen_do_hypervisor_callback+0x1e/0x30
> Aug 15 23:14:03 security kernel: [  103.654151]  <EOI>  [<ffffffff810012eb>] 
> ? hypercall_page+0x2eb/0x1006
> Aug 15 23:14:03 security kernel: [  103.654151] kmemleak: Kernel memory leak 
> detector disabled
> Aug 15 23:14:03 security kernel: [  103.654151] xhci_hcd 0000:08:00.0: Found 
> td. Clear skip flag.
> Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status 
> :URB packet 0, status -18 [Unknown].
> Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status 
> :URB packet 1, status -18 [Unknown].
> Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status 
> :URB packet 2, status -18 [Unknown].
> Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status 
> :URB packet 3, status -18 [Unknown].
> Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status 
> :URB packet 4, status -18 [Unknown].
> Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status 
> :URB packet 5, status -18 [Unknown].
> Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status 
> :URB packet 6, status -18 [Unknown].
> Aug 15 23:14:03 security kernel: [  103.654151] em28xx #0 print_err_status 
> :URB packet 7, status -18 [Unknown].
> Aug 15 23:14:03 security kernel: [  103.659299] xhci_hcd 0000:08:00.0: Miss 
> service interval error, set skip flag
> Aug 15 23:14:03 security kernel: [  103.659307] xhci_hcd 0000:08:00.0: Found 
> td. Clear skip flag.
> Aug 15 23:14:03 security kernel: [  103.661748] kmemleak: Automatic memory 
> scanning thread ended
> Aug 15 23:14:03 security kernel: [  103.755620] em28xx #0 print_err_status 
> :URB packet 6, status -18 [Unknown].
> Aug 15 23:14:03 security kernel: [  103.755630] em28xx #0 print_err_status 
> :URB packet 7, status -18 [Unknown].


> The first fault is kmemleaker not being able to allocate its structure,
> and then it throws an error too.

> Then the em28xx gets horribly confused that it could not send an URB
> anymore and from there one things get worst.

> There is something causing this memory problem,and I think we can narrow
> it down if you output the kmemleaker output. You might have to
> unload/reload the em28xx driver to get an idea of where the leak is.

>> So what i have as domU kernel is:
>> - from Linus tree 2.6.36-pre with latest commit 
>> 8357422d4bf33bc2c35884d4016c3fc9efbbc1d2
>> - pulled your devel/xen-pcifront-0.5 tree, and fixed the merge conflicts due 
>> to the pv on hvm patches, in the same way Jeremy did for his 2.6.36 branch.
>>     Conflicts:
>>         drivers/xen/events.c
>>         include/xen/events.h
>> 
>> - Added a patch for xhci isoc that fixes another bug. (attached 
>> 0001-xHCI-update-ring-dequeue-pointer-when-process-missed.patch)
>> - Added a patch that shows extra debug info for xhci from the author of the 
>> xhci-isoc pathes (attached isoc_length5.patch)
>> 
>> - Changed some debug lines from dbg to warn level, (just enabling xhci-debug 
>> in kernel config floods the logs too fast, so i just enabled them in 
>> xhci-mem.c)
>>   (Attached a patch with all my changes to xhci*, including the 2 patches 
>> above applied.)
>> 
>> - Added some printk's to swiotlb-xen.c to see which functions were used, and 
>> let some of them print the address as well, in the hope i could find some 
>> debug info there.(attached)

> So the other thing I realized is that xhci can actually do 64bit
> transfers. Try booting without the 'swiotlb=force' (but still have
> iommu=soft).

I will give this a try !

>> 
>> Apart from fixing the xhci in the end, is there a way to prevent xen from 
>> freezing altogether without leaving a trace ?

> I am still puzzled as why Xen hypervisor would freeze. It _should_ work
> just fine.

Yes that's some concern, and it does with USB 2, just before my holiday I 
switched back to USB 2 with the same grabber, and it has run for 20 days now 
with continuous grabbing and a lot of disk and network activity.
So it definitely is about XHCI.
And freezing the hypervisor without a trace is definitely the last and worst 
option on my list :-)


>> Even an Oops is much easier to debug than a freeze. Due to the nature of DMA 
>> that could perhaps be difficult, although there is an DMA API ...
>> 
>> Had a fruitful LinuxCon ?

> Quite so!
>> 
>> --
>> Sander








-- 
Best regards,
 Sander                            mailto:linux@xxxxxxxxxxxxxx


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.