Re: xen-swiotlb issue when NVMe driver is enabled in Dom0 on ARM
Hi Stefano,
Thanks again for helping us to find the root cause of the issue.

> On 20 Apr 2022, at 3:36 am, Stefano Stabellini <sstabellini@xxxxxxxxxx> wrote:
>
>>> Then there is xen_swiotlb_init() which allocates some memory for
>>> swiotlb-xen at boot. It could lower the total amount of memory
>>> available, but if you disabled swiotlb-xen like I suggested,
>>> xen_swiotlb_init() still should get called and executed anyway at boot
>>> (it is called from arch/arm/xen/mm.c:xen_mm_init). So xen_swiotlb_init()
>>> shouldn't be the one causing problems.
>>>
>>> That's it -- there is nothing else in swiotlb-xen that I can think of.
>>>
>>> I don't have any good ideas, so I would only suggest to add more printks
>>> and report the results, for instance:
>>
>> As suggested, I added more printks, but the only difference I see is the
>> size; apart from that everything looks the same.
>>
>> Please find the attached logs for xen and native linux boot.
>
> One difference is that the order of the allocations is significantly
> different after the first 3 allocations. It is very unlikely but
> possible that this is an unrelated concurrency bug that only occurs on
> Xen. I doubt it.

I am not sure, but just to confirm with you: I see the logs below in every
scenario. The SWIOTLB memory is allocated by the Linux swiotlb and then used
by xen-swiotlb. Is that okay, or can it cause an issue?

[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] software IO TLB: mapped [mem 0x00000000f4000000-0x00000000f8000000] (64MB)

Snip from int __ref xen_swiotlb_init(int verbose, bool early):

	/*
	 * IO TLB memory already allocated. Just use it.
	 */
	if (io_tlb_start != 0) {
		xen_io_tlb_start = phys_to_virt(io_tlb_start);
		goto end;
	}

> I think you could try booting native and Xen with only 1 CPU enabled in
> both cases.
>
> For native, you can do that with maxcpus, e.g. maxcpus=1.
> For Xen, you can do that with dom0_max_vcpus=1. I don't think we need to
> reduce the number of pCPUs seen by Xen, but it could be useful to pass
> sched=null to avoid any scheduler effects. This is just for debugging of
> course.

I tried to boot Xen with "dom0_max_vcpus=1" and "sched=null", and the issue
remains.

> In reality, the most likely explanation is that the issue is a memory
> corruption. Something somewhere is corrupting Linux memory and it just
> happens that we see it when calling dma_direct_alloc. This means it is
> going to be difficult to find as the only real clue is that it is
> swiotlb-xen that is causing it.

Agreed, we observe the issue with the xen-swiotlb DMA ops only.

> I added more printks with the goal of detecting swiotlb-xen code paths
> that shouldn't be taken in a normal dom0 boot without domUs. For
> instance, range_straddles_page_boundary should always return zero and
> the dma_mask check in xen_swiotlb_alloc_coherent should always succeed.
>
> Fingers crossed we'll notice that the wrong path is taken just before
> the crash.

Please find the attached logs. I captured the logs for Xen with and without
(dom0_max_vcpus=1 & sched=null), and also for native Linux with and without
(maxcpus=1).

Regards,
Rahul
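For reference, the two swiotlb-xen checks mentioned above live in
drivers/xen/swiotlb-xen.c. The fragment below is a minimal sketch of what
such instrumentation could look like inside xen_swiotlb_alloc_coherent();
the surrounding logic is paraphrased from the kernel of that era, and the
printk is an illustrative addition, not the actual debug patch from this
thread:

	phys = *dma_handle;
	dev_addr = xen_phys_to_bus(phys);
	if ((dev_addr + size - 1 <= dma_mask) &&
	    !range_straddles_page_boundary(phys, size)) {
		/* Expected path in a 1:1 mapped dom0: use the address as-is. */
		*dma_handle = dev_addr;
	} else {
		/*
		 * Unexpected in a plain dom0 boot without domUs: either the
		 * buffer straddles a Xen page boundary or the coherent mask
		 * is not satisfied. Report it before exchanging memory with
		 * Xen (printk added for debugging only).
		 */
		printk(KERN_WARNING
		       "xen-swiotlb: slow path: phys=%pa size=%zu straddles=%d mask=%llx\n",
		       &phys, size,
		       range_straddles_page_boundary(phys, size), dma_mask);
		if (xen_create_contiguous_region(phys, order,
						 fls64(dma_mask),
						 dma_handle) != 0)
			return NULL;
	}

If such a warning fired just before the dma_direct_alloc crash, that would
point at the xen_create_contiguous_region path rather than at the allocation
itself.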
Attachment: xen_boot_with_dom0_max_vcpus_1_debug.log
Attachment: native_linux_with_maxcpus_1_debug.log
Attachment: native_linux_boot_debug.log
Attachment: xen_boot_debug.log