Re: [PATCH V3 (resend) 01/19] x86: Create per-domain mapping of guest_root_pt
On 15.05.2024 20:25, Elias El Yandouzi wrote:
> However, I noticed quite a weird bug while doing some testing. I may
> need your expertise to find the root cause.

Looks like you've overflowed the dom0 kernel stack, most likely because
of recurring nested exceptions.

> In the case where I have more vCPUs than pCPUs (and let's consider we
> have one pCPU for two vCPUs), I noticed that I would always get a page
> fault in the dom0 kernel (5.10.0-13-amd64) at the exact same location.
> I did a bit of investigation, but I couldn't come to a clear conclusion.
> Looking at the stack trace [1], I have the feeling the crash occurs in
> a loop or a recursive call.
>
> I tried to identify where the crash occurred using addr2line:
>
>     addr2line -e vmlinux-5.10.0-29-amd64 0xffffffff810218a0
>     debian/build/build_amd64_none_amd64/arch/x86/xen/mmu_pv.c:880
>
> It turns out to point at the closing bracket of the function
> xen_mm_unpin_all() [2].
>
> I thought the crash could happen while returning from the function in
> the assembly epilogue, but the output of objdump doesn't even show the
> address.
>
> The only theory I could think of was that, because we only have one
> pCPU, we may never execute one of the two vCPUs and hence never set up
> the mapping to the guest_root_pt in write_ptbase(), resulting in the
> page fault. This is just a random theory; I couldn't find any hint
> suggesting it would be the case though. Any idea how I could debug this?

I guess you want to instrument Xen enough to catch the top-level fault
(or the 2nd from the top, depending on where the nesting actually
starts), to see why that happens. Quite likely some guest mapping isn't
set up properly.

Jan

> [1] https://pastebin.com/UaGRaV6a
> [2] https://github.com/torvalds/linux/blob/v5.10/arch/x86/xen/mmu_pv.c#L880
>
> Elias
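
A quick way to cross-check an addr2line result like the one quoted above
is to disassemble a small window around the reported address with
interleaved source lines; a minimal sketch, reusing the binary name and
address from the quoted command (the window bounds are arbitrary
placeholders and would need adjusting to the actual function):

    # Resolve the faulting address to function name and source line,
    # showing any inlined-by chain as well.
    addr2line -f -i -e vmlinux-5.10.0-29-amd64 0xffffffff810218a0

    # Disassemble around the address with source interleaved, to see
    # whether it really falls in xen_mm_unpin_all()'s epilogue or just
    # past the end of the function.
    objdump -d -l \
            --start-address=0xffffffff81021800 \
            --stop-address=0xffffffff81021900 \
            vmlinux-5.10.0-29-amd64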