 
	
Re: [PATCH V3 (resend) 01/19] x86: Create per-domain mapping of guest_root_pt
Hi Jan,

On 16/05/2024 08:17, Jan Beulich wrote:
> On 15.05.2024 20:25, Elias El Yandouzi wrote:
>> However, I noticed quite a weird bug while doing some testing. I may
>> need your expertise to find the root cause.
>
> Looks like you've overflowed the dom0 kernel stack, most likely because
> of recurring nested exceptions.
>
>> In the case where I have more vCPUs than pCPUs (and let's consider we
>> have one pCPU for two vCPUs), I noticed that I would always get a page
>> fault in dom0 kernel (5.10.0-13-amd64) at the exact same location. I
>> did a bit of investigation but I couldn't come to a clear conclusion.
>> Looking at the stack trace [1], I have the feeling the crash occurs in
>> a loop or a recursive call.
>>
>> I tried to identify where the crash occurred using addr2line:
>>
>> > addr2line -e vmlinux-5.10.0-29-amd64 0xffffffff810218a0
>> debian/build/build_amd64_none_amd64/arch/x86/xen/mmu_pv.c:880
>>
>> It turns out to point on the closing bracket of the function
>> xen_mm_unpin_all()[2].
>>
>> I thought the crash could happen while returning from the function in
>> the assembly epilogue but the output of objdump doesn't even show the
>> address.
>>
>> The only theory I could think of was that because we only have one
>> pCPU, we may never execute one of the two vCPUs, and never setup the
>> mapping to the guest_root_pt in write_ptbase(), hence the page fault.
>> This is just a random theory, I couldn't find any hint suggesting it
>> would be the case though. Any idea how I could debug this?
>
> I guess you want to instrument Xen enough to catch the top level fault
> (or the 2nd from top, depending on where the nesting actually starts)
> to see why that happens. Quite likely some guest mapping isn't set up
> properly.

Julien helped me with this one and I believe we have identified the
problem.

As you suggested, I wrote the mapping of the guest root PT into our
per-domain section, root_pt_l1tab, within the write_ptbase() function,
since there we are always in the v == current case and switch_cr3_cr4()
always flushes the local TLB.

However, there is a path, in toggle_guest_mode(), where we can call
update_cr3()/make_cr3() without calling write_ptbase(), and hence not
maintain the mappings properly. Instead, toggle_guest_mode() has a partly
open-coded version of write_ptbase().

Would you rather see the mappings written in make_cr3(), or in
toggle_guest_mode() within the pseudo open-coded version of
write_ptbase()?

Elias
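
For illustration only, the two options can be modelled with a small toy in plain C. This is not Xen code: struct toy_vcpu, every toy_* function and the two boolean switches are hypothetical stand-ins invented for this sketch, and root_pt_slot merely plays the role of the vCPU's root_pt_l1tab entry. It shows how a mode toggle that bypasses the write_ptbase()-like path leaves the mapping stale, and that refreshing it either in the make_cr3()-like step or in the open-coded path keeps it current.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these types or helpers are the real Xen ones. */
struct toy_vcpu {
    uint64_t kernel_root;   /* root PT used while in guest kernel mode */
    uint64_t user_root;     /* root PT used while in guest user mode */
    bool     in_kernel;     /* current guest mode */
    uint64_t cr3;           /* value that would be loaded into CR3 */
    uint64_t root_pt_slot;  /* stands in for the root_pt_l1tab entry */
};

/* Hypothetical helper: refresh the per-domain mapping of the root PT. */
static void toy_map_root_pt(struct toy_vcpu *v)
{
    assert(v != NULL);
    v->root_pt_slot = v->cr3;
}

/* Stand-in for make_cr3(); option A refreshes the mapping here. */
static void toy_make_cr3(struct toy_vcpu *v, bool map_here)
{
    v->cr3 = v->in_kernel ? v->kernel_root : v->user_root;
    if (map_here)
        toy_map_root_pt(v);
}

/* Stand-in for write_ptbase(): the path that already maintains the mapping. */
static void toy_write_ptbase(struct toy_vcpu *v)
{
    toy_map_root_pt(v);
    /* The real function would switch CR3/CR4 here and flush the local TLB. */
}

/* Stand-in for toggle_guest_mode(); option B refreshes the mapping here. */
static void toy_toggle_guest_mode(struct toy_vcpu *v,
                                  bool map_in_make_cr3, bool map_in_toggle)
{
    v->in_kernel = !v->in_kernel;
    toy_make_cr3(v, map_in_make_cr3);
    /* Open-coded CR3 switch: toy_write_ptbase() is deliberately not called. */
    if (map_in_toggle)
        toy_map_root_pt(v);
}

/* True when the per-domain slot matches the root PT currently in use. */
static bool toy_mapping_is_current(const struct toy_vcpu *v)
{
    return v->root_pt_slot == v->cr3;
}
```

With both switches off, a toggle leaves root_pt_slot pointing at the previous root PT, which is the stale-mapping case described above. Turning on either switch restores consistency; the model says nothing about which placement is preferable in Xen itself.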
 