Xen project Mailing List

Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view

From: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>

Date: Tue, 17 Apr 2018 13:49:16 +0300

Cc: "Tian, Kevin" <kevin.tian@xxxxxxxxx>, Tamas K Lengyel <tamas@xxxxxxxxxxxxx>, Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>, Tim Deegan <tim@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Tue, 17 Apr 2018 10:49:28 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 04/17/2018 11:24 AM, Razvan Cojocaru wrote:
> On 04/16/2018 11:21 PM, George Dunlap wrote:
>> On Mon, Apr 16, 2018 at 7:46 PM, Razvan Cojocaru
>> <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>> On 04/16/2018 08:47 PM, George Dunlap wrote:
>>>> On 04/13/2018 03:44 PM, Razvan Cojocaru wrote:
>>>>> On 04/11/2018 11:04 AM, Razvan Cojocaru wrote:
>>>>>> Debugging continues.
>>>>>
>>>>> Finally, the attached patch seems to get the display unstuck in my
>>>>> scenario, although for one guest I get:
>>>>>
>>>>> (XEN) d2v0 Unexpected vmexit: reason 49
>>>>> (XEN) domain_crash called from vmx.c:4120
>>>>> (XEN) Domain 2 (vcpu#0) crashed on cpu#1:
>>>>> (XEN) ----[ Xen-4.11-unstable x86_64 debug=y Not tainted ]----
>>>>> (XEN) CPU: 1
>>>>> (XEN) RIP: 0010:[<fffff96000842354>]
>>>>> (XEN) RFLAGS: 0000000000010246 CONTEXT: hvm guest (d2v0)
>>>>> (XEN) rax: fffff88003000000 rbx: fffff900c0083db0 rcx: 00000000aa55aa55
>>>>> (XEN) rdx: fffffa80041bdc41 rsi: fffff900c00c69a0 rdi: 0000000000000001
>>>>> (XEN) rbp: 0000000000000000 rsp: fffff88002ee9ef0 r8: fffffa80041bdc40
>>>>> (XEN) r9: fffff80001810e80 r10: fffffa800342aa70 r11: fffff88002ee9e80
>>>>> (XEN) r12: 0000000000000005 r13: 0000000000000001 r14: fffff900c00c08b0
>>>>> (XEN) r15: 0000000000000001 cr0: 0000000080050031 cr4: 00000000000406f8
>>>>> (XEN) cr3: 00000000ef771000 cr2: fffff900c00c8000
>>>>> (XEN) fsb: 00000000fffde000 gsb: fffff80001810d00 gss: 000007fffffdc000
>>>>> (XEN) ds: 002b es: 002b fs: 0053 gs: 002b ss: 0018 cs: 0010
>>>>>
>>>>> i.e. EXIT_REASON_EPT_MISCONFIG - so not out of the woods yet. I am hoping
>>>>> somebody more familiar with the code can point to a more elegant
>>>>> solution if one exists.
>>>>
>>>> I think I have an idea what's going on, but it's complicated. :-)
>>>>
>>>> Basically, the logdirty functionality isn't simple, and needs careful
>>>> thought on how to integrate it. I'll write some more tomorrow, and see
>>>> if I can come up with a solution.
>>>
>>> I think I know why this happens for the one guest - the other guests
>>> start at a certain resolution display-wise and stay that way until shutdown.
>>>
>>> This particular guest starts with a larger screen, then goes to roughly
>>> 2/3rds of it, then tries to go back to the initial larger one - at which
>>> point the above happens. I assume this corresponds to some pages being
>>> removed and/or added. I'll test this theory more tomorrow - if it's
>>> correct I should be able to reproduce the crash (with the patch) by
>>> simply resetting the screen resolution (increasing it).
>>
>> The trick is that p2m_change_type doesn't actually iterate over the
>> entire p2m range, individually changing entries as it goes. Instead
>> it misconfigures the entries at the top-level, which causes the kinds
>> of faults shown above. As it gets faults for each entry, it checks
>> the current type, the logdirty ranges, and the global logdirty bit to
>> determine what the new types should be.
>>
>> Your patch makes it so that all the altp2ms now get the
>> misconfiguration when the logdirty range is changed; but clearly
>> handling the misconfiguration isn't integrated properly with the
>> altp2m system yet. Doing it right may take some thought.
>
> FWIW, the attached patch has solved the misconfig-related domain crash
> for me (though I'm very likely missing some subtleties). It all seems to
> work as expected when enabling altp2m and switching early to a new view.
> However, now I have domUs with a frozen display when I disconnect the
> introspection application (that is, after I switch back to the default
> view and disable altp2m on the domain).

The for() loop in the previous patch is unnecessary, so here's a new
(cleaner) patch. I can't get the guest to freeze the display when
detaching anymore - unrelated to the for() - (so it might have been
something else in my setup), but I'll watch for it in the following days.
Hopefully this is either a reasonable fix or a basis for one.

Thanks,
Razvan
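
To illustrate the lazy type-change scheme George describes in the quoted text (mark the top-level entries, then fix up each entry when it faults), here is a minimal, self-contained C sketch. It is a toy model and not Xen code: every name in it (toy_p2m, toy_set_logdirty_range, toy_access, and so on) is hypothetical, and the two-level layout, the single log-dirty range, and the fault counter are simplifying assumptions. It does not model altp2m views at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names or layouts are Xen's. */
enum toy_type { TOY_RAM_RW, TOY_RAM_LOGDIRTY, TOY_MMIO };

#define TOY_GROUPS    4   /* stand-in for top-level entries */
#define TOY_PER_GROUP 8   /* leaf entries under each top-level entry */
#define TOY_PAGES     (TOY_GROUPS * TOY_PER_GROUP)

struct toy_p2m {
    bool group_stale[TOY_GROUPS];  /* top-level entry marked "misconfigured" */
    bool leaf_stale[TOY_PAGES];    /* leaf still needs its type recalculated */
    enum toy_type type[TOY_PAGES];
    bool global_logdirty;          /* global log-dirty flag */
    size_t ld_start, ld_end;       /* one log-dirty range, [start, end) */
    unsigned faults;               /* how many lazy fix-ups have run */
};

/* Cost is proportional to the number of top-level entries, not pages. */
static void toy_request_recalc(struct toy_p2m *p)
{
    for (size_t g = 0; g < TOY_GROUPS; g++)
        p->group_stale[g] = true;
}

/* Changing the range touches no leaf entry; it only marks the top level. */
static void toy_set_logdirty_range(struct toy_p2m *p, size_t start, size_t end)
{
    p->ld_start = start;
    p->ld_end = end;
    toy_request_recalc(p);
}

/* Stand-in for the fault path: resolve one entry on first access. */
static enum toy_type toy_access(struct toy_p2m *p, size_t gfn)
{
    assert(gfn < TOY_PAGES);
    size_t g = gfn / TOY_PER_GROUP;

    /* Push the "stale" mark one level down, then clear it at the top. */
    if (p->group_stale[g]) {
        for (size_t i = 0; i < TOY_PER_GROUP; i++)
            p->leaf_stale[g * TOY_PER_GROUP + i] = true;
        p->group_stale[g] = false;
    }

    if (p->leaf_stale[gfn]) {
        enum toy_type cur = p->type[gfn];

        p->faults++;
        /* Only RAM types are recalculated; other types are left alone. */
        if (cur == TOY_RAM_RW || cur == TOY_RAM_LOGDIRTY) {
            bool in_range = gfn >= p->ld_start && gfn < p->ld_end;

            p->type[gfn] = (p->global_logdirty || in_range)
                               ? TOY_RAM_LOGDIRTY : TOY_RAM_RW;
        }
        p->leaf_stale[gfn] = false;
    }
    return p->type[gfn];
}
```

In this sketch, calling toy_set_logdirty_range(&p, 0, 8) on a zero-initialised table changes no stored type by itself. The first toy_access() to a page inside the range recalculates that page to TOY_RAM_LOGDIRTY, and later accesses to the same page take no further fix-up. A table that never receives the top-level mark keeps its old types, which is roughly the situation the altp2m views were in before the patch discussed above.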
