Re: [Xen-devel] Weird altp2m behaviour when switching early to a new view

On Mon, Apr 16, 2018 at 7:46 PM, Razvan Cojocaru
<rcojocaru@xxxxxxxxxxxxxxx> wrote:
> On 04/16/2018 08:47 PM, George Dunlap wrote:
>> On 04/13/2018 03:44 PM, Razvan Cojocaru wrote:
>>> On 04/11/2018 11:04 AM, Razvan Cojocaru wrote:
>>>> Debugging continues.
>>> Finally, the attached patch seems to get the display unstuck in my
>>> scenario, although for one guest I get:
>>> (XEN) d2v0 Unexpected vmexit: reason 49
>>> (XEN) domain_crash called from vmx.c:4120
>>> (XEN) Domain 2 (vcpu#0) crashed on cpu#1:
>>> (XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
>>> (XEN) CPU:    1
>>> (XEN) RIP:    0010:[<fffff96000842354>]
>>> (XEN) RFLAGS: 0000000000010246   CONTEXT: hvm guest (d2v0)
>>> (XEN) rax: fffff88003000000   rbx: fffff900c0083db0   rcx: 00000000aa55aa55
>>> (XEN) rdx: fffffa80041bdc41   rsi: fffff900c00c69a0   rdi: 0000000000000001
>>> (XEN) rbp: 0000000000000000   rsp: fffff88002ee9ef0   r8:  fffffa80041bdc40
>>> (XEN) r9:  fffff80001810e80   r10: fffffa800342aa70   r11: fffff88002ee9e80
>>> (XEN) r12: 0000000000000005   r13: 0000000000000001   r14: fffff900c00c08b0
>>> (XEN) r15: 0000000000000001   cr0: 0000000080050031   cr4: 00000000000406f8
>>> (XEN) cr3: 00000000ef771000   cr2: fffff900c00c8000
>>> (XEN) fsb: 00000000fffde000   gsb: fffff80001810d00   gss: 000007fffffdc000
>>> (XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
>>> i.e. EXIT_REASON_EPT_MISCONFIG - so not out of the woods yet. I am hoping
>>> somebody more familiar with the code can point to a more elegant
>>> solution if one exists.
>> I think I have an idea what's going on, but it's complicated. :-)
>> Basically, the logdirty functionality isn't simple, and needs careful
>> thought on how to integrate it.  I'll write some more tomorrow, and see
>> if I can come up with a solution.
> I think I know why this happens for the one guest - the other guests
> start at a certain resolution display-wise and stay that way until shutdown.
> This particular guest starts with a larger screen, then goes to roughly
> 2/3rds of it, then tries to go back to the initial larger one - at which
> point the above happens. I assume this corresponds to some pages being
> removed and/or added. I'll test this theory more tomorrow - if it's
> correct I should be able to reproduce the crash (with the patch) by
> simply resetting the screen resolution (increasing it).

The trick is that p2m_change_type doesn't actually iterate over the
entire p2m range, individually changing entries as it goes.  Instead
it misconfigures the entries at the top level, which causes EPT
misconfig vmexits like the one shown above.  As Xen takes a fault for
each entry, it checks the current type, the logdirty ranges, and the
global logdirty bit to determine what the new type should be.
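
To make the lazy scheme concrete, here is a rough toy model in C.  It
is not Xen code: every name in it (toy_p2m, toy_request_recalc,
toy_access, and so on) is made up for illustration, the table has only
two levels, and the rule "logdirty if the global bit is set or the gfn
is in the range" is my simplification of the check described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all names are hypothetical, none are Xen's. */
enum toy_type { TOY_RAM_RW, TOY_RAM_LOGDIRTY, TOY_MMIO };

#define TOY_SLOTS    2                      /* top-level entries */
#define TOY_PER_SLOT 4                      /* leaves per top-level entry */
#define TOY_NR_GFNS  (TOY_SLOTS * TOY_PER_SLOT)

struct toy_p2m {
    enum toy_type type[TOY_NR_GFNS];        /* current type of each leaf */
    bool leaf_recalc[TOY_NR_GFNS];          /* leaf is "misconfigured" */
    bool top_recalc[TOY_SLOTS];             /* top-level entry is "misconfigured" */
    bool global_logdirty;                   /* global logdirty bit */
    size_t ld_start, ld_end;                /* one logdirty range, [start, end) */
    unsigned int faults;                    /* simulated misconfig faults handled */
};

/* A type change touches only the top level; no leaf is visited here. */
static void toy_request_recalc(struct toy_p2m *p)
{
    for (size_t s = 0; s < TOY_SLOTS; s++)
        p->top_recalc[s] = true;
}

/* Decide the new type from current type, logdirty range and global bit. */
static enum toy_type toy_recalc_type(const struct toy_p2m *p, size_t gfn)
{
    enum toy_type cur = p->type[gfn];

    if (cur != TOY_RAM_RW && cur != TOY_RAM_LOGDIRTY)
        return cur;                         /* non-RAM types are left alone */
    if (p->global_logdirty || (gfn >= p->ld_start && gfn < p->ld_end))
        return TOY_RAM_LOGDIRTY;
    return TOY_RAM_RW;
}

/* Simulated guest access: resolve any pending misconfiguration first. */
static enum toy_type toy_access(struct toy_p2m *p, size_t gfn)
{
    assert(gfn < TOY_NR_GFNS);
    size_t slot = gfn / TOY_PER_SLOT;

    if (p->top_recalc[slot]) {
        /* Push the marker one level down, then clear it at the top. */
        for (size_t i = 0; i < TOY_PER_SLOT; i++)
            p->leaf_recalc[slot * TOY_PER_SLOT + i] = true;
        p->top_recalc[slot] = false;
    }
    if (p->leaf_recalc[gfn]) {
        p->faults++;
        p->type[gfn] = toy_recalc_type(p, gfn);
        p->leaf_recalc[gfn] = false;
    }
    return p->type[gfn];
}
```

The point of the toy is that toy_request_recalc() is cheap and changes
no types by itself; each entry only gets its new type when it is next
accessed and the fault is handled.  Any table that gets the markers but
has no working handler for them ends up with faults it cannot resolve.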

Your patch makes it so that all the altp2ms now get the
misconfiguration when the logdirty range is changed; but clearly
handling the misconfiguration isn't integrated properly with the
altp2m system yet.  Doing it right may take some thought.


