Xen project Mailing List

Re: [Xen-devel] Bug on shadow page mode

From: Tim Deegan <tim@xxxxxxx>

Date: Tue, 2 Apr 2013 17:45:09 +0100

Cc: Xudong Hao <xudong.hao@xxxxxxxxx>, "xen-devel (xen-devel@xxxxxxxxxxxxx)" <xen-devel@xxxxxxxxxxxxx>

Delivery-date: Tue, 02 Apr 2013 16:45:34 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

At 12:50 +0100 on 02 Apr (1364907054), Jan Beulich wrote:
> >>> On 02.04.13 at 10:40, "Hao, Xudong" <xudong.hao@xxxxxxxxx> wrote:
> > (XEN) [<ffff82c4c01e637f>] guest_walk_tables_4_levels+0x135/0x6a6
> > (XEN) [<ffff82c4c020d8cc>] sh_page_fault__guest_4+0x505/0x2015
> > (XEN) [<ffff82c4c01d2135>] vmx_vmexit_handler+0x86c/0x1748
> > (XEN)
> > (XEN) Pagetable walk from ffff82c406a00000:
> > (XEN) L4[0x105] = 000000007f26e063 ffffffffffffffff
> > (XEN) L3[0x110] = 000000005ce30063 ffffffffffffffff
> > (XEN) L2[0x035] = 0000000014aab063 ffffffffffffffff
> > (XEN) L1[0x000] = 0000000000000000 ffffffffffffffff
>
> Tim,
>
> I'm afraid this is something for you. From what I can tell, despite
> sh_walk_guest_tables() being called from sh_page_fault() without
> the paging lock held, there doesn't appear to be a way for this to
> race sh_update_cr3(). And with the way the latter updates
> guest_vtable, the only way for a page fault to happen upon use
> of that cached mapping would be between the call to
> sh_unmap_domain_page_global() and the immediately following
> one to sh_map_domain_page_global() (i.e. while the pointer is
> stale).

I'll have a look at it on Thursday; swapping the map and the unmap
should be trivial, anyway. Is this bug easily reproducible, or was it
only hit once? I'd expect a race like this to be nigh impossible,
especially considering that 32-bit Xen had the same code for years.

> What I do note is
>
> /* PAGING_LEVELS==4 implies 64-bit, which means that
>  * map_domain_page_global can't fail */
> BUG_ON(v->arch.paging.shadow.guest_vtable == NULL);
>
> which is no longer true. Sadly the 2-level paging case also
> doesn't really handle the similar error there, so it's not really
> clear to me how this would best be fixed. And that's not the
> reason for the problem here anyway.

I'll look at that too -- it may be that we can avoid the _global() map
altogether. HAP seems to manage without it, but it has far fewer
lookups. Maybe I could add a per-vcpu fixmap for it, which would cover
most cases (i.e. local lookups).

Cheers,

Tim.

> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 4:
> > (XEN) FATAL PAGE FAULT
> > (XEN) [error_code=0000]
> > (XEN) Faulting linear address: ffff82c406a00000
> > (XEN) ****************************************
> > (XEN)
> > (XEN) Reboot in five seconds...
> > (XEN) Resetting with ACPI MEMORY or I/O RESET_REG
> >
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel
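
Editor's note: the reordering discussed above (map the new page first, publish the pointer, and only then unmap the old one) can be illustrated with a small toy model. This is a sketch under stated assumptions, not Xen code: toy_map_global(), toy_unmap_global(), struct toy_vcpu and toy_update_vtable() are hypothetical stand-ins invented for illustration, and heap allocation stands in for the global domain-page mapping. It also returns an error when the map fails instead of relying on a BUG_ON, which is one possible way to handle the case Jan points out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for sh_map_domain_page_global() and
 * sh_unmap_domain_page_global(); these are NOT the real Xen functions. */
static void *toy_map_global(size_t bytes) { return calloc(1, bytes); }
static void toy_unmap_global(void *p) { free(p); }

/* Hypothetical, much reduced per-vcpu state. */
struct toy_vcpu {
    void *guest_vtable; /* cached mapping used by page-table walkers */
};

/* Map-before-unmap ordering: the cached pointer always refers to a live
 * mapping, because the old mapping is released only after the new one
 * has been stored. Returns 0 on success, or -1 if the new mapping could
 * not be created (in which case the old mapping is left untouched). */
static int toy_update_vtable(struct toy_vcpu *v, size_t bytes)
{
    void *new_map = toy_map_global(bytes);
    if (new_map == NULL)
        return -1;                 /* report failure instead of asserting */

    void *old_map = v->guest_vtable;
    v->guest_vtable = new_map;     /* publish the new mapping first */
    toy_unmap_global(old_map);     /* then drop the old one (NULL is fine) */
    return 0;
}
```

In real hypervisor code a lock-free reader on another CPU could still hold the old pointer after it has been replaced, so ordering alone would not necessarily be sufficient; the sketch only shows that the window in which the cached pointer itself is stale disappears.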
