
Re: [Xen-devel] Bug on shadow page mode



At 12:50 +0100 on 02 Apr (1364907054), Jan Beulich wrote:
> >>> On 02.04.13 at 10:40, "Hao, Xudong" <xudong.hao@xxxxxxxxx> wrote:
> > (XEN)    [<ffff82c4c01e637f>] guest_walk_tables_4_levels+0x135/0x6a6
> > (XEN)    [<ffff82c4c020d8cc>] sh_page_fault__guest_4+0x505/0x2015
> > (XEN)    [<ffff82c4c01d2135>] vmx_vmexit_handler+0x86c/0x1748
> > (XEN)    
> > (XEN) Pagetable walk from ffff82c406a00000:
> > (XEN)  L4[0x105] = 000000007f26e063 ffffffffffffffff
> > (XEN)  L3[0x110] = 000000005ce30063 ffffffffffffffff
> > (XEN)  L2[0x035] = 0000000014aab063 ffffffffffffffff 
> > (XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
> 
> Tim,
> 
> I'm afraid this is something for you. From what I can tell, despite
> sh_walk_guest_tables() being called from sh_page_fault() without
> the paging lock held, there doesn't appear to be a way for this to
> race sh_update_cr3(). And with the way the latter updates
> guest_vtable, the only way for a page fault to happen upon use
> of that cached mapping would be between the call to
> sh_unmap_domain_page_global() and the immediately following
> one to sh_map_domain_page_global() (i.e. while the pointer is
> stale).

I'll have a look at it on Thursday; swapping the map and the unmap
should be trivial, anyway.
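
To illustrate the reordering only: the sketch below is not the Xen code. The names demo_map_global(), demo_unmap_global(), struct demo_vcpu and demo_update_vtable() are hypothetical stand-ins, and malloc()/free() merely simulate a mapping. The point is that the new mapping is created and stored before the old one is torn down, so the cached pointer never refers to an unmapped address, and a failed map leaves the old mapping in place.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a global mapping helper: returns a pointer
 * that stays valid until demo_unmap_global() is called on it, or NULL
 * if the mapping could not be created. */
static void *demo_map_global(unsigned long mfn)
{
    unsigned long *p = malloc(sizeof(*p));

    if (p != NULL)
        *p = mfn;
    return p;
}

/* Hypothetical stand-in for the matching unmap helper. */
static void demo_unmap_global(void *p)
{
    free(p);
}

/* Hypothetical, heavily reduced per-vcpu state. */
struct demo_vcpu {
    void *guest_vtable; /* cached mapping that page-table walkers read */
};

/* Map first, publish the new pointer, unmap the old mapping last.
 * Returns 0 on success, -1 if the new mapping could not be created
 * (in which case the old mapping is kept untouched). */
static int demo_update_vtable(struct demo_vcpu *v, unsigned long new_mfn)
{
    void *new_map, *old_map;

    assert(v != NULL);

    new_map = demo_map_global(new_mfn);
    if (new_map == NULL)
        return -1;

    old_map = v->guest_vtable;
    v->guest_vtable = new_map;   /* the field is never left stale */

    if (old_map != NULL)
        demo_unmap_global(old_map);
    return 0;
}
```

Note that this ordering only keeps the stored pointer valid; it does not by itself protect a walker that loaded the old pointer just before the update and dereferences it after the unmap.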

Is this bug easily reproducible, or was it only hit once?  I'd expect a
race like this to be nigh impossible, especially considering that 32-bit
Xen had the same code for years.

> What I do note is
> 
>         /* PAGING_LEVELS==4 implies 64-bit, which means that
>          * map_domain_page_global can't fail */
>         BUG_ON(v->arch.paging.shadow.guest_vtable == NULL);
> 
> which is no longer true. Sadly the 2-level paging case also
> doesn't really handle the similar error there, so it's not really
> clear to me how this would best be fixed. And that's not the
> reason for the problem here anyway.

I'll look at that too -- it may be that we can avoid the _global() map
altogether.  HAP seems to manage without it, but it has far fewer
lookups.  Maybe I could add a per-vcpu fixmap for it, which would cover
most cases (i.e. local lookups).

Cheers,

Tim.

> > (XEN) 
> > (XEN) ****************************************
> > (XEN) Panic on CPU 4:
> > (XEN) FATAL PAGE FAULT
> > (XEN) [error_code=0000]
> > (XEN) Faulting linear address: ffff82c406a00000
> > (XEN) ****************************************
> > (XEN) 
> > (XEN) Reboot in five seconds...
> > (XEN) Resetting with ACPI MEMORY or I/O RESET_REG
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel
