Re: [Xen-devel] Bug on shadow page mode


  • To: Tim Deegan <tim@xxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>
  • From: "Hao, Xudong" <xudong.hao@xxxxxxxxx>
  • Date: Sun, 7 Apr 2013 09:25:39 +0000
  • Accept-language: en-US
  • Cc: "xen-devel \(xen-devel@xxxxxxxxxxxxx\)" <xen-devel@xxxxxxxxxxxxx>
  • Delivery-date: Sun, 07 Apr 2013 09:26:46 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>
  • Thread-index: Ac4sRI1XnYYXkrGDSbWhn4ywDiyaQwC7W9cAABITA+D//7X7AIADCr4AgAAElQD/+tkJIA==
  • Thread-topic: [Xen-devel] Bug on shadow page mode

> -----Original Message-----
> From: Tim Deegan [mailto:tim@xxxxxxx]
> Sent: Thursday, April 04, 2013 6:35 PM
> To: Jan Beulich
> Cc: Hao, Xudong; xen-devel (xen-devel@xxxxxxxxxxxxx)
> Subject: Re: [Xen-devel] Bug on shadow page mode
> 
> At 11:18 +0100 on 04 Apr (1365074288), Tim Deegan wrote:
> > Hi,
> >
> > At 12:50 +0100 on 02 Apr (1364907054), Jan Beulich wrote:
> > > > (XEN) Xen call trace:
> > > > (XEN)    [<ffff82c4c01e637f>] guest_walk_tables_4_levels+0x135/0x6a6
> > > > (XEN)    [<ffff82c4c020d8cc>] sh_page_fault__guest_4+0x505/0x2015
> > > > (XEN)    [<ffff82c4c01d2135>] vmx_vmexit_handler+0x86c/0x1748
> > > > (XEN)
> > > > (XEN) Pagetable walk from ffff82c406a00000:
> > > > (XEN)  L4[0x105] = 000000007f26e063 ffffffffffffffff
> > > > (XEN)  L3[0x110] = 000000005ce30063 ffffffffffffffff
> > > > (XEN)  L2[0x035] = 0000000014aab063 ffffffffffffffff
> > > > (XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
> > >
> > > Tim,
> > >
> > > I'm afraid this is something for you. From what I can tell, despite
> > > sh_walk_guest_tables() being called from sh_page_fault() without
> > > the paging lock held, there doesn't appear to be a way for this to
> > > race sh_update_cr3(). And with the way the latter updates
> > > guest_vtable, the only way for a page fault to happen upon use
> > > of that cached mapping would be between the call to
> > > sh_unmap_domain_page_global() and the immediately following
> > > one to sh_map_domain_page_global() (i.e. while the pointer is
> > > stale).
> >
> > Hmmm.  So the only way I can see that happening is if some foreign agent
> > resets the vcpu's state while it's actually running, which AFAICT
> > shouldn't happen.
> 
> OTOH, looking at map_domain_page_global, there doesn't seem to be any
> locking preventing two CPUs from populating a page of global-map l1es at
> the same time.  So, here's a different patch to test -- it would be good
> to know if this patch by itself fixes the crash.
> 

Holding the lock while populating the l1e fixes the crash on my side.

Thanks
-Xudong

> Tim.
> 
> diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
> index 7421e03..efda6af 100644
> --- a/xen/arch/x86/domain_page.c
> +++ b/xen/arch/x86/domain_page.c
> @@ -354,9 +354,10 @@ void *map_domain_page_global(unsigned long mfn)
>      set_bit(idx, inuse);
>      inuse_cursor = idx + 1;
> 
> +    pl1e = virt_to_xen_l1e(va);
> +
>      spin_unlock(&globalmap_lock);
> 
> -    pl1e = virt_to_xen_l1e(va);
>      if ( !pl1e )
>          return NULL;
>      l1e_write(pl1e, l1e_from_pfn(mfn, __PAGE_HYPERVISOR));

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
