
Re: [Xen-devel] x86_32: spurious page faults in guest GDT area


  • To: Jan Beulich <jbeulich@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • From: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
  • Date: Mon, 16 Jun 2008 11:41:27 +0100
  • Delivery-date: Mon, 16 Jun 2008 03:42:10 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: AcjPnYd9xeC76DuQEd2lAgAX8io7RQ==
  • Thread-topic: [Xen-devel] x86_32: spurious page faults in guest GDT area

What's the #PF error code -- is it a not-present or an access-violation
fault; read/write access; etc?
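
For reference, the distinctions asked about above map onto the low bits of the architectural x86 page-fault error code. The following is a minimal sketch for decoding a logged value; the function and constant names are illustrative assumptions and are not taken from the Xen source.

```python
# Architectural x86 page-fault error code bits (per the Intel/AMD manuals).
# The names below are illustrative, not Xen's own identifiers.
PFEC_PRESENT = 1 << 0      # 0 = page not present, 1 = protection violation
PFEC_WRITE = 1 << 1        # 0 = read access, 1 = write access
PFEC_USER = 1 << 2         # 0 = supervisor-mode access, 1 = user-mode access
PFEC_RSVD = 1 << 3         # 1 = reserved bit set in a paging-structure entry
PFEC_INSN_FETCH = 1 << 4   # 1 = fault on an instruction fetch


def decode_pf_error_code(ec):
    """Return a dictionary describing a #PF error code."""
    return {
        "kind": "access-violation" if ec & PFEC_PRESENT else "not-present",
        "access": "write" if ec & PFEC_WRITE else "read",
        "mode": "user" if ec & PFEC_USER else "supervisor",
        "reserved_bit": bool(ec & PFEC_RSVD),
        "insn_fetch": bool(ec & PFEC_INSN_FETCH),
    }


if __name__ == "__main__":
    # Error code 0: not-present fault on a supervisor-mode read.
    print(decode_pf_error_code(0x0))
```

An error code with bit 0 clear would point at a page that the walk (or a stale TLB entry) considered not present; bit 0 set would point at a permission problem instead.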

Do these faults happen under stable workload (by which I mean no domains
being created/destroyed -- all VMs are booted and just running normal kinds
of stuff)?

 -- Keir

On 16/6/08 11:32, "Jan Beulich" <jbeulich@xxxxxxxxxx> wrote:

> Under long-running stress I can reproduce this issue back to at least
> c/s 16084; in older change sets it was apparently so rare that during
> normal work/testing I either never noticed it or had to ignore it because
> it was not reproducible. On recent change sets (tested only with our
> 2.6.25-based kernels so far), however, it happens much more frequently
> (and occasionally even while the machine boots).
> 
> I inserted selector validation code in the context switch path to verify
> that a vcpu's selectors are okay (or better, that the guest-provided
> part of the GDT is accessible). These checks never indicated a failure
> so far.
> 
> The faults may happen in various places (hypervisor exit path as well
> as guest code), and always involve loading a selector register with a
> guest defined value (i.e. in the first page of the GDT). A page walk
> in the (hypervisor) fault handler shows that all levels of the translation
> exist (and are valid/consistent), and instrumentation of the selector
> manipulation functions shows that none of them get called spuriously.
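
As an aside on the "first page of the GDT" remark: a selector encodes a descriptor index, a table indicator and a requested privilege level, and each descriptor is 8 bytes, so one 4 KiB page holds 512 descriptors. The sketch below shows how a faulting selector value maps to a GDT page; the helper name and the example selector value are hypothetical, chosen only for illustration.

```python
DESCRIPTOR_SIZE = 8   # bytes per GDT/LDT descriptor
PAGE_SIZE = 4096      # bytes per page


def decode_selector(sel):
    """Split an x86 segment selector into its architectural fields."""
    index = (sel >> 3) & 0x1FFF          # bits 15..3: descriptor index
    offset = index * DESCRIPTOR_SIZE     # byte offset into the table
    return {
        "index": index,
        "table": "LDT" if sel & 0x4 else "GDT",  # bit 2: table indicator
        "rpl": sel & 0x3,                        # bits 1..0: requested privilege level
        "offset": offset,
        "page": offset // PAGE_SIZE,             # which page of the table is touched
    }


if __name__ == "__main__":
    # 0x61 is an illustrative value only: GDT index 12, RPL 1.
    print(decode_selector(0x61))
```

Any selector with an index below 512 and the table indicator clear refers to a descriptor in page 0 of the GDT.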
> 
> Hence I can only suspect some asynchronous page table manipulation
> (but I'm not aware of anything like that) lacking proper TLB flushing, or
> some very rare issue with the CR3 reloading code.
> 
> The same 32-bit kernel used with a 64-bit hypervisor has so far not
> shown similar problems. I first thought this would help narrow the
> problem, but I'm pretty clueless at this point because none of the
> candidate areas where 32-bit code differs from 64-bit look
> troublesome to me (most notably, TLB flushing is identical between
> the two).
> 
> Any ideas on how to narrow the problem would be appreciated.
> Thanks, Jan
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel





 

