[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Xen crash: map_domain_page() on an NMI path



On 19/12/13 14:55, Jan Beulich wrote:
>>>> Andrew Cooper <andrew.cooper3@xxxxxxxxxx> 12/18/13 8:37 PM >>>
>> However, the interesting point is the nested crash.  This is a failed
>> assertion while attempting to execute the kexec crash path.  Xen is
>> 4.3.1 based, and built with debug, so the stack trace below is generated
>> with frame pointers, and is correct.
>> ...
>> Here, we have managed to re-enter the __context_switch() path because of
>> an NMI interrupting it.  The sync_local_execstate() in map_domain_page()
>> is by way of mapcache_current_vcpu().
>>
>> I am struggling to work out how best to fix this.  Would it be best for
>> the crash path to unconditionally change to the idle_pagetables and use
>> mapcache_override_current(NULL)?
> It is wrong to at all call map_domain_page() in NMI context, so fixing the
> environment for it to get invoked is not the right solution in any case (or
> else you'd have to fix more than this, like forcibly making the spin lock
> available that the code will want to acquire subsequently).
>
> As with anything else on the NMI path, and as you probably know better
> than me - great care is needed in every piece of code that may get
> invoked here. I don't think we want to make map_domain_page() usable
> in NMI context; instead I would think map_vtd_domain_page() might
> better learn of possibly getting called this way, and use fixmaps in that
> case (assuming we can determine a reasonably low maximum number of
> pages that may need to be mapped this way at any one time - ideally that
> would turn out to be just one).
>
> Jan

I would certainly agree that anything complicated in NMI context
(map_domain_page() included) must be completely avoided.

However, once we have started crashing, I would argue we have moved to a
new context (a 'crashing' context perhaps?), which must be able to
function correctly, no matter which context it originated from (be it
regular, interrupt, NMI or MCE contexts).  The advantage of the crashing
context is that we can safely trash anything we need to, in order to
execute purgatory with sensible state.


This particular use of map_vtd_domain_page() is for the qinval control
region, which looks to be up to 8 contiguous frames, per IOMMU.  There
are another 8 frames per IOMMU for the intremap control region (and
other control regions as well).

I don't think this practical to put all of this in the fixmap.

However, for hardware pieces like this which are set up once at the
start of day, and have the hardware pointed at a chosen region, would it
be acceptable to allocate their frames low enough to be covered by the
direct map area (protected by BUG()s?) and set up their base virtual
addresses knowing that there will always be a valid mapping from any Xen
pagetables?  This seems better than constantly playing around with the
mappings.


For the 'crashing' context, I was thinking of extending enum
system_state to include "panic" and "crash_single" states, where
crash_single implies "safe to not actually lock a spinlock"  Integrating
this without an adverse effect on spinlock performance might be a little
tricky however.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.