
Re: [Xen-ia64-devel] EFI Mapping Windows Install Crash Bug



On Tue, Jul 01, 2008 at 04:07:53PM +0900, Isaku Yamahata wrote:
> On Tue, Jul 01, 2008 at 11:03:28AM +1000, Simon Horman wrote:
> > Hi,
> > 
> > I'm a bit hesitant to jump the gun, but I think that I might have
> > isolated the cause of win2k3-sp2 crashing during install when my EFI
> > Mapping patches are applied. Well, perhaps not the cause, but I think I
> > know where it is dying.
> > 
> >     Quickly as background, the EFI Mapping patches move the mapping
> >     that EFI is taught at boot time to map memory where Linux places
> >     it ( basically pa + (0xe<<60) ) instead of where Xen usually
> >     places it ( basically pa + (0xf<<60) ). In order to protect this
> >     mapping from HVM domains a special region id is used. The
> >     hypervisor switches to that region id just before making any
> >     PAL, SAL or EFI calls, and switches back to the previous region
> >     id once the call completes.  As region 7 has to be changed,
> >     entries that are pinned into the TLB have to be repinned. And
> >     that is roughly where the fun begins.
> > 
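To make the two addresses in the background above concrete, here is a
small standalone C sketch. The helper names (efi_va_linux, efi_va_xen,
region_of) are made up for illustration and do not come from the patches.
The only architectural fact it relies on is that the top three bits of an
ia64 virtual address select the region, which is why both offsets land in
region 7.

```c
#include <stdint.h>

/* The region number is held in bits 63..61 of an ia64 virtual address. */
#define REGION_SHIFT 61

/* Offsets as described in the mail: Linux uses 0xe<<60, Xen uses 0xf<<60. */
#define LINUX_IDENT_BASE (UINT64_C(0xe) << 60)
#define XEN_IDENT_BASE   (UINT64_C(0xf) << 60)

/* Hypothetical helpers: virtual address EFI would be taught for a given pa. */
static inline uint64_t efi_va_linux(uint64_t pa) { return pa + LINUX_IDENT_BASE; }
static inline uint64_t efi_va_xen(uint64_t pa)   { return pa + XEN_IDENT_BASE; }

/* Extract the region number (0..7) from a virtual address. */
static inline unsigned region_of(uint64_t va) { return (unsigned)(va >> REGION_SHIFT); }
```

Both region_of(efi_va_linux(pa)) and region_of(efi_va_xen(pa)) evaluate to
7 for any pa below 1<<60, so moving the mapping does not move it out of
region 7. That is why the region id for region 7 is the one that has to be
switched around the calls.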
> > As for the problem? It seems to be caused by ia64_mca_cpe_int_caller()
> > calling ia64_log_queue() which calls ia64_sal_get_state_info(). I
> > believe that the hypervisor dies in ia64_log_queue() somewhere after
> > ia64_sal_get_state_info() completes. That is, I am suspecting that the
> > call to ia64_sal_get_state_info() is returning bogus data.
> 
> Is ia64_mca_cpe_int_caller() called in interrupt context?
> If so, ia64_log_queue() calls xmalloc() which can't be called
> from interrupt context. Then xen VMM crashes at ASSERT(!in_irq())
> in _xmalloc().

That is a good point. However, xmalloc() is only called if
ia64_sal_get_state_info() returns a non-zero value, which,
according to tracing that I have done this afternoon, does
not seem to be the case (when ia64_log_queue() is called
from other places in mca.c).

How can I check if the call is being made in interrupt context?
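
One possible approach, sketched here as a standalone model rather than
hypervisor code: log the result of the same in_irq() predicate that the
ASSERT in _xmalloc() uses, at the top of ia64_log_queue(). Everything
prefixed model_ below, and the trace_irq_context() helper itself, is a
hypothetical stand-in so that the sketch compiles on its own. In the
hypervisor the test would presumably be in_irq() and the output would go
to the console instead of stderr.

```c
#include <stdio.h>

/* Hypothetical stand-in for the per-CPU interrupt nesting count. */
static int irq_depth;

static void model_irq_enter(void) { irq_depth++; }
static void model_irq_exit(void)  { irq_depth--; }

/* Models the in_irq() predicate: non-zero while handling an interrupt. */
static int model_in_irq(void) { return irq_depth > 0; }

/* Logs and returns 1 if reached from (modelled) interrupt context, else 0. */
static int trace_irq_context(const char *where)
{
    if (model_in_irq()) {
        fprintf(stderr, "%s: called in interrupt context (depth %d)\n",
                where, irq_depth);
        return 1;
    }
    return 0;
}
```

Calling trace_irq_context("ia64_log_queue") on entry would show whether
the path through ia64_mca_cpe_int_caller() arrives in interrupt context
without relying on the ASSERT firing.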

Also, after some more investigation, I now believe that the hypervisor
is locking up inside ia64_sal_get_state_info(), not later on in
ia64_log_queue() as I thought this morning.

> > Furthermore, my traces seem to indicate that the problem arises when
> > the call to ia64_log_queue() and in turn to ia64_sal_get_state_info() is
> > made when the region id is already switched to make some other PAL, SAL
> > or EFI call (though I doubt it is particularly important which one).
> > 
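A nesting counter around the region id switch is one way such a nested
call could be confirmed in traces. The sketch below is a hypothetical
debugging aid, not code from the patches: the names fw_call_enter() and
fw_call_exit() are invented, and a real version would need the counters
to be per-CPU.

```c
/* Hypothetical debugging aid; single-CPU model only. */
static int fw_call_depth;     /* > 0 while a PAL/SAL/EFI call is in progress */
static int fw_reentry_count;  /* calls started while another was in progress */

/* Call just before switching region id; returns 1 if this entry is nested. */
static int fw_call_enter(void)
{
    int nested = fw_call_depth > 0;

    if (nested)
        fw_reentry_count++;
    fw_call_depth++;
    return nested;
}

/* Call just after switching back to the previous region id. */
static void fw_call_exit(void)
{
    fw_call_depth--;
}
```

A non-zero return from fw_call_enter() on the ia64_sal_get_state_info()
path would indicate that the call was made while the region id was
already switched for another call.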
> > This seems to make sense to me as ia64_mca_cpe_int_caller() is
> > "Triggered by sw interrupt from CPE polling routine.".
> > 
> > I am unsure about what to do about this problem, but for testing
> > purposes I simply removed the call to ia64_log_queue() from
> > ia64_mca_cpe_int_caller() and things seem to work.
> > 
> > When I say seem to work, this bug does not manifest every time I install
> > win2k3-sp2. So it can be hard to tell if a change has improved things or
> > not. But for now, I have not seen a crash occur with this hack in place
> > (+ various other changes which may or may not be relevant, but this one
> > seems to be particularly important).
> > 
> > I will investigate my theory that things die in ia64_log_queue()
> > further. But I wonder if there might be a way to permanently remove/move
> > the call to ia64_log_queue() out of ia64_mca_cpe_int_caller() and
> > possibly other PAL, SAL or EFI calls inside other MCA code.
> > 
> 
> -- 
> yamahata

-- 
宝曼 西門 (ホウマン・サイモン) | Simon Horman (Horms)

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel


 

