
[Xen-ia64-devel] RE: No scheduling after domU launch


  • To: "Haavard Bjerke" <havard.bjerke@xxxxxxx>
  • From: "Magenheimer, Dan (HP Labs Fort Collins)" <dan.magenheimer@xxxxxx>
  • Date: Fri, 20 May 2005 12:58:41 -0700
  • Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Fri, 20 May 2005 19:57:56 +0000
  • List-id: DIscussion of the ia64 port of Xen <xen-ia64-devel.lists.xensource.com>
  • Thread-index: AcVdUEpSQz2ksOooQVmLyWcPqqqblwACDD3A
  • Thread-topic: No scheduling after domU launch

Excellent work isolating this!

It's not clear to me why your fix works, but as long as
it gets you going, Matt and I will try to figure out
the real problem... we are trying to get a hardware
debugger working which will make finding this kind
of problem easier.

In the meantime, in case anyone else is tracking this,
here's some more info gathered from the simulator:

The address being cmpxchg'd is 0xf0ffffffffff0000,
which is the address of the first 4 bytes in the local_cpu_data
page (a per-cpu page, with the symbol per_cpu__cpu_init).
Those 4 bytes contain the softirq_pending flags.

If this is getting trashed somehow, that would certainly
explain the behavior.
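
For anyone following along, here is a minimal sketch of why a trashed
softirq_pending word would stop scheduling. It is not the Xen code: it uses
C11 atomics instead of the ia64 cmpxchg4.acq intrinsic, and every name
starting with demo_ or DEMO_ (including the softirq number) is made up for
illustration. It shows a test_and_set_bit built on a compare-and-swap loop,
followed by a stray write that wipes the pending bit before anything
services it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the 4-byte softirq_pending word. */
static _Atomic uint32_t demo_softirq_pending;

/* Hypothetical softirq number, chosen only for this sketch. */
enum { DEMO_SCHEDULE_SOFTIRQ = 1 };

/*
 * Set bit nr in *word and return its previous value (0 or 1).
 * The compare-and-swap is retried until no other writer has changed
 * the word between our read and our write.
 */
static int demo_test_and_set_bit(unsigned nr, _Atomic uint32_t *word)
{
    uint32_t bit = UINT32_C(1) << nr;
    uint32_t old = atomic_load_explicit(word, memory_order_relaxed);

    while (!atomic_compare_exchange_weak_explicit(
               word, &old, old | bit,
               memory_order_acquire, memory_order_relaxed)) {
        /* On failure, old has been reloaded with the current value. */
    }
    return (old & bit) != 0;
}

/* Return 1 if softirq nr is currently marked pending. */
static int demo_softirq_is_pending(unsigned nr)
{
    return (int)((atomic_load(&demo_softirq_pending) >> nr) & 1u);
}

/*
 * Raise the (hypothetical) schedule softirq, then simulate a stray
 * write over the word. Returns 1 if the pending bit was lost.
 */
static int demo_lost_wakeup(void)
{
    atomic_store(&demo_softirq_pending, 0);
    demo_test_and_set_bit(DEMO_SCHEDULE_SOFTIRQ, &demo_softirq_pending);
    assert(demo_softirq_is_pending(DEMO_SCHEDULE_SOFTIRQ));

    /* Simulated corruption: something else overwrites the 4 bytes. */
    atomic_store(&demo_softirq_pending, 0);

    return !demo_softirq_is_pending(DEMO_SCHEDULE_SOFTIRQ);
}
```

In this sketch, once the word is overwritten a dispatcher that polls the
pending bits sees nothing to do, which would match the scheduler never
being entered again. Whether that is what actually happens here is still
an open question.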

Dan

> -----Original Message-----
> From: Haavard Bjerke [mailto:havard.bjerke@xxxxxxx] 
> Sent: Friday, May 20, 2005 9:26 AM
> To: Magenheimer, Dan (HP Labs Fort Collins)
> Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: No scheduling after domU launch
> 
> Here's an update on this bug. The problem seems to be the
> 
> asm volatile ("cmpxchg4.acq %0=[%1],%2,ar.ccv":
>                       "=r"(ia64_intri_res) : "r"(ptr), "r"(new) : "memory");
> 
> line in gcc_intrin.h:ia64_cmpxchg4_acq(). When I comment it 
> out when launching domU, things are back to normal.
> 
> The call sequence leading up to this instruction is:
> sched_bvt.c:bvt_wake()
> softirq.h:cpu_raise_softirq()
> bitops.h:test_and_set_bit()
> intrinsics.h:cmpxchg_acq() 
> gcc_intrin.h:ia64_cmpxchg()
> gcc_intrin.h:ia64_cmpxchg4_acq()
> 
> Håvard
> 
> On Thu, May 19, 2005 at 07:32:54PM +0200, Haavard Bjerke wrote:
> > I think I've found some leads to the most recent bug (dom0
> > freezes immediately, as opposed to after a short while). The
> > problem seems to be somewhere within the cpu_raise_softirq()
> > routine, which is called from bvt_wake() in sched_bvt.c. By
> > not calling that routine when launching domU, I've managed to
> > get control back to dom0 for a short while, after which it
> > freezes as before. I'll look more into it tomorrow.
> > 
> > Håvard
> > 
> > On Wed, May 18, 2005 at 08:32:20AM -0700, Magenheimer, Dan (HP Labs Fort Collins) wrote:
> > > Given the previous discussion around this (last month?), I suspect
> > > that there is a bug somewhere that is overwriting some random
> > > memory related to the scheduler.  As Mark W pointed out, your
> > > previous workaround fixed a problem that should never happen.
> > > And I don't think any recent changes in xeno-unstable-ia64 have
> > > had anything to do with the scheduler, so I suspect the
> > > "random memory" moved to a different random spot which
> > > is causing your current problem.
> > > 
> > > This is just a theory... you are probably as familiar with this
> > > part of the code as anybody on this list right now.  Try
> > > adding some more printf's to see if any clues arise.
> > > I'll try to take a look but probably not today, so reply
> > > to this thread if you learn anything new or interesting.
> > > 
> > > Dan
> > > 
> > > > -----Original Message-----
> > > > From: Haavard Bjerke [mailto:havard.bjerke@xxxxxxx] 
> > > > Sent: Wednesday, May 18, 2005 9:23 AM
> > > > To: Magenheimer, Dan (HP Labs Fort Collins)
> > > > Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> > > > Subject: No scheduling after domU launch
> > > > 
> > > > Since pulling the latest xeno-unstable-ia64 a few days ago, 
> > > > scheduling seems to stop immediately after launching domU, 
> > > > that is, domU continues to load, while dom0 stops because the 
> > > > scheduler is never entered again. This looks like the same 
> > > > problem as before, with dom0 freezing after domU launch; 
> > > > only, now it seems to freeze earlier. Before, I was able to 
> > > > run a hypercall right after launch. This is kind of critical, 
> > > > since a user-space app running in dom0 is supposed to 
> > > > establish a ctrl-channel right after launch, while domU
> > > > is booting.
> > > > 
> > > > So I'm quite stuck and wondering why it stops scheduling, and
> > > > I could use some input. So far I've found out that
> > > > __enter_scheduler() is never called after domU launch, while
> > > > the routine that's supposed to call it,
> > > > ac_timer_softirq_action(), continues to be called. I think
> > > > the __enter_scheduler() routine should be in a heap, but I
> > > > don't understand why it would suddenly be removed from
> > > > the heap.
> > > > 
> > > > Thanks,
> > > > Håvard
> > > > 
> 
