[Xen-ia64-devel] why sedf won't work on Xen/ia64
I spent some time looking into why sedf won't work on Xen/ia64. I was able to understand the symptoms, but I don't have a fix. Since I may not get back to working on it (at least for a while), I wanted to post my status so others don't have to reproduce my research. (Tristan, it appears to me that this will directly affect the SMP implementation, so this might be something worth looking into while you are waiting for multiple domains and virtual I/O to work solidly.)

Problem: When bvt was the default scheduler, Xen/ia64 seemed to work fine, even for (limited-functionality) multiple domains. In July(?), the core Xen team switched the default scheduler to sedf; Xen/ia64 with sedf crashes early and mysteriously. As a result, in order to run Xen/ia64, it is necessary to specify "sched=bvt" on the command line. This is easily forgotten and is also one more annoying thing to explain to new users/developers. (And I can't find a way to pass the option on ski, which I sometimes use for debugging, so I instead have to manually change common/schedule.c.)

Findings: When running bvt, the idle domain ("idle") never gets scheduled. When running sedf, idle gets scheduled often. (I consider this a bug in sedf or at least in the default parameters for it.) Idle is not a real domain... no "guest" or "user" code is run when idle is scheduled.

When idle is run repeatedly, the stack pointer quickly goes down in value until the stack overruns other data. This results in bizarre errors.

After idle is scheduled and started by context_switch, a timer interrupt often happens immediately when __enter_scheduler (after returning from context_switch) re-enables interrupts. When idle is merely scheduled alternately with domain0, no stack change occurs. However, if a timer interrupt is processed while idle is active, a "stack activation record" gets created that NEVER gets unwound. Thus eventually (and fairly randomly), idle's stack overruns other data.
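To make the failure mode in the findings concrete, here is a toy model of it in C. This is only a sketch and is not Xen code: the names toy_vcpu, toy_timer_interrupt and toy_rounds_until_overrun, as well as the stack and frame sizes, are made up for illustration. It only shows how a context whose interrupt activation records are never popped will eventually run out of stack, while one that unwinds normally never does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. Every name and size here is hypothetical, not Xen code. */
#define TOY_STACK_SIZE 16384L /* assumed stack size in bytes */
#define TOY_FRAME_SIZE 512L   /* assumed size of one activation record */

struct toy_vcpu {
    long sp;      /* simulated stack pointer; the stack grows downward */
    bool unwinds; /* is the interrupt's activation record popped on return? */
};

/* Simulate one timer interrupt taken while v is the running context. */
void toy_timer_interrupt(struct toy_vcpu *v)
{
    v->sp -= TOY_FRAME_SIZE;     /* entry pushes an activation record */
    if (v->unwinds)
        v->sp += TOY_FRAME_SIZE; /* normal return pops it again */
    /* otherwise the record is leaked, as observed for idle */
}

/*
 * Alternate domain0 and idle for up to max_rounds scheduling rounds.
 * Returns the round in which idle's simulated stack is exhausted,
 * or -1 if it never is.
 */
int toy_rounds_until_overrun(bool interrupt_while_idle, int max_rounds)
{
    struct toy_vcpu dom0 = { TOY_STACK_SIZE, true };
    struct toy_vcpu idle = { TOY_STACK_SIZE, false };

    assert(max_rounds >= 0);
    for (int round = 1; round <= max_rounds; round++) {
        toy_timer_interrupt(&dom0);     /* domain0 takes a tick and unwinds */
        if (interrupt_while_idle)
            toy_timer_interrupt(&idle); /* idle takes a tick and leaks a frame */
        if (idle.sp <= 0)
            return round;               /* idle has overrun its stack */
    }
    return -1;
}
```

With these assumed sizes, toy_rounds_until_overrun(true, 1000) returns 32 (16384 / 512 leaked frames), and toy_rounds_until_overrun(false, 1000) returns -1 because nothing is ever leaked. In the real system the interrupts do not land in idle on every round, which would fit the "fairly random" timing of the overrun.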
Ramifications: If idle doesn't work "properly", domain0 must run on all processors in an SMP. So unless we implement "Xen SMP" and "guest SMP" concurrently, this problem will need to be fixed for Xen SMP to run.

Fix: TBD. It may be possible to find/leverage some Xen/x86 code but we need to ensure that we minimize "full context switches" to idle as they are much more expensive on ia64 than on x86.

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel