Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/02/11 17:01, Stephan Diestelhorst wrote:
> On Wednesday 02 February 2011 16:14:25 Juergen Gross wrote:
>> On 02/02/11 15:39, Stephan Diestelhorst wrote:
>>> We have the following theory of what happens:
>>> * some vcpus of a particular domain are currently in the process of
>>>   being moved to the new pool
>>
>> The only _vcpus_ to be moved between pools are the idle vcpus. And those
>> never contribute to accounting in the credit scheduler. We are moving
>> _pcpus_ only (well, moving a domain between pools actually moves vcpus
>> as well, but then the domain is paused).
>
> How do you ensure that the domain is paused and stays that way? Pausing
> the domain was what I had in mind, too...

Look at sched_move_domain() in schedule.c: I'm calling domain_pause() before
moving the vcpus and domain_unpause() after that.

>>> Despite the rant, it is amazing to see the ability to move running
>>> things around through this remote continuation trick! In my (ancient)
>>> balancer experiments I added hypervisor-threads just for side-stepping
>>> this issue..
>>
>> I think the easiest way to solve the problem would be to move the cpu to
>> the new pool in a tasklet. This is possible now, because tasklets are
>> always executed in the idle vcpus.
>
> Yep. That was exactly what I built. At the time stuff like that did not
> exist (2005).
>
>> OTOH I'd like to understand what is wrong with my current approach...
>
> Nothing, in fact I like it. In my rant I complained about the fact that
> splitting the critical section across this continuation looks scary,
> basically causing some generic red lights to turn on :-) And making
> reasoning about the correctness a little complicated, but that may well
> be a local issue ;-)

Perhaps you can help solve the mystery:
Could you replace the BUG_ON in sched_credit.c:389 with something like this:

    if (!is_idle_vcpu(per_cpu(schedule_data, cpu).curr)) {
        extern void dump_runq(unsigned char key);
        struct vcpu *vc = per_cpu(schedule_data, cpu).curr;

        printk("+++ (%d.%d) instead idle vcpu on cpu %d\n",
               vc->domain->domain_id, vc->vcpu_id, cpu);
        dump_runq('q');
        BUG();
    }


Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
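
The pause / move / unpause pattern described for sched_move_domain() can be
illustrated with a small toy model. This is only a sketch and not the Xen
source: every toy_* name, the struct layouts, and the pause_count field are
hypothetical stand-ins invented for illustration. The assert inside the loop
plays the role of a BUG_ON that would fire if a vcpu were moved while its
domain could still run.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_VCPUS 4

/* Hypothetical, simplified stand-ins; NOT the real Xen structures. */
struct toy_vcpu {
    int vcpu_id;
    int pool_id;                /* pool this vcpu currently belongs to */
};

struct toy_domain {
    int domain_id;
    int pause_count;            /* > 0 means no vcpu of the domain may run */
    size_t nr_vcpus;
    struct toy_vcpu vcpus[TOY_MAX_VCPUS];
};

/* Pausing is modelled as a simple counter so that pauses can nest. */
void toy_domain_pause(struct toy_domain *d)
{
    d->pause_count++;
}

void toy_domain_unpause(struct toy_domain *d)
{
    assert(d->pause_count > 0); /* unbalanced unpause would be a bug */
    d->pause_count--;
}

int toy_domain_is_runnable(const struct toy_domain *d)
{
    return d->pause_count == 0;
}

/*
 * Move all vcpus of a domain to new_pool. The domain is paused for the
 * whole move, so no vcpu can be scheduled (and hence accounted) while
 * its pool assignment is changing.
 */
void toy_move_domain(struct toy_domain *d, int new_pool)
{
    size_t i;

    toy_domain_pause(d);

    for (i = 0; i < d->nr_vcpus; i++) {
        /* Invariant of the pattern: never move a vcpu of a running domain. */
        assert(!toy_domain_is_runnable(d));
        d->vcpus[i].pool_id = new_pool;
    }

    toy_domain_unpause(d);
}
```

In this model a caller would initialise a toy_domain with a few vcpus in
pool 0 and call toy_move_domain(&d, 1); afterwards every vcpu reports
pool 1 and the pause count is back to its initial value. The model does not
capture moving _pcpus_ between pools, which is the case under discussion in
this thread.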