Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/09/11 13:27, George Dunlap wrote:
> Sorry, forgot the patch...
>  -G
>
> On Wed, Feb 9, 2011 at 12:27 PM, George Dunlap <George.Dunlap@xxxxxxxxxxxxx> wrote:
>> On Tue, Feb 8, 2011 at 4:33 PM, Andre Przywara <andre.przywara@xxxxxxx> wrote:
>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 24
>>> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 24
>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 24
>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 25
>>> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 25
>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 25
>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 26
>>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 26
>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 26
>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 27
>>> (XEN) cpu_disable_scheduler: Migrating d0v24 from cpu 27
>>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 27
>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 27
>>> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 28
>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 28
>>> (XEN) cpu_disable_scheduler: Migrating d0v25 from cpu 28
>>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 28
>>> (XEN) cpu_disable_scheduler: Migrating d0v39 from cpu 28
>>> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 29
>>
>> Interesting -- what seems to happen here is that as cpus are disabled,
>> vcpus are "shovelled" in an accumulative fashion from one cpu to the next:
>>  * v18, v34 and v42 start on cpu 24.
>>  * When 24 is brought down, they are all migrated to 25; when 25 is
>>    brought down, to 26; then to 27.
>>  * v24 is running on cpu 27, so when 27 is brought down, v24 is added
>>    to the mix.
>>  * v3 is running on cpu 28, so all of them plus v3 are shovelled onto
>>    cpu 29.
>> While that behavior may not be ideal, it should certainly be bug-free.
>> Another interesting thing to note is that the bug happened on pcpu 32,
>> but there were no advertised migrations from that cpu.
If I understand the configuration of Andre's machine correctly, pcpu 32 will
be the target of the next migrations. This pcpu is a member of the next NUMA
node, correct? Could it be there is a problem with the call of
domain_update_node_affinity() from cpu_disable_scheduler()?

Hmm, I think this could really be the problem. Andre, could you try the
following patch?

diff -r f1fac30a531b xen/common/schedule.c
--- a/xen/common/schedule.c     Wed Feb 09 08:58:11 2011 +0000
+++ b/xen/common/schedule.c     Wed Feb 09 14:02:12 2011 +0100
@@ -491,6 +491,10 @@ int cpu_disable_scheduler(unsigned int c
                         v->domain->domain_id, v->vcpu_id);
                 cpus_setall(v->cpu_affinity);
                 affinity_broken = 1;
+            }
+            if ( cpus_weight(v->cpu_affinity) < NR_CPUS )
+            {
+                cpu_clear(cpu, v->cpu_affinity);
             }
 
             if ( v->processor == cpu )


Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions           e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                          Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel