
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split


  • To: Andre Przywara <andre.przywara@xxxxxxx>
  • From: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>
  • Date: Fri, 28 Jan 2011 12:44:00 +0100
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
  • Delivery-date: Fri, 28 Jan 2011 03:45:45 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On 01/28/11 12:07, Andre Przywara wrote:
Juergen Gross wrote:
On 01/28/11 00:18, Andre Przywara wrote:
Hi,

when I boot my machine without restricting Dom0 (no dom0_mem= or
dom0_max_vcpus=), I get a _hypervisor_ crash when I run
# xl cpupool-numa-split
If Dom0's resources are limited on the Xen cmdline, everything works
fine.
The crashdump points to a scheduling problem with weights, so I assume
the NUMA distribution algorithm somehow fools the hypervisor completely.

I will investigate this further tomorrow, but maybe someone has a
good idea.

I've seen this once with an older cpupool version on a 24-processor
machine.
It was NOT related to NUMA, but did occur only on reboot after a Dom0
panic.
The machine had an init script creating a cpupool and populating it with
cpus. The machine was then in a panic loop due to the BUG in sched_acct
until it was reset manually. After the reset the problem was gone.

As I was never able to reproduce the problem later (the same software is
running on dozens of machines!), I assumed there was a problem related to
the first Dom0 panic, maybe some corrupted BIOS tables.

Can the crash be reproduced easily?
Yes.
If I don't specify dom0_max_vcpus= and dom0_mem= on the Xen cmdline, I
can reliably trigger the crash with xl cpupool-numa-split.
Omitting only dom0_max_vcpus= does not suffice.

Do I understand correctly?
There is no crash with only dom0_max_vcpus= set, and no crash with only
dom0_mem= set?

Could you try this patch?

diff -r b59f04eb8978 xen/common/schedule.c
--- a/xen/common/schedule.c     Fri Jan 21 18:06:23 2011 +0000
+++ b/xen/common/schedule.c     Fri Jan 28 12:42:46 2011 +0100
@@ -1301,7 +1301,9 @@ void schedule_cpu_switch(unsigned int cp

     idle = idle_vcpu[cpu];
     ppriv = SCHED_OP(new_ops, alloc_pdata, cpu);
+    BUG_ON(ppriv == NULL);
     vpriv = SCHED_OP(new_ops, alloc_vdata, idle, idle->domain->sched_priv);
+    BUG_ON(vpriv == NULL);

     pcpu_schedule_lock_irqsave(cpu, flags);
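
The two added BUG_ON() lines are meant as a diagnostic: if either
allocation returns NULL, the hypervisor should stop right at the
allocation site instead of failing later somewhere in the scheduler.
A minimal userspace sketch of the same check-right-after-allocate
pattern follows. It is not Xen code: BUG_ON_SKETCH, alloc_fn, alloc_ok
and cpu_switch_sketch are hypothetical names made up for this
illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-in for a BUG_ON()-style check (hypothetical name):
 * report where the condition fired, then stop immediately. */
#define BUG_ON_SKETCH(cond)                                   \
    do {                                                      \
        if (cond) {                                           \
            fprintf(stderr, "BUG at %s:%d: %s\n",             \
                    __FILE__, __LINE__, #cond);               \
            abort();                                          \
        }                                                     \
    } while (0)

/* Hypothetical allocator type standing in for the alloc_pdata and
 * alloc_vdata scheduler hooks; returns NULL on failure. */
typedef void *(*alloc_fn)(void);

/* Hypothetical allocator that always succeeds for small sizes. */
void *alloc_ok(void)
{
    return malloc(16);
}

/* Same order as the patched code: allocate, check, allocate, check.
 * Returns 0 when both allocations succeeded; aborts otherwise. */
int cpu_switch_sketch(alloc_fn alloc_pdata, alloc_fn alloc_vdata)
{
    void *ppriv = alloc_pdata();
    BUG_ON_SKETCH(ppriv == NULL);

    void *vpriv = alloc_vdata();
    BUG_ON_SKETCH(vpriv == NULL);

    /* The real function would now take the per-cpu schedule lock and
     * install both pointers; the sketch just releases them again. */
    free(vpriv);
    free(ppriv);
    return 0;
}
```

Calling cpu_switch_sketch(alloc_ok, alloc_ok) returns 0. Passing an
allocator that returns NULL aborts with the file and line of the
failed check, which is the kind of early, unambiguous report the
patch is trying to get.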



--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html
