Xen project Mailing List

Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split

To: Andre Przywara <andre.przywara@xxxxxxx>

From: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>

Date: Fri, 28 Jan 2011 12:44:00 +0100

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>

Delivery-date: Fri, 28 Jan 2011 03:45:45 -0800

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On 01/28/11 12:07, Andre Przywara wrote:

> Juergen Gross wrote:
>
>> On 01/28/11 00:18, Andre Przywara wrote:
>>
>>> Hi,
>>>
>>> when I boot my machine without restricting Dom0 (dom0_mem=,
>>> dom0_max_vcpus=), I get a _hypervisor_ crash when I run
>>> # xl cpupool-numa-split
>>> If Dom0's resources are limited on the Xen cmdline, everything works
>>> fine.
>>> The crashdump points to a scheduling problem with weights, so I assume
>>> the NUMA distribution algorithm somehow fools the hypervisor completely.
>>>
>>> I will investigate this further tomorrow, but maybe someone has a
>>> good idea.
>>
>> I've seen this once with an older cpupool version on a 24-processor
>> machine.
>> It was NOT related to NUMA, but occurred only on reboot after a Dom0
>> panic.
>> The machine had an init script creating a cpupool and populating it with
>> cpus. The machine was then in a panic loop due to the BUG in sched_acct
>> until it was reset manually. After the reset the problem was gone.
>>
>> As I was never able to reproduce the problem later (the same software is
>> running on dozens of machines!), I assumed there was a problem related to
>> the first Dom0 panic, maybe some destroyed BIOS tables.
>>
>> Can the crash be reproduced easily?
>
> Yes.
> If I don't specify dom0_max_vcpus= and dom0_mem= on the Xen cmdline, I
> can reliably trigger the crash with xl cpupool-numa-split.
> Omitting only dom0_max_vcpus= does not suffice.

Do I understand correctly? No crash with only dom0_max_vcpus= and no crash
with only dom0_mem=?

Could you try this patch?

diff -r b59f04eb8978 xen/common/schedule.c
--- a/xen/common/schedule.c	Fri Jan 21 18:06:23 2011 +0000
+++ b/xen/common/schedule.c	Fri Jan 28 12:42:46 2011 +0100
@@ -1301,7 +1301,9 @@ void schedule_cpu_switch(unsigned int cp
     idle = idle_vcpu[cpu];
     ppriv = SCHED_OP(new_ops, alloc_pdata, cpu);
+    BUG_ON(ppriv == NULL);
     vpriv = SCHED_OP(new_ops, alloc_vdata, idle, idle->domain->sched_priv);
+    BUG_ON(vpriv == NULL);
     pcpu_schedule_lock_irqsave(cpu, flags);


-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6              Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions  e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                 Internet: ts.fujitsu.com
D-80807 Muenchen              Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
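The patch adds BUG_ON checks right after the scheduler's alloc_pdata and alloc_vdata hooks, so that a NULL result is caught at the allocation site instead of surfacing later as a harder-to-read crash inside the scheduler. One reading is that it probes whether these allocations fail when Dom0 is left unrestricted; the thread does not confirm that. The C sketch that follows is a hypothetical illustration of a related, gentler pattern than BUG_ON: check the allocation result and return -ENOMEM to the caller. Everything in it (struct sched_ops, demo_alloc_pdata, switch_cpu, cpu_pdata, the simulated heap_budget) is an assumption made up for illustration and does not reflect Xen's actual interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_CPUS 8

/* Hypothetical scheduler operations table, loosely modelled on an
 * alloc_pdata-style hook. NOT Xen's real structure or signature. */
struct sched_ops {
    const char *name;
    void *(*alloc_pdata)(unsigned int cpu);
};

/* Hypothetical simulated heap budget: once it is used up, allocation
 * fails, much as a real allocator can when little memory is left. */
static int heap_budget = 2;

static void *demo_alloc_pdata(unsigned int cpu)
{
    (void)cpu;                  /* the demo data is not CPU-specific */
    if (heap_budget <= 0)
        return NULL;
    void *p = malloc(64);
    if (p != NULL)
        heap_budget--;
    return p;
}

/* Per-CPU private data currently installed for each CPU (hypothetical). */
static void *cpu_pdata[NR_CPUS];

/* Move a CPU to new_ops. The allocation result is checked before anything
 * is installed, so a failure is reported here instead of leaving a NULL
 * pointer for the scheduler to trip over later. */
static int switch_cpu(const struct sched_ops *new_ops, unsigned int cpu)
{
    assert(cpu < NR_CPUS);

    void *ppriv = new_ops->alloc_pdata(cpu);
    if (ppriv == NULL) {
        fprintf(stderr, "cpu%u: %s: alloc_pdata failed\n", cpu, new_ops->name);
        return -ENOMEM;         /* caller can abort the pool operation */
    }

    cpu_pdata[cpu] = ppriv;     /* only installed once known to be valid */
    return 0;
}
```

For example, with a budget of two allocations, switching CPUs 0 and 1 succeeds, while switching CPU 2 returns -ENOMEM and leaves cpu_pdata[2] untouched. BUG_ON is the quicker diagnostic for a debugging patch like the one above; an error return is the kind of choice that would let a tool such as xl report a failed cpupool operation instead of taking the whole host down.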

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.