
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split



Juergen,

What is supposed to happen if a domain is in cpupool0, and then all of
the cpus are taken out of cpupool0?  Is that possible?

It looks like there's code in cpupool.c:cpupool_unassign_cpu() which
will move all VMs in a cpupool to cpupool0 before removing the pool's
last cpu.  But what happens if cpupool0 is itself the pool being
emptied?  That seems to break a lot of assumptions; e.g.,
sched_move_domain() seems to assume that the pool we're moving a VM to
actually has cpus.
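
To make the concern concrete, here is a toy model of the kind of guard
I'd expect somewhere in that path.  It is not the real Xen code (the
struct and its fields are made up for illustration); it just shows the
"refuse to empty cpupool0 while it still has domains" idea:

/* Toy model, not Xen code: made-up types, just to illustrate the check. */
#include <stdio.h>

struct toy_pool {
    int id;                  /* 0 == cpupool0                       */
    unsigned int n_cpus;     /* cpus still assigned to the pool     */
    unsigned int n_domains;  /* domains currently in the pool       */
};

/* Refuse to remove the last cpu from cpupool0 while it still has
 * domains: there is no other pool to move them to, since cpupool0
 * is itself the fallback that cpupool_unassign_cpu() moves VMs to. */
static int toy_unassign_cpu(struct toy_pool *c)
{
    if ( c->id == 0 && c->n_cpus == 1 && c->n_domains > 0 )
        return -1;           /* would leave domains with no cpus */
    c->n_cpus--;
    return 0;
}

int main(void)
{
    struct toy_pool cpupool0 = { .id = 0, .n_cpus = 1, .n_domains = 2 };

    printf("removing last cpu from cpupool0: %s\n",
           toy_unassign_cpu(&cpupool0) ? "refused" : "ok");
    return 0;
}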

While we're at it, what's with the "(cpu != cpu_moving_cpu)" in the
first half of cpupool_unassign_cpu()?  Under what conditions are you
anticipating cpupool_unassign_cpu() being called a second time before
the first completes?  If you have to abort the move because
schedule_cpu_switch() failed, wouldn't it be better just to roll the
whole transaction back, rather than leaving it hanging in the middle?
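
Just to be clear about what I mean by rolling it back, here's a sketch
with made-up state instead of the real cpupool/scheduler structures;
the switch_ok flag stands in for the result of schedule_cpu_switch():

/* Sketch only, not Xen code: a stand-in for the unassign path that
 * snapshots its state and restores it if the cpu switch fails,
 * instead of returning with the move still half-done. */
#include <stdbool.h>
#include <stdio.h>

struct toy_state {
    int cpu_moving;          /* -1 when no move is in flight */
    unsigned int n_cpus;
};

static int toy_unassign(struct toy_state *s, int cpu, bool switch_ok)
{
    struct toy_state saved = *s;    /* snapshot for rollback             */

    s->cpu_moving = cpu;            /* mark the move as in flight        */
    s->n_cpus--;                    /* tentatively take the cpu out      */

    if ( !switch_ok )               /* i.e. schedule_cpu_switch() failed */
    {
        *s = saved;                 /* undo everything...                */
        return -1;                  /* ...rather than leaving it hanging */
    }

    s->cpu_moving = -1;             /* commit */
    return 0;
}

int main(void)
{
    struct toy_state s = { .cpu_moving = -1, .n_cpus = 4 };

    toy_unassign(&s, 3, false);     /* pretend the switch failed */
    printf("after failed move: n_cpus=%u cpu_moving=%d\n",
           s.n_cpus, s.cpu_moving);
    return 0;
}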

Hmm, and why does RMCPU call cpupool_get_by_id() with exact==0?  What
could possibly be the use of grabbing a random cpupool and then trying
to remove the specified cpu from it?
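
For anyone following along, here's a toy version of the lookup to show
why that matters.  The pool list is invented, and the "first pool with
an id >= the one asked for" behaviour for exact==0 is my reading of the
code rather than something I've verified line by line:

/* Toy illustration only; the real lookup lives in cpupool.c.  With
 * exact==0 the search (as I read it) returns the first pool whose id
 * is >= the requested one, so asking for a pool that was just
 * destroyed can hand back a completely different pool. */
#include <stdio.h>

static const int pool_ids[] = { 0, 2, 5 };          /* pool 1 is gone */
#define NPOOLS (sizeof(pool_ids) / sizeof(pool_ids[0]))

static int toy_get_by_id(int id, int exact)
{
    for ( unsigned int i = 0; i < NPOOLS; i++ )
    {
        if ( pool_ids[i] == id )
            return pool_ids[i];                     /* exact hit      */
        if ( !exact && pool_ids[i] > id )
            return pool_ids[i];                     /* "close enough" */
    }
    return -1;                                      /* not found      */
}

int main(void)
{
    printf("id 1, exact=0 -> pool %d\n", toy_get_by_id(1, 0));  /* 2, not 1 */
    printf("id 1, exact=1 -> pool %d\n", toy_get_by_id(1, 1));  /* -1       */
    return 0;
}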

Andre, you might think about folding the attached patch into your debug patch.

 -George

On Mon, Feb 7, 2011 at 1:32 PM, Juergen Gross
<juergen.gross@xxxxxxxxxxxxxx> wrote:
> On 02/07/11 13:38, Andre Przywara wrote:
>>
>> Juergen,
>>
>> as promised, some more debug data. This is from c/s 22858 with Stephan's
>> debug patch (attached).
>> We get the following dump when the hypervisor crashes; note that the
>> first lock is different from the second and subsequent ones:
>>
>> (XEN) sched_credit.c, 572: prv: ffff831836df2970 &prv->lock:
>> ffff831836df2970 prv->weight: 256 sdom->active_vcpu_count: 3
>> sdom->weight: 256
>> (XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock:
>> ffff830437ffa5e0 prv->weight: 768 sdom->active_vcpu_count: 4
>> sdom->weight: 256
>> (XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock:
>> ffff830437ffa5e0 prv->weight: 1024 sdom->active_vcpu_count: 5
>> sdom->weight: 256
>> (XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock:
>> ffff830437ffa5e0 prv->weight: 1280 sdom->active_vcpu_count: 6
>> sdom->weight: 256
>>
>> ....
>>
>> Hope that gives you an idea. I attach the whole log for your reference.
>
> Hmm, could it be your log wasn't created with the attached patch? I'm
> missing Dom-Id and VCPU from the printk() above, which would be
> interesting (at least I hope so)...
> Additionally printing the local pcpu number would help, too.
> And could you add a printk for the new prv address in csched_init()?
>
> It would be nice if you could enable cpupool diag output. Please use
> the attached patch (includes the previous patch for executing the cpu
> move on the cpu to be moved, plus some diag printk corrections).
>
>
> Juergen
>
> --
> Juergen Gross                 Principal Developer Operating Systems
> TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
> Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
> Domagkstr. 28                           Internet: ts.fujitsu.com
> D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
>
>

Attachment: cpupools-bug-on-move-to-self.diff
Description: Text Data

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

