
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split


  • To: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
  • From: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>
  • Date: Tue, 08 Feb 2011 06:43:43 +0100
  • Cc: Andre Przywara <andre.przywara@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "Diestelhorst, Stephan" <Stephan.Diestelhorst@xxxxxxx>
  • Delivery-date: Mon, 07 Feb 2011 21:44:38 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On 02/07/11 16:55, George Dunlap wrote:
> Juergen,
>
> What is supposed to happen if a domain is in cpupool0, and then all of
> the cpus are taken out of cpupool0?  Is that possible?

No. Cpupool0 can never be left without a cpu, as Dom0 is always a member
of cpupool0.


> It looks like there's code in cpupools.c:cpupool_unassign_cpu() which
> will move all VMs in a cpupool to cpupool0 before removing the last
> cpu.  But what happens if cpupool0 is the pool that has become empty?
> It seems like that breaks a lot of the assumptions; e.g.,
> sched_move_domain() seems to assume that the pool we're moving a VM to
> actually has cpus.

The move of VMs to cpupool0 is done only for domains which are dying.
If there are any active domains in the cpupool, removing the last cpu from
it will be denied.
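
To make the rule concrete, here is a minimal sketch of the check described
above. It is not the Xen code: the types, the function name
lastcpu_unassign() and the -EBUSY return value are assumptions chosen for
illustration only.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real Xen structures. */
struct lastcpu_domain {
    int  pool_id;   /* id of the pool this domain currently belongs to */
    bool dying;     /* true once the domain is being torn down */
};

struct lastcpu_pool {
    int          id;
    unsigned int n_cpus;   /* number of cpus currently in the pool */
};

/*
 * Remove one cpu from 'pool'.
 * Returns 0 on success, -EINVAL if the pool has no cpu, and -EBUSY
 * (an assumed error code) if the last cpu would be taken away while
 * a live domain is still in the pool.
 */
int lastcpu_unassign(struct lastcpu_pool *pool,
                     struct lastcpu_domain *doms, size_t n_doms)
{
    size_t i;

    if (pool->n_cpus == 0)
        return -EINVAL;

    if (pool->n_cpus == 1) {
        /* Deny the removal if any active domain is still in the pool. */
        for (i = 0; i < n_doms; i++)
            if (doms[i].pool_id == pool->id && !doms[i].dying)
                return -EBUSY;

        /* Only dying domains are left: park them in pool 0. */
        for (i = 0; i < n_doms; i++)
            if (doms[i].pool_id == pool->id)
                doms[i].pool_id = 0;
    }

    pool->n_cpus--;
    return 0;
}
```

In this model pool 0 can never lose its last cpu as long as it holds a
domain that is not dying, which is the role Dom0 plays for cpupool0.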


> While we're at it, what's with the "(cpu != cpu_moving_cpu)" in the
> first half of cpupool_unassign_cpu()?  Under what conditions are you
> anticipating cpupool_unassign_cpu() being called a second time before
> the first completes?  If you have to abort the move because
> schedule_cpu_switch() failed, wouldn't it be better just to roll the
> whole transaction back, rather than leaving it hanging in the middle?

Not really. It could take some time until all vcpus have been migrated to
another cpu. In that case -EAGAIN is returned, but the cpu has already been
removed from the cpumask of valid cpus for that cpupool, so that no other
vcpus get scheduled on it. Without cpu_moving_cpu, forward progress would
not be guaranteed.
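
The retry pattern can be sketched as below. This is a toy model under
stated assumptions, not the hypervisor code: struct moving_pool,
moving_unassign_cpu(), the vcpus_left counter standing in for real vcpu
migration, and the -EBUSY/-EINVAL return values are all hypothetical.

```c
#include <errno.h>
#include <limits.h>

#define MOVING_NO_CPU (-1)

/* Hypothetical, simplified pool state; not the real Xen structure. */
struct moving_pool {
    unsigned long cpu_valid;   /* bitmask of cpus the scheduler may use */
    int           moving_cpu;  /* cpu whose removal is in flight, or MOVING_NO_CPU */
    int           vcpus_left;  /* vcpus still on moving_cpu (stand-in for migration) */
};

/*
 * Start or continue removing 'cpu' from the pool.
 * Returns -EAGAIN while vcpus remain on the cpu; the caller retries with
 * the same cpu until 0 is returned.
 */
int moving_unassign_cpu(struct moving_pool *p, int cpu)
{
    unsigned long bit;

    if (cpu < 0 || cpu >= (int)(sizeof(unsigned long) * CHAR_BIT))
        return -EINVAL;
    bit = 1UL << cpu;

    if (p->moving_cpu == MOVING_NO_CPU) {
        /* First call: the cpu must currently be valid in this pool. */
        if (!(p->cpu_valid & bit))
            return -EINVAL;
        p->cpu_valid &= ~bit;   /* no new vcpus get scheduled here */
        p->moving_cpu = cpu;    /* remember the removal in flight */
    } else if (p->moving_cpu != cpu) {
        return -EBUSY;          /* a different removal is still pending */
    }

    if (p->vcpus_left > 0) {
        p->vcpus_left--;        /* pretend one vcpu migrated away */
        return -EAGAIN;
    }

    p->moving_cpu = MOVING_NO_CPU;
    return 0;
}
```

In this model a retry arrives after the cpu has already left cpu_valid, so
without the moving_cpu marker the second call would be rejected as "cpu not
in pool" and the removal could never complete.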


> Hmm, and why does RMCPU call cpupool_get_by_id() with exact==0?  What
> could possibly be the use of grabbing a random cpupool and then trying
> to remove the specified cpu from it?

This is a very good question :-)
I think this should be fixed. Seems to be a copy and paste error. I'll send a
patch.
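
For illustration, the difference between an exact and a non-exact lookup
can be sketched like this. The sketch assumes that a non-exact lookup
returns the first pool whose id is greater than or equal to the requested
one; the type and the function lookup_pool_by_id() are hypothetical and
only model that behaviour.

```c
#include <stddef.h>

/* Hypothetical stand-in for a cpupool; only the id matters here. */
struct lookup_pool {
    int id;
};

/*
 * 'pools' must be sorted by ascending id.
 * exact != 0: return the pool with exactly this id, or NULL.
 * exact == 0: return the first pool with id >= the requested id, or NULL.
 */
const struct lookup_pool *lookup_pool_by_id(const struct lookup_pool *pools,
                                            size_t n, int id, int exact)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (pools[i].id >= id)
            break;

    if (i == n)
        return NULL;            /* no pool with id >= requested id */
    if (exact && pools[i].id != id)
        return NULL;            /* a neighbour matched, but exact was wanted */
    return &pools[i];
}
```

With pools 0, 2 and 5, a non-exact lookup of id 1 returns pool 2. That is
useful when iterating over all pools, but for removing a cpu it would
silently operate on a pool the caller never named.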


Thanks for your thoughts,


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

