
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split


  • To: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
  • From: Juergen Gross <juergen.gross@xxxxxxxxxxxxxx>
  • Date: Wed, 16 Feb 2011 15:28:39 +0100
  • Cc: Andre Przywara <andre.przywara@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, "Diestelhorst, Stephan" <Stephan.Diestelhorst@xxxxxxx>
  • Delivery-date: Wed, 16 Feb 2011 06:29:44 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

On 02/16/11 15:11, Juergen Gross wrote:
On 02/16/11 14:54, George Dunlap wrote:
Andre (and Juergen), can you try again with the attached patch?

What the patch basically does is try to make "cpu_disable_scheduler()"
do what it seems to say it does. :-) Namely, the various
scheduler-related interrupts (both per-cpu ticks and the master tick)
are a part of the scheduler, so disable them before doing anything, and
don't enable them until the cpu is really ready to go again.

To be precise:
* cpu_disable_scheduler() disables ticks
* scheduler_cpu_switch() only enables ticks if adding a cpu to a pool,
and does it after inserting the idle vcpu
* Modify semantics, s.t., {alloc,free}_pdata() don't actually start or
stop tickers
+ Call tick_{resume,suspend} in cpu_{up,down}, respectively

I tried this before :-)
It didn't work for Andre, but maybe there were some bits missing.

* Modify credit1's tick_{suspend,resume} to handle the master ticker
as well.

With this patch (if dom0 doesn't get wedged due to all 8 vcpus being
on one pcpu), I can perform thousands of operations successfully.
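
The ordering described above (ticks off before any scheduler state is touched, ticks back on only after the idle vcpu is inserted) can be sketched as a small stand-alone toy model. This is not Xen code and not the attached patch: struct toy_cpu, toy_check() and both toy_* functions are hypothetical names invented for illustration, and the rule that a switch is refused while ticks are still armed is an assumption of the sketch, not something the patch is known to do.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one physical CPU. None of these types exist in Xen. */
struct toy_cpu {
    bool in_pool;        /* CPU currently belongs to a cpupool */
    bool idle_inserted;  /* idle vcpu is known to the pool's scheduler */
    bool ticker_active;  /* scheduler ticks are armed on this CPU */
};

/* Invariant the ordering is meant to protect: a tick may only be armed
 * on a CPU whose scheduler state is completely set up. */
static void toy_check(const struct toy_cpu *c)
{
    assert(!c->ticker_active || (c->in_pool && c->idle_inserted));
}

/* Models "cpu_disable_scheduler() disables ticks". */
static void toy_cpu_disable_scheduler(struct toy_cpu *c)
{
    c->ticker_active = false;   /* stop ticks before anything else */
    toy_check(c);
}

/* Models "scheduler_cpu_switch() only enables ticks if adding a cpu to
 * a pool, and does it after inserting the idle vcpu".
 * Assumption of this sketch: the caller must have disabled ticks
 * already; otherwise the switch is refused and false is returned. */
static bool toy_scheduler_cpu_switch(struct toy_cpu *c, bool to_pool)
{
    if (c->ticker_active)
        return false;           /* ticks still armed: unsafe to switch */

    /* Tear down the old scheduler state while no tick can fire. */
    c->idle_inserted = false;
    c->in_pool = false;
    toy_check(c);

    if (to_pool) {
        c->in_pool = true;
        c->idle_inserted = true;    /* insert the idle vcpu first ... */
        c->ticker_active = true;    /* ... and arm the ticks last */
    }
    toy_check(c);
    return true;
}
```

A caller in this model would run toy_cpu_disable_scheduler() followed by toy_scheduler_cpu_switch(); toy_check() then holds at every intermediate step, which is the property the reordering is aiming for.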

Nice. I'll try later. At the moment I'm testing another patch (attached
for review, if you like). I think I've identified two possible races.

My patch works for me. I think I have to rework the locking for credit1, but
that shouldn't be too hard.

My machine survived 10000 iterations of your script with additional
consistency checks in the scheduler. Without my patch the machine crashed
after fewer than 500 iterations.


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html
