
Re: [Xen-devel] POD: soft lockups in dom0 kernel


  • To: Jan Beulich <JBeulich@xxxxxxxx>
  • From: Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx>
  • Date: Wed, 29 Jan 2014 15:12:26 +0100
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>

Hi,

sorry for the delay.

On Thursday, 16 January 2014 at 11:10:38, Jan Beulich wrote:
> >>> On 05.12.13 at 14:55, Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx> wrote:
> > when creating a bigger (> 50 GB) HVM guest with maxmem > memory we get
> > softlockups from time to time.
> > 
> > kernel: [  802.084335] BUG: soft lockup - CPU#1 stuck for 22s! [xend:31351]
> > 
> > I tracked this down to the call of xc_domain_set_pod_target() and further
> > p2m_pod_set_mem_target().
> > 
> > Unfortunately I can check this only with xen-4.2.2, as I don't have a
> > machine with enough memory for current hypervisors. But the code seems
> > to be nearly the same.
> 
> While I still haven't seen a formal report of this against SLE11,
> attached is a draft patch against the SP3 code base adding manual
> preemption to the hypercall path of privcmd. This is only lightly
> tested, and therefore still has a little debugging code left in it.
> Mind giving it a try (perhaps together with the patch David had
> sent for the other issue; there may still be a need for further
> preemption points in the IOCTL_PRIVCMD_MMAP* handling, but
> without knowing for sure whether that matters to you I didn't
> want to add it right away)?
> 
> Jan

Today I did some tests with the patch. As the debug part didn't compile,
I changed the per-CPU variables to local variables.

OK, it works! I tried several times to start a domU with
memory=100GB and maxmem=230GB and never got a soft lockup.
The following messages appeared in /var/log/messages on the first start:
Jan 29 14:14:45 gut1 kernel: [  178.976373] psi[03] 00000000:1 #2
Jan 29 14:14:46 gut1 kernel: [  179.008774] psi[03] 00000000:1 #4
Jan 29 14:14:46 gut1 kernel: [  179.073048] psi[03] 00000000:1 #8
Jan 29 14:14:46 gut1 kernel: [  179.219272] psi[03] 00000000:1 #10
Jan 29 14:14:47 gut1 kernel: [  180.220803] psi[03] 00000000:1 #20
Jan 29 14:14:48 gut1 kernel: [  181.844153] psi[03] 00000000:1 #40
Jan 29 14:14:51 gut1 kernel: [  184.769331] psi[03] 00000000:1 #80
Jan 29 14:14:56 gut1 kernel: [  189.169159] psi[03] 00000000:1 #100
Jan 29 14:14:57 gut1 kernel: [  190.178545] psi[03] 00000000:1 #200
Jan 29 14:15:03 gut1 kernel: [  196.256353] psi[00] 00000000:1 #1
Jan 29 14:15:03 gut1 kernel: [  196.260928] psi[00] 00000000:1 #2
Jan 29 14:15:03 gut1 kernel: [  196.497156] psi[00] 00000000:1 #4
Jan 29 14:15:03 gut1 kernel: [  196.552303] psi[00] 00000000:1 #8
Jan 29 14:15:04 gut1 kernel: [  197.035527] psi[00] 00000000:1 #10
Jan 29 14:15:04 gut1 kernel: [  197.060626] psi[01] 00000000:1 #1
Jan 29 14:15:04 gut1 kernel: [  197.064101] psi[01] 00000000:1 #2
Jan 29 14:15:04 gut1 kernel: [  197.096719] psi[01] 00000000:1 #4
Jan 29 14:15:04 gut1 kernel: [  197.148756] psi[01] 00000000:1 #8
Jan 29 14:15:04 gut1 kernel: [  197.517184] psi[01] 00000000:1 #10
Jan 29 14:15:05 gut1 kernel: [  198.153211] psi[01] 00000000:1 #20
Jan 29 14:15:06 gut1 kernel: [  199.162541] psi[02] 00000000:1 #1
Jan 29 14:15:06 gut1 kernel: [  199.164895] psi[02] 00000000:1 #2
Jan 29 14:15:06 gut1 kernel: [  199.169576] psi[02] 00000000:1 #4
Jan 29 14:15:06 gut1 kernel: [  199.178073] psi[02] 00000000:1 #8
Jan 29 14:15:06 gut1 kernel: [  199.195693] psi[02] 00000000:1 #10
Jan 29 14:15:06 gut1 kernel: [  199.335857] psi[02] 00000000:1 #20
Jan 29 14:15:06 gut1 kernel: [  199.805027] psi[02] 00000000:1 #40
Jan 29 14:15:07 gut1 kernel: [  200.753118] psi[00] 00000000:1 #20
Jan 29 14:15:08 gut1 kernel: [  201.524368] psi[01] 00000000:1 #40
Jan 29 14:15:09 gut1 kernel: [  202.692159] psi[01] 00000000:1 #80
Jan 29 14:15:11 gut1 kernel: [  204.968433] psi[01] 00000000:1 #100
Jan 29 14:15:16 gut1 kernel: [  209.712892] psi[01] 00000000:1 #200
Jan 29 14:15:32 gut1 kernel: [  225.940798] psi[01] 00000000:1 #400
Jan 29 14:15:38 gut1 kernel: [  231.360556] psi[00] 00000000:1 #40

On the second start:
Jan 29 14:49:19 gut1 kernel: [ 2250.788926] psi[02] 00000000:1 #80
Jan 29 14:49:26 gut1 kernel: [ 2257.360767] psi[02] 00000000:1 #100
Jan 29 14:49:37 gut1 kernel: [ 2268.912916] psi[02] 00000000:1 #200
Jan 29 14:50:09 gut1 kernel: [ 2300.804211] psi[01] 00000000:1 #800

Thanks.

Dietmar.

-- 
Company details: http://ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

