
Re: [Xen-devel] [PATCH 2/4] xen: x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
From: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
Date: Thu, 25 Jun 2015 18:13:42 +0200
Cc: Juergen Gross <jgross@xxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Jan Beulich <JBeulich@xxxxxxxx>
Delivery-date: Thu, 25 Jun 2015 16:14:20 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Thu, 2015-06-25 at 16:52 +0100, Andrew Cooper wrote:
> On 25/06/15 16:04, Dario Faggioli wrote:
> > On Thu, 2015-06-25 at 15:20 +0100, Andrew Cooper wrote:
> >> On 25/06/15 13:15, Dario Faggioli wrote:

> >>> # xl cpupool-cpu-remove Pool-0 8-15
> >>> # xl cpupool-create name="Pool-1"
> >>> # xl cpupool-cpu-add Pool-1 8-15
> >>> --> suspend
> >>> --> resume
> >>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
> >>> (XEN) CPU:    8
> >>> (XEN) RIP:    e008:[<ffff82d080123078>] csched_schedule+0x4be/0xb97
> >>> (XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor
> >>> (XEN) rax: 80007d2f7fccb780   rbx: 0000000000000009   rcx: 0000000000000000
> >>> (XEN) rdx: ffff82d08031ed40   rsi: ffff82d080334980   rdi: 0000000000000000
> >>> (XEN) rbp: ffff83010000fe20   rsp: ffff83010000fd40   r8:  0000000000000004
> >>> (XEN) r9:  0000ffff0000ffff   r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
> >>> (XEN) r12: ffff8303191ea870   r13: ffff8303226aadf0   r14: 0000000000000009
> >>> (XEN) r15: 0000000000000008   cr0: 000000008005003b   cr4: 00000000000026f0
> >>> (XEN) cr3: 00000000dba9d000   cr2: 0000000000000000
> >>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> >>> (XEN) ... ... ...
> >>> (XEN) Xen call trace:
> >>> (XEN)    [<ffff82d080123078>] csched_schedule+0x4be/0xb97
> >>> (XEN)    [<ffff82d08012c732>] schedule+0x12a/0x63c
> >>> (XEN)    [<ffff82d08012f8c8>] __do_softirq+0x82/0x8d
> >>> (XEN)    [<ffff82d08012f920>] do_softirq+0x13/0x15
> >>> (XEN)    [<ffff82d080164791>] idle_loop+0x5b/0x6b
> >>> (XEN)
> >>> (XEN) ****************************************
> >>> (XEN) Panic on CPU 8:
> >>> (XEN) GENERAL PROTECTION FAULT
> >>> (XEN) [error_code=0000]
> >>> (XEN) ****************************************
> >> What is the actual cause of the #GP fault?  There are no obviously
> >> poisoned registers.
> >>

> >             do
> >             {
> >                 /*
> >                  * Get ahold of the scheduler lock for this peer CPU.
> >                  *
> >                  * Note: We don't spin on this lock but simply try it. Spinning
> >                  * could cause a deadlock if the peer CPU is also load
> >                  * balancing and trying to lock this CPU.
> >                  */
> >                 spinlock_t *lock = pcpu_schedule_trylock(peer_cpu);
> >
> > We therefore enter the inner do{}while with, for instance (that's what
> > I've seen in my debugging), peer_cpu=9, but we've not yet done
> > cpu_schedule_up()-->alloc_pdata()-->etc. for that CPU, so we die at (or
> > shortly after) the end of the code snippet shown above.
> 
> Aah - it is a dereference with %rax as a pointer, which is
> 
> #define INVALID_PERCPU_AREA (0x8000000000000000L - (long)__per_cpu_start)
> 
Exactly!

> That explains the #GP fault which is due to a non-canonical address.
> 
> It might be better to use 0xDEAD000000000000L as the constant to make it
> slightly easier to spot as a poisoned pointer.
> 
Indeed. :-)
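
For the archives, here is a minimal sketch of the check the hardware is
effectively doing. It assumes 48-bit virtual addresses (4-level paging);
is_canonical() is just an illustrative name for this mail, not a quote of
any helper in the tree:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: 48-bit virtual addresses (4-level paging). An address is
 * canonical when bits 63..47 are all copies of bit 47; dereferencing
 * anything else raises #GP rather than a page fault.
 */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* the 17 bits that must agree */

    return top == 0 || top == 0x1ffff;  /* all zeros or all ones */
}
```

With the values from the trace, %rax (0x80007d2f7fccb780) fails this test,
while the faulting RIP (0xffff82d080123078) passes it. The suggested
0xDEAD000000000000 would fail it too, so it would still fault, just more
recognisably.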

> > I can try to think about it and come up with something if you think it's
> > important...
> 
> Not to worry. I was more concerned about working out why it was dying
> with an otherwise unqualified #GP fault.
> 
Ok, thanks. So, just to clarify for me: from your side, this patch
needs "just" a better changelog, right?

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

