
Re: [Xen-devel] [PATCH 2/4] xen: x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown



On 25/06/15 17:13, Dario Faggioli wrote:
> On Thu, 2015-06-25 at 16:52 +0100, Andrew Cooper wrote:
>> On 25/06/15 16:04, Dario Faggioli wrote:
>>> On Thu, 2015-06-25 at 15:20 +0100, Andrew Cooper wrote:
>>>> On 25/06/15 13:15, Dario Faggioli wrote:
>>>>> # xl cpupool-cpu-remove Pool-0 8-15
>>>>> # xl cpupool-create name="Pool-1"
>>>>> # xl cpupool-cpu-add Pool-1 8-15
>>>>> --> suspend
>>>>> --> resume
>>>>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
>>>>> (XEN) CPU:    8
>>>>> (XEN) RIP:    e008:[<ffff82d080123078>] csched_schedule+0x4be/0xb97
>>>>> (XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor
>>>>> (XEN) rax: 80007d2f7fccb780   rbx: 0000000000000009   rcx: 0000000000000000
>>>>> (XEN) rdx: ffff82d08031ed40   rsi: ffff82d080334980   rdi: 0000000000000000
>>>>> (XEN) rbp: ffff83010000fe20   rsp: ffff83010000fd40   r8:  0000000000000004
>>>>> (XEN) r9:  0000ffff0000ffff   r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
>>>>> (XEN) r12: ffff8303191ea870   r13: ffff8303226aadf0   r14: 0000000000000009
>>>>> (XEN) r15: 0000000000000008   cr0: 000000008005003b   cr4: 00000000000026f0
>>>>> (XEN) cr3: 00000000dba9d000   cr2: 0000000000000000
>>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>>> (XEN) ... ... ...
>>>>> (XEN) Xen call trace:
>>>>> (XEN)    [<ffff82d080123078>] csched_schedule+0x4be/0xb97
>>>>> (XEN)    [<ffff82d08012c732>] schedule+0x12a/0x63c
>>>>> (XEN)    [<ffff82d08012f8c8>] __do_softirq+0x82/0x8d
>>>>> (XEN)    [<ffff82d08012f920>] do_softirq+0x13/0x15
>>>>> (XEN)    [<ffff82d080164791>] idle_loop+0x5b/0x6b
>>>>> (XEN)
>>>>> (XEN) ****************************************
>>>>> (XEN) Panic on CPU 8:
>>>>> (XEN) GENERAL PROTECTION FAULT
>>>>> (XEN) [error_code=0000]
>>>>> (XEN) ****************************************
>>>> What is the actual cause of the #GP fault?  There are no obviously
>>>> poisoned registers.
>>>>
>>>             do
>>>             {
>>>                 /*
>>>                  * Get ahold of the scheduler lock for this peer CPU.
>>>                  *
>>>                  * Note: We don't spin on this lock but simply try it.
>>>                  * Spinning could cause a deadlock if the peer CPU is also
>>>                  * load balancing and trying to lock this CPU.
>>>                  */
>>>                 spinlock_t *lock = pcpu_schedule_trylock(peer_cpu);
>>>
>>> We therefore enter the inner do{}while with, for instance, peer_cpu=9
>>> (that's what I've seen in my debugging), but we've not yet done
>>> cpu_schedule_up()-->alloc_pdata()-->etc. for that CPU, so we die at (or
>>> shortly after) the end of the code snippet shown above.
>> Aah - it is a dereference with %rax as a pointer, which is
>>
>> #define INVALID_PERCPU_AREA (0x8000000000000000L - (long)__per_cpu_start)
>>
> Exactly!
>
>> That explains the #GP fault which is due to a non-canonical address.
>>
>> It might be better to use 0xDEAD000000000000L as the constant to make it
>> slightly easier to spot as a poisoned pointer.
>>
> Indeed. :-)
>
>>> I can try to think about it and come up with something if you think
>>> it's important...
>> Not to worry. I was more concerned about working out why it was dying
>> with an otherwise unqualified #GP fault.
>>
> Ok, thanks. So, just to clarify things for me: from your side, this
> patch needs "just" a better changelog, right?

Yes - Acked-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> for the x86
nature, but I am not very familiar with the cpupool code, so would
prefer review from a knowledgeable 3rd party.
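
[For reference, the non-canonical-pointer behaviour discussed above can be
reproduced with a small userspace sketch. PER_CPU_START and the variable
offset below are hypothetical stand-ins; the real __per_cpu_start comes from
Xen's linker script. The point is that INVALID_PERCPU_AREA is chosen so that
adding __per_cpu_start plus any per-cpu offset lands just above
0x8000000000000000, inside the x86_64 non-canonical hole, so a dereference
raises #GP rather than #PF:]

#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical link-time address of the per-cpu section; in Xen the
 * real value of __per_cpu_start is supplied by the linker script. */
#define PER_CPU_START 0xffff82d080334000UL

/* Xen's poison: picked so that poison + __per_cpu_start == 2^63,
 * i.e. the bottom of the non-canonical address hole. */
#define INVALID_PERCPU_AREA (0x8000000000000000L - (long)PER_CPU_START)

/* An x86_64 address is canonical (with 4-level paging) iff bits 63:47
 * are all equal, i.e. the value sign-extends cleanly from bit 47. */
static int is_canonical(uint64_t addr)
{
    int64_t s = (int64_t)addr;
    return (s >> 47) == 0 || (s >> 47) == -1;
}

int main(void)
{
    uint64_t offset = 0x780;  /* hypothetical per-cpu variable offset */
    uint64_t ptr = (uint64_t)INVALID_PERCPU_AREA + PER_CPU_START + offset;

    /* ptr == 0x8000000000000780: just above the canonical lower half,
     * matching the shape of the rax value in the crash above. */
    printf("poisoned per-cpu pointer: %#" PRIx64 " canonical=%d\n",
           ptr, is_canonical(ptr));
    assert(!is_canonical(ptr));
    return 0;
}

[A 0xDEAD000000000000L-based poison, as suggested above, would be equally
non-canonical but much easier to recognise in a register dump.]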

>
> Regards,
> Dario
>


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
