
Re: [Xen-devel] Xen4.2 S3 regression?



I've managed to determine that _csched_cpu_pick is, for reasons not yet clear, picking a CPU ID outside the range of CPUs that are valid for this system
(in this case CPU ID 4, on a 2-core machine).
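
For illustration only: the sketch below is not the Xen scheduler code and is not a diagnosis of this bug. It shows one generic way a pick routine can hand back an ID past the last valid CPU: a find-first-set search over an empty candidate mask returns the mask width as its "not found" value. The names MASK_BITS, mask_first, pick_cpu and cpu_is_valid, and the 4-bit mask width, are all hypothetical.

```c
/* Hypothetical model: masks are 4 bits wide, but only CPUs 0 and 1 exist. */
#define MASK_BITS 4u

/* Return the index of the lowest set bit, or MASK_BITS if the mask is empty. */
unsigned int mask_first(unsigned int mask)
{
    for (unsigned int i = 0; i < MASK_BITS; i++)
        if (mask & (1u << i))
            return i;
    return MASK_BITS; /* "not found" sentinel, which is not a valid CPU ID */
}

/* Pick a CPU from the intersection of an affinity mask and the online mask. */
unsigned int pick_cpu(unsigned int affinity, unsigned int online)
{
    return mask_first(affinity & online);
}

/* Caller-side sanity check: the ID must be in range and currently online. */
int cpu_is_valid(unsigned int cpu, unsigned int online)
{
    return cpu < MASK_BITS && (online & (1u << cpu)) != 0;
}
```

In this model, pick_cpu(0x2, 0x3) returns 1, while pick_cpu(0x2, 0x1) (affinity to CPU 1 while only CPU 0 is online) returns 4, which cpu_is_valid rejects. Whether anything similar happens in _csched_cpu_pick during resume is an open question here.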



On Mon, Sep 24, 2012 at 4:30 PM, Keir Fraser <keir.xen@xxxxxxxxx> wrote:
Do a debug build so the backtrace can be trusted. It’s a NULL pointer dereference so shouldn’t be too tricky to make some headway on this one. Easier than the previous bug. :)

 -- Keir


On 24/09/2012 20:02, "Ben Guthro" <ben@xxxxxxxxxx> wrote:

Well...knock one bug down - and another crops up.

It appears that dom0_vcpu_pin is incompatible with S3.
I'll start digging into why, but if you have any thoughts from the stack below, I'd welcome any pointers.

/btg


(XEN) Preparing system for ACPI S3 state.
(XEN) Disabling non-boot CPUs ...
(XEN) Entering ACPI S3 state.
(XEN) mce_intel.c:1239: MCA Capability: BCAST 1 SER 0 CMCI 0 firstbank 1 extended MCE MSR 0
(XEN) CMCI: CPU0 has no CMCI support
(XEN) CPU0: Thermal monitoring enabled (TM2)
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs  ...
(XEN) Booting processor 1/1 eip 8a000
(XEN) Initializing CPU#1
(XEN) CPU: L1 I cache: 32K, L1 D cache: 32K
(XEN) CPU: L2 cache: 3072K
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 1
(XEN) CMCI: CPU1 has no CMCI support
(XEN) CPU1: Thermal monitoring enabled (TM2)
(XEN) CPU1: Intel(R) Core(TM)2 Duo CPU     P8400  @ 2.26GHz stepping 06
(XEN) microcode: CPU1 updated from revision 0x60c to 0x60f, date = 2010-09-29 
[   60.100054] ACPI: Low-level resume complete
[   60.100054] PM: Restoring platform NVS memory
[   60.100054] Enabling non-boot CPUs ...
[   60.100054] installing Xen timer for CPU 1
[   60.100054] cpu 1 spinlock event irq 279
(XEN) ----[ Xen-4.2.1-pre  x86_64  debug=n  Tainted:    C ]----
(XEN) CPU:    1
(XEN) RIP:    e008:[<ffff82c480121562>] vcpu_migrate+0x172/0x360
(XEN) RFLAGS: 0000000000010096   CONTEXT: hypervisor
(XEN) rax: 00007d3b7fd17180   rbx: ffff82c4802e8ee0   rcx: ffff82c4802e8ee0
(XEN) rdx: ffff83013a3c5068   rsi: 0000000000000004   rdi: ffff8301300b7d68
(XEN) rbp: 0000000000000001   rsp: ffff8301300b7e28   r8:  0000000000000000
(XEN) r9:  000000000000003e   r10: 000000000000003e   r11: 0000000000000246
(XEN) r12: ffff83013a3c5068   r13: ffff83013a3c5068   r14: ffff82c4802d3140
(XEN) r15: 0000000000000001   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000131a05000   cr2: 0000000000000060
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff8301300b7e28:
(XEN)    ffff82c4802d3140 ffff83013a3c5068 0000000000000246 0000000000000004
(XEN)    ffff8300bd2fe000 ffff82c4802e8ee0 00000004012d3140 ffff82c4802e8ee0
(XEN)    ffff88003fc8e820 ffff8300bd2fe000 ffff8301355d8000 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffff88003fc8e820 ffff82c480105a50
(XEN)    0000000000000000 ffff82c4801805ec 0000060f00000000 ffff82c480184f16
(XEN)    0000000000000032 78a20f6e65780b0f ffff88003976fdc8 ffff8300bd2fe000
(XEN)    ffff88003976fe50 ffff8300bd2fe000 ffff88003976fda0 0000000000000001
(XEN)    0000000000000000 ffff82c480214288 ffff88003fc8e820 0000000000000000
(XEN)    0000000000000000 0000000000000001 ffff88003976fda0 ffff88003fc8bdc0
(XEN)    0000000000000246 ffff88003976fe60 00000000ffffffff 0000000000000000
(XEN)    0000000000000018 ffffffff8100130a 0000000000000000 0000000000000001
(XEN)    0000000000000007 0000010000000000 ffffffff8100130a 000000000000e033
(XEN)    0000000000000246 ffff88003976fd88 000000000000e02b d43d5f3fedaef5e7
(XEN)    d3b2ddaeed5038ff 270adb813ad76c9b ddfd6ff5f85e6775 b5881cbf00000001
(XEN)    ffff8300bd2fe000 0000003cba0dc180 0a109ac649c118a1
(XEN) Xen call trace:
(XEN)    [<ffff82c480121562>] vcpu_migrate+0x172/0x360
(XEN)    [<ffff82c480105a50>] do_vcpu_op+0x1e0/0x4a0
(XEN)    [<ffff82c4801805ec>] do_invalid_op+0x19c/0x3f0
(XEN)    [<ffff82c480184f16>] copy_from_user+0x26/0x90
(XEN)    [<ffff82c480214288>] syscall_enter+0x88/0x8d
(XEN)    
(XEN) Pagetable walk from 0000000000000060:
(XEN)  L4[0x000] = 0000000000000000 ffffffffffffffff
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000060
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...
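
A note on reading the fault above: cr2 and the faulting linear address are both 0x60, which is what a load of a structure member at offset 0x60 through a NULL base pointer would produce. The minimal sketch below shows only the arithmetic; struct model_data and access_address are hypothetical, and the log does not say which real structure is involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, not a real Xen structure: a member at offset 0x60. */
struct model_data {
    unsigned char pad[0x60];
    void *field;
};

/* Address that a load of base->field would touch, for a given base address. */
uintptr_t access_address(uintptr_t base)
{
    return base + offsetof(struct model_data, field);
}
```

With a base of 0 (NULL), access_address yields 0x60, matching the address reported in the panic. A debug build, as suggested above, is what would identify the actual pointer and member.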


On Mon, Sep 24, 2012 at 10:28 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>> On 24.09.12 at 16:16, Ben Guthro <ben@xxxxxxxxxx> wrote:
> Would you prefer a separate [PATCH] email for this fix, or will you apply
> it as-is?

I'll put something together - the most important thing here obviously
is having a proper description. Plus I'd like to slightly extend this and
have acpi_dead_idle() actually use default_dead_idle(), just to have
things consolidated in one place. I assume I can put your S-o-b on
what you sent...

Jan

> On Mon, Sep 24, 2012 at 10:10 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>
>> >>> On 24.09.12 at 15:56, Ben Guthro <ben@xxxxxxxxxx> wrote:
>> > On Mon, Sep 24, 2012 at 9:34 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>> >> ...; the interesting ones are
>> >> - at the end of xen/arch/x86/acpi/cpu_idle.c:acpi_dead_idle()
>> >> - xen/arch/x86/domain.c:default_dead_idle()
>> >
>> >
>> > Thanks! This fixes the issue on this machine!
>>
>> Hooray!
>>
>> > Is this a reasonable long-term solution - or are there reasons not to
>> > call wbinvd() here?
>>
>> That's a perfectly valid adjustment (see my earlier reply where
>> I originally suggested it and explained why it may be necessary).
>>
>> Jan
>>
>>






_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

