Re: [Xen-devel] Xen4.2 S3 regression?
Here's my "Big hammer" debugging patch.
If I force the vcpu to be scheduled on CPU0 when the appropriate cpu is not online, I can resume properly.
Clearly this is not the proper solution, and I'm sure the fix is subtle. I'm not seeing it right now though. Perhaps tomorrow morning.
If you have any ideas, I'm happy to run tests then.
/btg
On Mon, Sep 24, 2012 at 4:46 PM, Ben Guthro <ben@xxxxxxxxxx> wrote:
I've managed to determine that _csched_cpu_pick is, for reasons not yet clear, picking a cpu id outside the range of cpus that are valid for this system
(in this case cpu id 4, on a 2-core machine).
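For illustration only, the "big hammer" fallback described above can be sketched in standalone C. This is not the attached debug patch and not Xen code: online_mask_t, cpu_is_online() and sanitize_cpu_pick() are hypothetical names, and a plain 64-bit bitmask stands in for the hypervisor's cpumask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit n set means CPU n is online. */
typedef uint64_t online_mask_t;

static bool cpu_is_online(online_mask_t online, unsigned int cpu)
{
    /* IDs beyond the mask width can never be online. */
    return cpu < 64 && ((online >> cpu) & 1u);
}

/*
 * "Big hammer": keep the scheduler's choice if that CPU is online,
 * otherwise fall back to CPU0. This hides the bad pick rather than
 * explaining it, so it is a debugging aid, not a fix.
 */
static unsigned int sanitize_cpu_pick(online_mask_t online, unsigned int picked)
{
    assert(cpu_is_online(online, 0)); /* assumption: the boot CPU is online */
    return cpu_is_online(online, picked) ? picked : 0;
}
```

With a mask of 0x3 (CPUs 0 and 1 online, as on the 2-core machine here), a pick of 4 is redirected to CPU0 while a pick of 1 is left alone.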
On Mon, Sep 24, 2012 at 4:30 PM, Keir Fraser <keir.xen@xxxxxxxxx> wrote:
Do a debug build so the backtrace can be trusted. It’s a NULL pointer dereference so shouldn’t be too tricky to make some headway on this one. Easier than the previous bug. :)
-- Keir
Well... knock one bug down, and another crops up.
It appears that dom0_vcpu_pin is incompatible with S3.
I'll start digging into why, but if you have any thoughts from the stack below, I'd welcome any pointers.
/btg
(XEN) Preparing system for ACPI S3 state.
(XEN) Disabling non-boot CPUs ...
(XEN) Entering ACPI S3 state.
(XEN) mce_intel.c:1239: MCA Capability: BCAST 1 SER 0 CMCI 0 firstbank 1 extended MCE MSR 0
(XEN) CMCI: CPU0 has no CMCI support
(XEN) CPU0: Thermal monitoring enabled (TM2)
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs ...
(XEN) Booting processor 1/1 eip 8a000
(XEN) Initializing CPU#1
(XEN) CPU: L1 I cache: 32K, L1 D cache: 32K
(XEN) CPU: L2 cache: 3072K
(XEN) CPU: Physical Processor ID: 0
(XEN) CPU: Processor Core ID: 1
(XEN) CMCI: CPU1 has no CMCI support
(XEN) CPU1: Thermal monitoring enabled (TM2)
(XEN) CPU1: Intel(R) Core(TM)2 Duo CPU P8400 @ 2.26GHz stepping 06
(XEN) microcode: CPU1 updated from revision 0x60c to 0x60f, date = 2010-09-29
[ 60.100054] ACPI: Low-level resume complete
[ 60.100054] PM: Restoring platform NVS memory
[ 60.100054] Enabling non-boot CPUs ...
[ 60.100054] installing Xen timer for CPU 1
[ 60.100054] cpu 1 spinlock event irq 279
(XEN) ----[ Xen-4.2.1-pre x86_64 debug=n Tainted: C ]----
(XEN) CPU: 1
(XEN) RIP: e008:[<ffff82c480121562>] vcpu_migrate+0x172/0x360
(XEN) RFLAGS: 0000000000010096 CONTEXT: hypervisor
(XEN) rax: 00007d3b7fd17180 rbx: ffff82c4802e8ee0 rcx: ffff82c4802e8ee0
(XEN) rdx: ffff83013a3c5068 rsi: 0000000000000004 rdi: ffff8301300b7d68
(XEN) rbp: 0000000000000001 rsp: ffff8301300b7e28 r8: 0000000000000000
(XEN) r9: 000000000000003e r10: 000000000000003e r11: 0000000000000246
(XEN) r12: ffff83013a3c5068 r13: ffff83013a3c5068 r14: ffff82c4802d3140
(XEN) r15: 0000000000000001 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 0000000131a05000 cr2: 0000000000000060
(XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff8301300b7e28:
(XEN) ffff82c4802d3140 ffff83013a3c5068 0000000000000246 0000000000000004
(XEN) ffff8300bd2fe000 ffff82c4802e8ee0 00000004012d3140 ffff82c4802e8ee0
(XEN) ffff88003fc8e820 ffff8300bd2fe000 ffff8301355d8000 0000000000000000
(XEN) 0000000000000000 0000000000000000 ffff88003fc8e820 ffff82c480105a50
(XEN) 0000000000000000 ffff82c4801805ec 0000060f00000000 ffff82c480184f16
(XEN) 0000000000000032 78a20f6e65780b0f ffff88003976fdc8 ffff8300bd2fe000
(XEN) ffff88003976fe50 ffff8300bd2fe000 ffff88003976fda0 0000000000000001
(XEN) 0000000000000000 ffff82c480214288 ffff88003fc8e820 0000000000000000
(XEN) 0000000000000000 0000000000000001 ffff88003976fda0 ffff88003fc8bdc0
(XEN) 0000000000000246 ffff88003976fe60 00000000ffffffff 0000000000000000
(XEN) 0000000000000018 ffffffff8100130a 0000000000000000 0000000000000001
(XEN) 0000000000000007 0000010000000000 ffffffff8100130a 000000000000e033
(XEN) 0000000000000246 ffff88003976fd88 000000000000e02b d43d5f3fedaef5e7
(XEN) d3b2ddaeed5038ff 270adb813ad76c9b ddfd6ff5f85e6775 b5881cbf00000001
(XEN) ffff8300bd2fe000 0000003cba0dc180 0a109ac649c118a1
(XEN) Xen call trace:
(XEN) [<ffff82c480121562>] vcpu_migrate+0x172/0x360
(XEN) [<ffff82c480105a50>] do_vcpu_op+0x1e0/0x4a0
(XEN) [<ffff82c4801805ec>] do_invalid_op+0x19c/0x3f0
(XEN) [<ffff82c480184f16>] copy_from_user+0x26/0x90
(XEN) [<ffff82c480214288>] syscall_enter+0x88/0x8d
(XEN)
(XEN) Pagetable walk from 0000000000000060:
(XEN) L4[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000060
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
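A note on reading the dump above: cr2 and the faulting linear address are both 0x60, which is the usual signature of reading a structure field through a NULL base pointer. The sketch below only illustrates that arithmetic. struct example and field_address() are hypothetical; the log does not say which Xen structure or field was involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT a real Xen structure: the padding is chosen
 * so that `field` sits at offset 0x60, purely for illustration. */
struct example {
    char  pad[0x60];
    void *field;
};

static_assert(offsetof(struct example, field) == 0x60,
              "field is expected at offset 0x60");

/* Address the CPU would touch when reading base->field. */
static uintptr_t field_address(uintptr_t base)
{
    return base + offsetof(struct example, field);
}
```

With base == 0 the computed address is 0x60, matching the faulting address in the log. A debug build, as suggested, would show which pointer in vcpu_migrate() was actually NULL.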
On Mon, Sep 24, 2012 at 10:28 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>> On 24.09.12 at 16:16, Ben Guthro <ben@xxxxxxxxxx> wrote:
> Would you prefer a separate [PATCH] email for this fix, or will you apply
> it as-is?
I'll put something together - the most important thing here obviously
is having a proper description. Plus I'd like to slightly extend this and
have acpi_dead_idle() actually use default_dead_idle(), just to have
things consolidated in one place. I assume I can put your S-o-b on
what you sent...
Jan
> On Mon, Sep 24, 2012 at 10:10 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>
>> >>> On 24.09.12 at 15:56, Ben Guthro <ben@xxxxxxxxxx> wrote:
>> > On Mon, Sep 24, 2012 at 9:34 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>> >> ...; the interesting ones are
>> >> - at the end of xen/arch/x86/acpi/cpu_idle.c:acpi_dead_idle()
>> >> - xen/arch/x86/domain.c:default_dead_idle()
>> >
>> >
>> > Thanks! This fixes the issue on this machine!
>>
>> Hooray!
>>
>> > Is this a reasonable long-term solution - or are there reasons not to
>> > call wbinvd() here?
>>
>> That's a perfectly valid adjustment (see my earlier reply where
>> I originally suggested it and explained why it may be necessary).
>>
>> Jan
>>
>>
Attachment:
debug1.patch
Description: Binary data
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel