
Re: [Xen-devel] High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x



On 15.03.2013 14:02, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 13, 2013 at 09:50:39PM +0100, Marek Marczykowski wrote:
>> Hi,
>>
>> I still have problems with ACPI(?) on Xen. After some system startups or
>> resumes the CPU temperature goes high although all domUs (and dom0) are idle.
>> On a "good" system startup it is about 50-55C, on a "bad" one above 67C (most
>> of the time above 70C). I've noticed a difference in the C-states reported by
>> Xen (attached files). On "bad" startups suspend doesn't work either: the
>> system restarts during suspend (I still haven't managed to get console
>> messages, as I don't have a serial port on this system). Note that sometimes
>> the system boots fine ("good" state), but the problem occurs after some
>> suspend/resume cycles. Some time ago I got other symptoms: only CPU0 was used,
>> for all VCPUs (according to xl vcpu-list). Maybe it is related?
>>
>> Hardware: Dell Latitude E6420
>> CPU: Intel i5-2520M
>>
>> Software:
>> xen stable-4.1 as of 15.02 (last commit: "xen: sched_credit: improve picking
>> up the idle CPU for a VCPU"), with reverted commit "Introduce system_state
>> variable."
>> The same problem occurs on vanilla xen 4.1.2.
>>
>> Linux 3.7.6 - happens on almost every boot. On Linux 3.7.4 it happens much
>> more rarely (but still occurs).
>> Kernel config:
>> http://git.qubes-os.org/gitweb/?p=marmarek/kernel.git;a=blob;f=config-pvops;h=a6e953f71cdc84556571b592b8af87a5a4f9a8d0;hb=HEAD
>> I've tried bisecting from 3.7.4 to 3.7.6, but without success because the
>> problem isn't 100% reproducible.
>>
>> Any ideas?
> 
> That C-states difference is important. The SYSIO part on your box means that
> the CPU ends up doing an MWAIT. A HALT, on the other hand, is not so
> power-saving friendly.
> 
> Looking at this:
>> (XEN) no cpu_id for acpi_id 5
>> (XEN) no cpu_id for acpi_id 6
>> (XEN) no cpu_id for acpi_id 7
>> (XEN) no cpu_id for acpi_id 8
> 
> .. means that xen-acpi-processor was trying to probe for the ACPI IDs of the
> other CPUs that the machine theoretically can support. That means it got
> the ACPI information for the first four CPUs (which is good).
> 
> As a first step in trying to figure this out, you can add #define DEBUG 1
> in xen-acpi-processor.c right before any of the #includes, and also boot
> Xen with 'cpufreq=verbose'. That should tell you what kind of C-states
> xen-acpi-processor uploaded (and whether it did so for all of the vCPUs).
> 
> If both bootups show that we do upload the C-states for all the CPUs but they
> vary, that means digging a bit deeper in the ACPI code, specifically in
> acpi_processor_get_power_info_cst, and seeing if it hits any of the 'continue'
> statements.
> 
> Then I would say also take the DSDT for both bootups and compare them. It
> might be that the BIOS is using a scratch register at reboot to construct the
> C-states and somehow it ends up being corrupted, which means that on the next
> warm reboot the C-states have bogus data. This does show up in the field :-(

I've finally found some time to debug this further, and it looks like
some deeper ACPI code problem...

I've switched to 3.8.4, on which the problem is much easier to reproduce
(almost every startup).

On a bad bootup, xen-acpi-processor didn't find any C-states: for each CPU
_pr.flags.power and _pr->power.count were 0 (but flags.power_setup_done=1). In
this case suspend (or shutdown) always ends in a reset.

On a good one, xen-acpi-processor got C1-C3 states for each CPU and suspend
succeeded, but after resume CPU0 had C1-C3 while the others had only C1.
Reloading xen-acpi-processor (rmmod -f...) fixes this (according to
xl debug-keys c), but the temperature still stays high. Regardless of
reloading xen-acpi-processor, the next suspend always fails.

Not sure how C-states can be related to S3 suspend, but perhaps something more
general with ACPI is wrong?

Each time the DSDT (taken from /sys/firmware/acpi/tables) is exactly the same.
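
For anyone who wants to repeat the DSDT comparison, it can be scripted. This
is only a minimal sketch, assuming the table was copied out of
/sys/firmware/acpi/tables/DSDT after each boot; the helper names
(table_digest, tables_identical) and the dump file names in the usage line are
made up for illustration and are not part of any Xen or Linux tool.

```python
import hashlib
import sys
from pathlib import Path


def table_digest(path):
    """Return the SHA-256 hex digest of one dumped ACPI table file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def tables_identical(path_a, path_b):
    """True if the two dumps are byte-for-byte identical (by digest)."""
    return table_digest(path_a) == table_digest(path_b)


if __name__ == "__main__":
    # Expects two dump files, e.g. one saved on a "good" boot and one on a
    # "bad" boot (file names are hypothetical).
    if len(sys.argv) == 3:
        same = tables_identical(sys.argv[1], sys.argv[2])
        print("identical" if same else "different")
```

Usage would be something like "python3 cmp_dsdt.py dsdt-good.bin dsdt-bad.bin".
Identical digests only say that the static table did not change between boots;
they say nothing about what the firmware methods return at run time.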

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab

