
Re: [Xen-devel] High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x



On Wed, Mar 13, 2013 at 09:50:39PM +0100, Marek Marczykowski wrote:
> Hi,
> 
> I still have problems with ACPI(?) on Xen. After some system startups or
> resumes, the CPU temperature goes high although all domUs (and dom0) are idle. On
> a "good" system startup it is about 50-55C, on a "bad" one above 67C (most of the time
> above 70C). I've noticed a difference in the C-states reported by Xen (attached
> files). In addition, on "bad" startups suspend doesn't work: the system restarts
> during suspend (I still haven't managed to get console messages, since I don't have a
> serial port on this system). Note that sometimes the system boots fine ("good"
> state), but the problem occurs after some suspend/resume cycles. Some time ago
> I got other symptoms: only CPU0 was used for all VCPUs (according to xl
> vcpu-list). Maybe it is related?
> 
> Hardware: Dell Latitude E6420
> CPU: Intel i5-2520M
> 
> Software:
> xen stable-4.1 as of 15.02 (last commit: "xen: sched_creadit: improve picking
> up the idle CPU for a VCPU"), with the commit "Introduce system_state
> variable." reverted.
> But the same problem occurs on vanilla xen 4.1.2.
> 
> Linux 3.7.6 - happens almost every boot. On Linux 3.7.4 it happens much more
> rarely (but still occurs).
> Kernel config:
> http://git.qubes-os.org/gitweb/?p=marmarek/kernel.git;a=blob;f=config-pvops;h=a6e953f71cdc84556571b592b8af87a5a4f9a8d0;hb=HEAD
> I've tried bisecting from 3.7.4 to 3.7.6, but without success because the
> problem isn't 100% reproducible.
> 
> Any ideas?

That C-states difference is important. The SYSIO entry on your box means that the
CPU ends up doing an MWAIT. A HALT, on the other hand, does not save nearly as
much power.

Looking at this:
> (XEN) no cpu_id for acpi_id 5
> (XEN) no cpu_id for acpi_id 6
> (XEN) no cpu_id for acpi_id 7
> (XEN) no cpu_id for acpi_id 8

... means that xen-acpi-processor was trying to probe the ACPI IDs of the
other CPUs that the machine can theoretically support. It also means that it got
the ACPI information for the first four CPUs (which is good).

As a first step in figuring this out, you can add '#define DEBUG 1' to
xen-acpi-processor.c right before any of the #includes, and also boot Xen with
'cpufreq=verbose'. That should tell you which C-states xen-acpi-processor
uploaded (and whether it did so for all of the vCPUs).

If both boots show that we do upload the C-states for all the CPUs but the
C-states themselves vary, that means digging a bit deeper into the ACPI code,
specifically into acpi_processor_get_power_info_cst, and checking whether it
hits any of the 'continue' statements.
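One way to see which 'continue' is hit is to log a reason at each one. The sketch below is a simplified, hypothetical model, not the kernel function: struct cst_entry, enum skip_reason and parse_cst() are made-up names, and the checks are modeled on the _CST package layout in the ACPI specification (each C-state package has four elements, Register, Type, Latency and Power, with Type 1, 2 or 3 for C1 to C3). The real function performs more checks than these.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified view of one _CST entry. */
struct cst_entry {
    int is_package;      /* entry is an ACPI package */
    unsigned int count;  /* number of elements in the package */
    unsigned int type;   /* C-state type: 1 = C1, 2 = C2, 3 = C3 */
};

enum skip_reason { SKIP_NOT_PACKAGE, SKIP_BAD_COUNT, SKIP_BAD_TYPE, SKIP_MAX };

/* Walk the entries as a _CST parser would, but report which check causes
 * each 'continue' instead of skipping silently. Returns the number of
 * accepted C-states; skipped[] is indexed by enum skip_reason. */
static unsigned int parse_cst(const struct cst_entry *e, unsigned int n,
                              unsigned int skipped[SKIP_MAX])
{
    unsigned int accepted = 0;
    assert(e != NULL || n == 0);
    for (unsigned int r = 0; r < SKIP_MAX; r++)
        skipped[r] = 0;
    for (unsigned int i = 0; i < n; i++) {
        if (!e[i].is_package) {
            fprintf(stderr, "entry %u: not a package\n", i);
            skipped[SKIP_NOT_PACKAGE]++;
            continue;
        }
        if (e[i].count != 4) {
            fprintf(stderr, "entry %u: %u elements, expected 4\n", i, e[i].count);
            skipped[SKIP_BAD_COUNT]++;
            continue;
        }
        if (e[i].type < 1 || e[i].type > 3) {
            fprintf(stderr, "entry %u: invalid C-state type %u\n", i, e[i].type);
            skipped[SKIP_BAD_TYPE]++;
            continue;
        }
        accepted++;
    }
    return accepted;
}
```

If a "bad" boot shows a non-zero counter where a "good" boot shows none, that points at the _CST data the BIOS handed over.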

Then I would also take the DSDT from both boots and compare them. It might
be that the BIOS uses a scratch register at reboot to construct the C-states
and that it somehow ends up corrupted, which means that on the next warm reboot
the C-states contain bogus data. This does show up in the field :-(
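Assuming the DSDT from each boot has been saved as a raw binary (on Linux, for example, by copying /sys/firmware/acpi/tables/DSDT), a byte-level comparison tells you quickly whether and where the two differ; disassembling both with iasl -d and diffing the output is the more readable next step. diff_tables() is a hypothetical helper, sketched under those assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: compare two raw table dumps. Returns the number of
 * differing bytes over the common length, plus the size difference if the
 * lengths differ. *first_diff receives the offset of the first difference,
 * or (size_t)-1 if the dumps are identical. */
static size_t diff_tables(const unsigned char *a, size_t len_a,
                          const unsigned char *b, size_t len_b,
                          size_t *first_diff)
{
    assert(first_diff != NULL);
    size_t common = len_a < len_b ? len_a : len_b;
    size_t diffs = 0;
    *first_diff = (size_t)-1;
    for (size_t i = 0; i < common; i++) {
        if (a[i] != b[i]) {
            if (diffs == 0)
                *first_diff = i;
            diffs++;
        }
    }
    /* Trailing bytes present in only one dump count as differences. */
    if (len_a != len_b) {
        if (diffs == 0)
            *first_diff = common;
        diffs += len_a > len_b ? len_a - len_b : len_b - len_a;
    }
    return diffs;
}
```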

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
