Xen project Mailing List

Re: [Xen-users] CPU frequency scaling - there be dragons here (resolved)

From: Daniel Widenfalk <Daniel@xxxxxxxxxxxx>

Date: Sat, 4 Jun 2016 09:48:04 +0200

Delivery-date: Sat, 04 Jun 2016 07:49:47 +0000

List-id: Xen user discussion <xen-users.lists.xen.org>

TL;DR: To get hypervisor frequency scaling, i.e. cpufreq=xen:..., to work
on Debian Jessie/Stretch with an AMD FX-8370 (on a GIGABYTE 990FXA-UD3 R5
motherboard):

0) The Debian grub setup files for Xen do not handle EFI boot properly.
   You'll need to edit /etc/grub.d/20_linux_xen and add a test for "efi"
   on the grub platform, i.e. change

   if [ "\$grub_platform" = "pc" -o "\$grub_platform" = "" ]

   to

   if [ "\$grub_platform" = "pc" -o "\$grub_platform" = "efi" -o
   "\$grub_platform" = "" ]

   or grub will boot Xen with the --no-real-mode flag set, leaving you
   without a working console. (not (yet) reported to Debian)

1) The Debian base installation does not add xen-acpi-processor to
   /etc/modules. I could not find a note anywhere that this module has to
   be loaded for hypervisor frequency scaling to work. Just add it, simple
   'nuff. (not (yet) reported to Debian) This should probably be added to
   the Xen wiki.

2) *IMPORTANT* Do Not Enable HPC (High Performance Computing) in your BIOS
   settings. This seems to cause all kinds of nasty breakage! (see further
   comments below)

On 2016-06-03 23:11, Daniel Widenfalk wrote:

Hi all,

I've been trying to get cpu frequency scaling to work on my new
home server but I keep running into all kinds of problems.

HW: AMD FX-8370 on a GIGABYTE 990FXA-UD3-R5 (F2 bios)
SW: Debian stretch - Xen 4.6 + Linux 4.5.4

When running without Xen I can use cpufrequtils to set and monitor
cpufreq and it seems to be working. I see all 8 CPUs having samples
in all available P and C states. So, before installing and using
Xen it seems that my MB+CPU combo is supporting cpu frequency
scaling together with Linux 4.5.4.

My problems start when I try to start using xen.

Side note : When installing Xen on an EFI booting system it seems
            that the grub configuration files are incomplete. The
            system type test does not check for "efi" and will thus
            launch xen using the --no-real-mode command line option
            resulting in no console. IMO this is a Debian bug, esp.
            as the --no-real-mode flag is noted as "for debug purposes
            only".
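
For anyone who wants to script the grub change rather than edit by hand,
here is a minimal sketch. patch_platform_test is a hypothetical helper
name, not something shipped by grub or Debian, and it assumes the platform
test in the file reads exactly as quoted in this thread (including the
backslash before $grub_platform). It keeps a backup as FILE.orig.

```shell
#!/bin/sh
# Sketch only: add an "efi" case to the grub platform test in FILE.
# patch_platform_test is a hypothetical helper, not part of grub or Debian.
patch_platform_test() {
    file=$1
    # Nothing to do if the efi test is already present.
    if grep -qF '"\$grub_platform" = "efi"' "$file"; then
        return 0
    fi
    # Keep a backup, then rewrite the file from it. Redirecting into the
    # existing file keeps its permissions (the script must stay executable).
    cp -p "$file" "$file.orig" || return 1
    # Match the literal text  "\$grub_platform" = "pc" -o  and append
    # the extra  "\$grub_platform" = "efi" -o  test right after it.
    sed 's/"\\\$grub_platform" = "pc" -o/& "\\$grub_platform" = "efi" -o/' \
        "$file.orig" > "$file"
}
```

As root this would be run as
patch_platform_test /etc/grub.d/20_linux_xen, followed by update-grub to
regenerate the grub configuration. Check the result by eye before
rebooting.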

So, once I got my console working properly (still waiting for my serial
-> USB cable) I tried to get cpufreq=xen and xenpm to work but to no
avail.

# xenpm set-scaling-governor ondemand

[CPU0] failed to set governor name (22 - Invalid argument)
[CPU1] failed to set governor name (22 - Invalid argument)
...

# xenpm get-cpufreq-para

Simply did not emit any output. Digging through the code and adding
some printk debug statements I found out that xen/drivers/cpufreq/
cpufreq.c:set_px_pminfo was not being called and therefore
processor_pminfo was never set up. At first I thought this was an error
in the xen hypervisor boot code but looking through the linux kernel
code I did find a call to said function (through a hypervisor call).
This call was hidden in the xen-acpi-processor module which in Debian
seems to be compiled as a module and, for some reason, is not loaded
at boot time.

HUH!

A simple

# modprobe xen-acpi-processor

made all kinds of xenpm shenanigans start working, including frequency
scaling! Woot!
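
To make the module load persist across reboots, it can be listed in
/etc/modules. A minimal sketch follows; ensure_module_listed is a
hypothetical helper name, and the second argument exists only so the
function can be tried on a scratch file first. Note the follow-up in this
thread: with the "HPC" BIOS setting enabled, loading the module at boot
crashed the machine.

```shell
#!/bin/sh
# Sketch only: append MODULE to FILE (default /etc/modules) unless it is
# already listed. ensure_module_listed is a hypothetical helper name.
ensure_module_listed() {
    module=$1
    file=${2:-/etc/modules}
    # -x matches whole lines and -F treats the name literally, so a
    # commented-out entry such as "# xen-acpi-processor" does not count.
    if ! grep -qxF "$module" "$file" 2>/dev/null; then
        printf '%s\n' "$module" >> "$file"
    fi
}
```

As root: ensure_module_listed xen-acpi-processor. Running it twice leaves
a single entry.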

However, I didn't like the output of "xenpm get-cpufreq-para":

--- excerpt from get-cpufreq-para.log ---
root@hostel:~# xenpm get-cpufreq-para
cpu id               : 0
affected_cpus        : 0 1 2 3 4 5 6 7
cpuinfo frequency    : max [4000000] min [1400000] cur [1400000]
scaling_driver       : Unknown
scaling_avail_gov    : userspace performance powersave ondemand
current_governor     : ondemand
  ondemand specific  :
    sampling_rate    : max [10000000] min [10000] cur [20000]
    up_threshold     : 80
scaling_avail_freq   : 4000000 *1400000
scaling frequency    : max [4000000] min [1400000] cur [1400000]
turbo mode           : enabled

cpu id               : 1
affected_cpus        : 0 1 2 3 4 5 6 7
cpuinfo frequency    : max [4000000] min [1400000] cur [1400000]
scaling_driver       : Unknown
scaling_avail_gov    : userspace performance powersave ondemand
current_governor     : ondemand
  ondemand specific  :
    sampling_rate    : max [10000000] min [10000] cur [20000]
    up_threshold     : 80
scaling_avail_freq   : 4000000 *1400000 -1692603520 -755174760 0 41000
3000 44000 8000 -1692187520 0 1000 0 1000 0 1000 0 1000 0 1000 0 1000 0
1000 0 1000 0 1000 0 1000 0 1000 0 0 0 0 0 0 0 0 0 0 0
scaling frequency    : max [4000000] min [1400000] cur [1400000]
turbo mode           : enabled

...
--- excerpt from get-cpufreq-para.log ---

As you can see, the information for CPU0 looks correct but the other 7
are showing a number of bogus frequencies! However, turning up the
loglevel on xen shows that each CPU is, in fact, being initialized with
only two power states each:

--- excerpt from xen-modprobe.log ---
(XEN) Set CPU acpi_id(1) cpuid(0) Px State info:
(XEN)   _PCT: descriptor=130, length=12, space_id=127, bit_width=64,
bit_offset=0, reserved=0, address=3221291106
(XEN)   _PCT: descriptor=130, length=12, space_id=127, bit_width=64,
bit_offset=0, reserved=0, address=0
(XEN)   _PSS: state_count=2
(XEN)   State0: 4000MHz 16362mW 4us 4us 0 0
(XEN)   State1: 1400MHz 3285mW 4us 4us 0x4 0x4
(XEN)   _PSD: num_entries=5 rev=0 domain=0 coord_type=252 num_processors=8
(XEN)   _PPC: 0
(XEN) max_freq: 4000000    second_max_freq: 1400000
(XEN) CPU0: Core Boost/Turbo detected and enabled
(XEN) CPU 0 initialization completed
(XEN) Set CPU acpi_id(2) cpuid(1) Px State info:
(XEN)   _PCT: descriptor=130, length=12, space_id=127, bit_width=64,
bit_offset=0, reserved=0, address=3221291106
(XEN)   _PCT: descriptor=130, length=12, space_id=127, bit_width=64,
bit_offset=0, reserved=0, address=0
(XEN)   _PSS: state_count=2
(XEN)   State0: 4000MHz 16362mW 4us 4us 0 0
(XEN)   State1: 1400MHz 3285mW 4us 4us 0x4 0x4
(XEN)   _PSD: num_entries=5 rev=0 domain=0 coord_type=252 num_processors=8
(XEN)   _PPC: 0
(XEN) adding CPU 1
....
--- excerpt from xen-modprobe.log ---
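
To compare these init-time values across all CPUs without reading the
whole log, the state counts can be pulled out with a small filter. This is
a sketch that assumes the log lines look exactly like the excerpt above;
pss_state_counts is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: read Xen log text on stdin and print the _PSS state count
# that was logged for each CPU. Assumes the line format shown in the
# excerpt ("Set CPU acpi_id(N) cpuid(M) ..." followed by "_PSS: ...").
pss_state_counts() {
    awk '
        /Set CPU acpi_id/ {
            # Pull M out of "cpuid(M)".
            s = $0
            sub(/.*cpuid\(/, "", s)
            sub(/\).*/, "", s)
            cpu = s
        }
        /_PSS: state_count=/ {
            s = $0
            sub(/.*state_count=/, "", s)
            printf "cpu %s: %d P-states\n", cpu, s + 0
        }
    '
}
```

Typical use would be: xl dmesg | pss_state_counts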

Clearly there is some kind of data corruption going on. Trying to dig
further into this I added some debug output to tools/libxc/xc_pm.c to
output what the hypervisor is returning to xenpm:

--- excerpt from get-cpufreq-para-debug.log ---
root@hostel:~/xen-4.6.0#
./debian/xen-utils-4.6/usr/lib/xen-4.6/bin/xenpm get-cpufreq-para
ret = -11
p_cpufreq->cpu_num  = 8
p_cpufreq->freq_num = 2
p_cpufreq->gov_num  = 4
ret = 0
p_cpufreq->cpu_num  = 8
p_cpufreq->freq_num = 2
p_cpufreq->gov_num  = 4
cpu id               : 0
affected_cpus        : 0 1 2 3 4 5 6 7
cpuinfo frequency    : max [4000000] min [1400000] cur [1400000]
scaling_driver       : Unknown
scaling_avail_gov    : userspace performance powersave ondemand
current_governor     : ondemand
  ondemand specific  :
    sampling_rate    : max [10000000] min [10000] cur [20000]
    up_threshold     : 80
scaling_avail_freq   : 4000000 *1400000
scaling frequency    : max [4000000] min [1400000] cur [1400000]
turbo mode           : enabled

ret = -11
p_cpufreq->cpu_num  = 8
p_cpufreq->freq_num = 43
p_cpufreq->gov_num  = 4
ret = 0
p_cpufreq->cpu_num  = 8
p_cpufreq->freq_num = 43
p_cpufreq->gov_num  = 4
cpu id               : 1
affected_cpus        : 0 1 2 3 4 5 6 7
cpuinfo frequency    : max [4000000] min [1400000] cur [1400000]
scaling_driver       : Unknown
scaling_avail_gov    : userspace performance powersave ondemand
current_governor     : ondemand
  ondemand specific  :
    sampling_rate    : max [10000000] min [10000] cur [20000]
    up_threshold     : 80
scaling_avail_freq   : 4000000 *1400000 -1692603520 -755174760 0 41000
3000 44000 8000 -1692187520 0 1000 0 1000 0 1000 0 1000 0 1000 0 1000 0
1000 0 1000 0 1000 0 1000 0 1000 0 0 0 0 0 0 0 0 0 0 0
scaling frequency    : max [4000000] min [1400000] cur [1400000]
turbo mode           : enabled
...
--- excerpt from get-cpufreq-para-debug.log ---

Clearly having 43 P-states on CPU1 while only having 2 on CPU0 seems
bogus. If this were not troubling enough it turns out that this value
varies from invocation to invocation. I can sometimes get 3, sometimes
13, 44, 45, etc...
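
A quick way to watch this from dom0 is to count the entries xenpm prints
per CPU. The filter below is a rough sketch: count_avail_freq is a
hypothetical helper name, it assumes each scaling_avail_freq field arrives
on a single line (the excerpts above were wrapped by the mail client), and
it only flags values that are zero or negative as bogus, which is a crude
heuristic.

```shell
#!/bin/sh
# Sketch only: read "xenpm get-cpufreq-para" output on stdin and report,
# per CPU, how many frequencies are listed and how many are <= 0.
count_avail_freq() {
    awk '
        /^cpu id/ { split($0, a, ":"); cpu = a[2] + 0 }
        /^scaling_avail_freq/ {
            sub(/^[^:]*:/, "")   # drop the field label
            gsub(/\*/, "")       # "*" marks the current frequency
            n = split($0, f, " ")
            bad = 0
            for (i = 1; i <= n; i++)
                if (f[i] + 0 <= 0) bad++
            printf "cpu %d: %d entries, %d bogus\n", cpu, n, bad
        }
    '
}
```

Typical use would be: xenpm get-cpufreq-para | count_avail_freq
Running it several times in a row shows whether the count changes between
invocations.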

To further dig into this issue I added some printk outputs into
xen/drivers/acpi/pmstat.c:get_cpufreq_para:

----------------------
    if ( (op->u.get_para.cpu_num  != cpumask_weight(policy->cpus)) ||
         (op->u.get_para.freq_num != pmpt->perf.state_count)    ||
         (op->u.get_para.gov_num  != gov_num) )
    {
        op->u.get_para.cpu_num =  cpumask_weight(policy->cpus);
        op->u.get_para.freq_num = pmpt->perf.state_count;
        op->u.get_para.gov_num  = gov_num;
        printk(KERN_INFO "XEN::get_cpufreq_para returning sizes:\n"
                         "  cpu_num  : %d\n"
                         "  freq_num : %d\n"
                         "  gov_num  : %d\n",
               op->u.get_para.cpu_num,
               op->u.get_para.freq_num,
               op->u.get_para.gov_num);
        return -EAGAIN;
    }

    printk(KERN_INFO "XEN::get_cpufreq_para requesting data sizes:\n"
                     "  cpu_num  : %d\n"
                     "  freq_num : %d\n"
                     "  gov_num  : %d\n",
           op->u.get_para.cpu_num,
           op->u.get_para.freq_num,
           op->u.get_para.gov_num);

    if ( !(affected_cpus = xzalloc_array(uint32_t, op->u.get_para.cpu_num)) )
        return -ENOMEM;
----------------------

This is the result of running

# xenpm get-cpufreq-para 0:

---------------------
(XEN) XEN::get_cpufreq_para returning sizes:
(XEN)   cpu_num  : 8
(XEN)   freq_num : 2
(XEN)   gov_num  : 4
(XEN) XEN::get_cpufreq_para requesting data sizes:
(XEN)   cpu_num  : 8
(XEN)   freq_num : 2
(XEN)   gov_num  : 4
---------------------

# xenpm get-cpufreq-para 1:

---------------------
(XEN) XEN::get_cpufreq_para returning sizes:
(XEN)   cpu_num  : 8
(XEN)   freq_num : 44
(XEN)   gov_num  : 4
(XEN) XEN::get_cpufreq_para requesting data sizes:
(XEN)   cpu_num  : 8
(XEN)   freq_num : 44
(XEN)   gov_num  : 4
---------------------

Clearly it's the Xen hypervisor that is serving corrupted data back
to dom0.

While xen hypervisor based cpu frequency scaling does seem to work I
am quite worried about the corrupted data being presented by xenpm.
I also wonder if this corrupted data may be related to the rare
and random CPU lock-ups I have seen:

"NMI watchdog: BUG: soft lockup - CPU#4 stuck for 23s"

At this point I'm mostly looking for help on how to continue to track
down this issue but if any of you have some solid advice on how to fix
it I would be very grateful :)

Side note 2: This may be an issue in how data is fetched from the kernel
             and hypervisor but it looked like Linux could scale each
             CPU core independently while XEN always keeps all the CPUs
             in the same power state. While I can live with the latter
             I think the former would be even better.

Best regards
/Daniel Widenfalk

Follow-up

I tried adding xen-acpi-processor to /etc/modules but this resulted in a
kernel panic/crash/reset just after setting up kernel variables (but
before ifup). I tried juggling the modules list a bit until I suddenly got
an EFI/BIOS notice that it had detected a boot failure and that my
settings were probably not correct.

I reset my settings to "optimized default" and lo and behold, not only
does it boot and let me modprobe xen-acpi-processor at boot time, I also
got three additional P-states available to me!

Fiddling with my BIOS settings one by one I found that enabling "HPC -
High Performance Computing" in the BIOS causes these crashes. I could
up-clock and configure everything else as I had before the BIOS reset
except that one setting.

/D

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users
