
Re: [Xen-devel] CPU frequency throttling based on the temperature



On Thu, 2019-07-25 at 17:34 +0200, Roger Pau Monné wrote:
> On Thu, Jul 25, 2019 at 02:31:40PM +0000, Jan Beulich wrote:
> > On 25.07.2019 16:17, Roger Pau Monné  wrote:
> > > On Thu, Jul 25, 2019 at 01:59:22PM +0000, Jan Beulich wrote:
> > > > On 25.07.2019 15:47, Roger Pau Monné  wrote:
> > > > > On Thu, Jul 25, 2019 at 09:29:01AM -0400, Fredy P. wrote:
> > > > > > On Thu, 2019-07-25 at 15:13 +0200, Roger Pau Monné wrote:
> > > > > > > On Thu, Jul 25, 2019 at 12:54:46PM +0000, Jan Beulich
> > > > > > > wrote:
> > > > > > > > On 25.07.2019 14:44,  Fredy P.  wrote:
> > > > > > > > > On Wed, 2019-07-24 at 17:41 +0200, Roger Pau Monné
> > > > > > > > > wrote:
> > > > > > > > > > > > What hardware interface does thermald (or the
> > > > > > > > > > > > driver in
> > > > > > > > > > > > Linux if
> > > > > > > > > > > > there's one) use to get the temperature data?
> > > > > > > > > 
> > > > > > > > > > In our initial POC using Xen 4.8.x we were using the
> > > > > > > > > > Linux coretemp driver, reading for example
> > > > > > > > > > /sys/class/hwmon/hwmon0/temp3_input, but it got
> > > > > > > > > > deprecated at commit
> > > > > > > > > > 72e038450d3d5de1a39f0cfa2d2b0f9b3d43c6c6
> > > > > > > > 
> > > > > > > > Hmm, I wouldn't call this deprecation, but a
> > > > > > > > regression. I would
> > > > > > > > say we want to re-expose this leaf to Dom0, the more
> > > > > > > > that the
> > > > > > > > commit also only mentions unprivileged domains. Andrew?
> > > > > > > 
> > > > > > > AFAICT from the documents provided by Fredy the
> > > > > > > temperature is read
> > > > > > > from a MSR that reports the current temperature of the
> > > > > > > core on which
> > > > > > > the MSR is read from. When running on Xen this will only
> > > > > > > work
> > > > > > > correctly if dom0 is given the same vCPUs as pCPUs and
> > > > > > > those are
> > > > > > > identity pinned.
> > > > > > 
> > > > > > I just want to be sure I got it correctly, by saying "When
> > > > > > running on
> > > > > > Xen this will only work correctly if ..." means in a future
> > > > > > implementation or that right now could work if I pin this
> > > > > > v/pCPUS?
> > > > > 
> > > > > No, right now there's no way to get this data from dom0,
> > > > > regardless of
> > > > > the pinning.
> > > > 
> > > > Of course you can, using the MSR "device" Linux optionally
> > > > provides (plus perhaps the rdmsr utility from the msr-tools
> > > > package).
> > > 
> > > But you won't get coherent results, since the vCPU might be
> > > jumping
> > > from pCPU to pCPU, thus returning values from multiple different
> > > pCPUs
> > > regardless of whether all rdmsr have been executed from the same
> > > vCPU
> > > from dom0 PoV.
> > 
> > I don't understand. Earlier you said "regardless of the pinning".
> > That's what my response was to, i.e. I was implying vCPU-s to be
> > pinned.
> 
> Oh sorry, that was me not taking into account the earlier context,
> you
> are right. To summarize and make things easier for Fredy I think the
> options are:
> 
>  - Create dom0 with vCPUs == pCPUs and identity pin them. Then you
>    *could* expose CPUID leaf 6 to dom0 and things should be OK IMO,
>    either when using the Linux driver or when reading values directly
>    from user-space using the MSR device pointed out by Jan.
> 
>  - Modify the Linux thermal driver to report the temperature for all
>    pCPUs (which might be different than dom0 vCPUs) using
>    XENPF_resource_op and XEN_RESOURCE_OP_MSR_READ. AFAICT you will
>    also need to expose CPUID leaf 6 to dom0 so that the thermal
>    driver attaches.
> 
>  - Import a thermal driver into Xen and expose the thermal data
>    somewhere, ie: a XENPF hypercall maybe.
> 
> Maybe someone can come up with more ideas, but there's likely some
> coding to be done in order to get this working.

Thanks Roger. To me the first option looks like a workaround that could
save our project delivery right now, and the second looks like the right
path.
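
To make the first option concrete for myself, here is a rough sketch of
reading the per-core temperature from dom0 user space through the MSR
device Jan pointed out. It is only an illustration under assumptions that
are not stated in this thread: that the register is IA32_THERM_STATUS
(0x19C) with TjMax taken from MSR_TEMPERATURE_TARGET (0x1A2), that the
Linux msr module is loaded, that Xen lets dom0 read these MSRs, and that
the dom0 vCPUs are identity pinned so the value belongs to the expected
pCPU. The function names are my own.

```python
import os
import struct

# Assumed MSR addresses (Intel); verify against the SDM for your CPU.
IA32_THERM_STATUS = 0x19C       # per-core thermal status
MSR_TEMPERATURE_TARGET = 0x1A2  # TjMax in bits 23:16


def read_msr(cpu, reg):
    """Read one 64-bit MSR via /dev/cpu/<cpu>/msr (needs root and the msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The file offset selects the MSR; each read returns 8 bytes.
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def decode_tjmax(target_raw):
    """Extract TjMax (degrees C) from a raw MSR_TEMPERATURE_TARGET value."""
    return (target_raw >> 16) & 0xFF


def decode_core_temp(status_raw, tjmax):
    """Return the core temperature in degrees C, or None if the reading is invalid."""
    if not (status_raw >> 31) & 1:      # bit 31: reading valid
        return None
    readout = (status_raw >> 16) & 0x7F  # bits 22:16: degrees below TjMax
    return tjmax - readout


def read_core_temp_c(cpu):
    """Read and decode the temperature of the core backing the given (pinned) vCPU."""
    tjmax = decode_tjmax(read_msr(cpu, MSR_TEMPERATURE_TARGET))
    return decode_core_temp(read_msr(cpu, IA32_THERM_STATUS), tjmax)
```

Calling read_core_temp_c(0) as root in dom0 would then give the value for
whichever pCPU vCPU 0 is pinned to. Without pinning the result is not
tied to a fixed pCPU, as Roger explained.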

I'll continue reading. It looks like there are different ways to get
thermal information from Linux, and each driver works differently. To be
more clear:

Thermald (Intel) [1] can work using these kernel drivers:

###############################
Prerequisites:
        Kernel
                Prefers kernel with
                        Intel RAPL power capping driver: Available from Linux kernel 3.13.rc1
                        Intel P State driver (Available in Linux kernel stable release)
                        Intel Power clamp driver (Available in Linux kernel stable release)
                        Intel INT340X drivers
                        Intel RAPL-mmio power capping driver: Available from 5.3-rc1
##############################

The MSR looks like part of RAPL, and this other document [2] mentions
three ways to get information from the RAPL driver and notes that this
information is per package (processor), not per core:

#############################################
Linux support is via powercap/sysfs (as provided by
drivers/powercap/intel_rapl.c) and perf_event (as provided by
arch/x86/events/intel/rapl.c). Also users often access the relevant MSR
registers directly via the /dev/msr or safe-msr interfaces.

Various values are provided, not all chips support all values. The
"package" value is for one processor package (which may contain many
cores; a system might have multiple packages). The PP0/"cores" value is
power usage by all of the cores in the package (you cannot break down
to individual cores). The PP1 value is the uncore, in non-server chips
this often provides info for the integrated GPU. The "DRAM" value is
for the DRAM in the system. The "Psys" value is the entire SoC (system
on chip). 
#############################################
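
As a rough illustration of the powercap/sysfs route mentioned in that
excerpt, the sketch below turns two samples of a package energy counter
into an average power figure. Note that these counters report energy, so
this yields watts rather than a temperature. The zone path
/sys/class/powercap/intel-rapl:0, the energy_uj and max_energy_range_uj
file names, and all function names are my assumptions for the example; I
have not checked whether this zone is visible from dom0 under Xen.

```python
import time
from pathlib import Path

# Assumed powercap zone for package 0; the path can differ per system.
ZONE = Path("/sys/class/powercap/intel-rapl:0")


def energy_delta_uj(before, after, max_range_uj):
    """Microjoules consumed between two samples, allowing for one counter wrap."""
    if after >= before:
        return after - before
    # The counter wrapped around its maximum range between the samples.
    return max_range_uj - before + after


def average_power_w(before, after, max_range_uj, seconds):
    """Average power in watts over the sampling interval."""
    return energy_delta_uj(before, after, max_range_uj) / 1e6 / seconds


def read_int(path):
    """Read a single integer from a sysfs file."""
    return int(Path(path).read_text().strip())


def sample_package_power(zone=ZONE, interval=1.0):
    """Sample the zone's energy counter twice and return average watts."""
    max_range = read_int(zone / "max_energy_range_uj")
    before = read_int(zone / "energy_uj")
    time.sleep(interval)
    after = read_int(zone / "energy_uj")
    return average_power_w(before, after, max_range, interval)
```

The value is for the whole package, which matches the point in the
excerpt that it cannot be broken down to individual cores.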

I'll try to get a clearer understanding, and if I have ideas or
solutions I'll bring them here.


[1] https://github.com/intel/thermal_daemon
[2] http://web.eece.maine.edu/~vweaver/projects/rapl/rapl_support.html

> 
> Roger.
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxxx
> https://lists.xenproject.org/mailman/listinfo/xen-devel


