[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v6] xenpm: Add get-intel-temp subcommand



Le 12/02/2026 à 11:30, Jan Beulich a écrit :
> On 12.02.2026 11:19, Teddy Astie wrote:
>> Le 09/02/2026 à 15:21, Jan Beulich a écrit :
>>> On 09.02.2026 11:31, Teddy Astie wrote:
>>>> @@ -93,6 +96,7 @@ void show_help(void)
>>>>                "                                           units default 
>>>> to \"us\" if unspecified.\n"
>>>>                "                                           truncates 
>>>> un-representable values.\n"
>>>>                "                                           0 lets the 
>>>> hardware decide.\n"
>>>> +            " get-intel-temp        [cpuid]       get Intel CPU 
>>>> temperature of <cpuid> or all\n"
>>>
>>> Sorry, thinking about it only now: Do we really want to build in the vendor
>>> name to a command? "get-temp" would allow for adding an AMD implementation
>>> later on?
>>
>> AMD CPUs expose a PCI device that can be interacted with to get
>> temperatures; which is then exposed in hwmon interface of Linux. That
>> wouldn't be practical to implement hwmon interfaces in xenpm.
>
> You understand though that AMD only was an example (the most natural one)?
>

Yes, though having a generic name (e.g get-cpu-temp) could suggest that
it is expected to work for all vendors; while it will here only work for
Intel.

>>>> @@ -1354,6 +1358,131 @@ void enable_turbo_mode(int argc, char *argv[])
>>>>                    errno, strerror(errno));
>>>>    }
>>>>
>>>> +static int fetch_dts_temp(xc_interface *xch, uint32_t cpu, bool package, 
>>>> int *temp)
>>>> +{
>>>> +    xc_resource_entry_t entries[] = {
>>>> +        { .idx = package ? MSR_PACKAGE_THERM_STATUS : 
>>>> MSR_IA32_THERM_STATUS },
>>>> +        { .idx = MSR_TEMPERATURE_TARGET },
>>>> +    };
>>>> +    struct xc_resource_op ops = {
>>>> +        .cpu = cpu,
>>>> +        .entries = entries,
>>>> +        .nr_entries = ARRAY_SIZE(entries),
>>>> +    };
>>>> +    int tjmax;
>>>> +
>>>> +    int ret = xc_resource_op(xch, 1, &ops);
>>>> +
>>>> +    switch ( ret )
>>>> +    {
>>>> +    case -1:
>>>> +        /* xc_resource_op returns -1 in out of memory scenarios */
>>>> +        errno = -ENOMEM;
>>>
>>> And xc_resource_op() doesn't itself set / inherit a properly set errno?
>>> We don't want to override what the C library may have set.
>>
>> I'm not sure what to do then. I guess we want to reset errno before
>> entering xc_resource_op (so we won't report stale errno), but we would
>> still need to consider cases where xc_resource_op returns failure
>> without errno having being set ?
>
> Should xc_resource_op() perhaps better be corrected to behave consistently
> wrt the setting of errno?
>

that could be something to consider

>>>> +    case 1:
>>>> +    {
>>>> +        /*
>>>> +         * The CPU doesn't support MSR_TEMPERATURE_TARGET, we assume it's 
>>>> 100
>>>> +         * which is correct aside a few selected Atom CPUs. Check Linux
>>>> +         * kernel's coretemp.c for more information.
>>>> +         */
>>>> +        static bool has_reported_once = false;
>>>> +
>>>> +        if ( !has_reported_once )
>>>> +        {
>>>> +            fprintf(stderr, "MSR_TEMPERATURE_TARGET is not supported, 
>>>> assume "
>>>> +                            "tjmax = 100, readings may be incorrect.\n");
>>>> +            has_reported_once = true;
>>>> +        }
>>>> +
>>>> +        tjmax = 100;
>>>> +        break;
>>>> +    }
>>>> +
>>>> +    case 2:
>>>> +        tjmax = (entries[1].val >> 16) & 0xff;
>>>> +        break;
>>>> +
>>>> +    default:
>>>> +        if ( ret > 0 )
>>>> +        {
>>>> +            fprintf(stderr, "Got unexpected xc_resource_op return value: 
>>>> %d", ret);
>>>> +            errno = -EINVAL;
>>>> +        }
>>>> +        else
>>>> +            errno = ret;
>>>
>>> Why would this be? How do you know "ret" holds a value suitable for putting
>>> in errno?
>>
>> Aside -1 in out of memory situations when xc_resource_op returns -1, in
>> other failure cases, it returns a hypercall (e.g multicall_op) return
>> code, which is supposed to match a errno.
>
> As above - generally all layers want to deal with errno in a consistent
> manner. If bugs in lower layers are too intrusive to fix right away, such
> workarounds will at least want commenting upon (so it is easy to determine
> at what point they can be removed again).
>

I think once xc_resource_op returns consistent values, things would be
simpler to deal with here.

> Jan
>

Teddy


--
Teddy Astie | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech





 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.