[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1/3] mwait-idle: add support for using halt



>>> On 26.03.19 at 22:56, <Brian.Woods@xxxxxxx> wrote:
> On 3/26/19 10:54 AM, Jan Beulich wrote:
>>>>> On 19.03.19 at 17:12, <Brian.Woods@xxxxxxx> wrote:
>>> On 3/15/19 3:37 AM, Jan Beulich wrote:
>>>> Furthermore I'm then once again wondering what the gain is
>>>> over using the ACPI driver: The suggested _CST looks to exactly
>>>> match the data you enter into the table in the later patch. IOW
>>>> my fundamental concern didn't go away yet: As per the name
>>>> of the driver, it shouldn't really need to support HLT (or anything
>>>> other than MWAIT) as an entry method. Hence I think that at
>>>> the very least you need to extend the description of the change
>>>> quite a bit to explain why the ACPI driver is not suitable.
>>>>
>>>> Depending on how this comes out, it may then still be a matter
>>>> of discussing whether, rather than fiddling with mwait-idle, it
>>>> wouldn't be better to have an AMD-specific driver instead. Are
>>>> there any thoughts in similar directions for Linux?
>>>
>>> Because:
>>> #1 getting the ACPI tables from dom0 is either unreliable (PV dom0) or
>>> not possible (PVH dom0).
>>> #2 the changes to the Intel code are minimal.
>>> #3 worse case, Xen thinks it's using CC6 when it's using CC1.  Not
>>> perfect but far from fatal or breaking.
>> 
>> Having thought about this some more, I agree that an AMD-specific
>> driver would likely go too far. However, that's still no reason to fiddle
>> with the mwait-idle one - I think you could as well populate the data
>> as necessary for the ACPI driver to use, removing the dependency
>> on Dom0. After all that driver already knows of all the entry methods
>> you may want/need to use (see acpi_idle_do_entry()).
>> 
> I did a rough example of how that might work and lines of code changed 
> for adding it to cpu_idle was roughly 125.  Seeing as this doesn't 
> compile and doesn't even have comments, I'd say at least 140 lines of 
> code/change (most of those are additive too), a lot of is functionally 
> copied from mwait-idle and how it reads data out of the structures, 
> checks, and populates the cx structures.  The first set of mwait patches 
> is 87 lines changed total.
> 
> I _could_ try and refactor some of the code and get it down from 
> 125-140, but that would most likely make porting changes even harder for 
> mwait-idle.

Well, I was rather thinking about something like the change below,
taking slightly over 100 lines of new code, and not touching
mwait-idle.c at all. Otoh there are a couple of TBDs in there which
may cause the patch to further grow once addressed.

Note that this goes on top of
https://lists.xenproject.org/archives/html/xen-devel/2018-08/msg00089.html
which sadly there still wasn't sufficient feedback on to decide where
to go with the series; all I know is that Andrew (understandably)
doesn't want to see the last patch go in without vendor confirmation
(and I'd be fine to drop that last patch if need be, but this shouldn't
block the earlier patches in the series).

Jan

x86/AMD: make C-state handling independent of Dom0

At least for more recent CPUs, following what BKDG / PPR suggest for the
BIOS to surface via ACPI we can make ourselves independent of Dom0
uploading respective data.

Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
---
TBD: Can we set local_apic_timer_c2_ok to true? I can't seem to find any
     statement in the BKDG / PPR as to whether the LAPIC timer continues
     running in CC6.
TBD: We may want to verify that HLT indeed is configured to enter CC6.
TBD: I guess we could extend this to families older then Fam15 as well.

--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -120,6 +120,8 @@ boolean_param("lapic_timer_c2_ok", local
 
 struct acpi_processor_power *__read_mostly processor_powers[NR_CPUS];
 
+static int8_t __read_mostly vendor_override;
+
 struct hw_residencies
 {
     uint64_t mc0;
@@ -1220,6 +1222,9 @@ long set_cx_pminfo(uint32_t acpi_id, str
     if ( pm_idle_save && pm_idle != acpi_processor_idle )
         return 0;
 
+    if ( vendor_override > 0 )
+        return 0;
+
     print_cx_pminfo(acpi_id, power);
 
     cpu_id = get_cpu_id(acpi_id);
@@ -1292,6 +1297,98 @@ long set_cx_pminfo(uint32_t acpi_id, str
     return 0;
 }
 
+static void amd_cpuidle_init(struct acpi_processor_power *power)
+{
+    unsigned int i, nr = 0;
+    const struct cpuinfo_x86 *c = &current_cpu_data;
+    const unsigned int ecx_req = CPUID5_ECX_EXTENSIONS_SUPPORTED |
+                                 CPUID5_ECX_INTERRUPT_BREAK;
+    const struct acpi_processor_cx *cx = NULL;
+    static const struct acpi_processor_cx fam17[] = {
+        {
+            .type = ACPI_STATE_C1,
+            .entry_method = ACPI_CSTATE_EM_FFH,
+            .address = 0,
+            .latency = 1,
+        },
+        {
+            .type = ACPI_STATE_C2,
+            .entry_method = ACPI_CSTATE_EM_HALT,
+            .latency = 400,
+        },
+    };
+
+    if ( pm_idle_save && pm_idle != acpi_processor_idle )
+        return;
+
+    if ( vendor_override < 0 )
+        return;
+
+    switch ( c->x86 )
+    {
+    case 0x17:
+        if ( cpu_has_monitor && c->cpuid_level >= CPUID_MWAIT_LEAF &&
+             (cpuid_ecx(CPUID_MWAIT_LEAF) & ecx_req) == ecx_req )
+        {
+            cx = fam17;
+            nr = ARRAY_SIZE(fam17);
+            break;
+        }
+        /* fall through */
+    case 0x15:
+    case 0x16:
+        cx = &fam17[1];
+        nr = ARRAY_SIZE(fam17) - 1;
+        break;
+
+    default:
+        vendor_override = -1;
+        return;
+    }
+
+    power->flags.has_cst = true;
+
+    for ( i = 0; i < nr; ++i )
+    {
+        if ( cx[i].type > max_cstate )
+            break;
+        power->states[i + 1] = cx[i];
+        power->states[i + 1].idx = i + 1;
+        power->states[i + 1].target_residency = cx[i].latency * latency_factor;
+    }
+
+    if ( i )
+    {
+        power->count = i + 1;
+        power->safe_state = &power->states[i];
+
+        if ( !vendor_override )
+        {
+            if ( !boot_cpu_has(X86_FEATURE_ARAT) )
+                hpet_broadcast_init();
+
+            if ( !lapic_timer_init() )
+            {
+                vendor_override = -1;
+                cpuidle_init_cpu(power->cpu);
+                return;
+            }
+
+            if ( !pm_idle_save )
+            {
+                pm_idle_save = pm_idle;
+                pm_idle = acpi_processor_idle;
+            }
+
+            dead_idle = acpi_dead_idle;
+
+            vendor_override = 1;
+        }
+    }
+    else
+        vendor_override = -1;
+}
+
 uint32_t pmstat_get_cx_nr(uint32_t cpuid)
 {
     return processor_powers[cpuid] ? processor_powers[cpuid]->count : 0;
@@ -1437,8 +1534,8 @@ static int cpu_callback(
     int rc = 0;
 
     /*
-     * Only hook on CPU_UP_PREPARE because a dead cpu may utilize the info
-     * to enter deep C-state.
+     * Only hook on CPU_UP_PREPARE / CPU_ONLINE because a dead cpu may utilize
+     * the info to enter deep C-state.
      */
     switch ( action )
     {
@@ -1447,6 +1544,12 @@ static int cpu_callback(
         if ( !rc && cpuidle_current_governor->enable )
             rc = cpuidle_current_governor->enable(processor_powers[cpu]);
         break;
+
+    case CPU_ONLINE:
+        if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+             processor_powers[cpu] )
+            amd_cpuidle_init(processor_powers[cpu]);
+        break;
     }
 
     return !rc ? NOTIFY_DONE : notifier_from_errno(rc);



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.