[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Workings/effectiveness of the xen-acpi-processor driver



On Tue, May 01, 2012 at 06:35:45PM -0400, Boris Ostrovsky wrote:
> On 05/01/2012 04:02 PM, Konrad Rzeszutek Wilk wrote:
> >On Thu, Apr 26, 2012 at 06:25:28PM +0200, Stefan Bader wrote:
> >>On 26.04.2012 17:50, Konrad Rzeszutek Wilk wrote:
> >>>On Wed, Apr 25, 2012 at 03:00:58PM +0200, Stefan Bader wrote:
> >>>>Since there have been requests about that driver to get backported into 
> >>>>3.2, I
> >>>>was interested to find out what or how much would be gained by that.
> >>>>
> >>>>The first system I tried was an AMD based one (8 core Opteron 6128@2GHz). 
> >>>>Which
> >>>>was not very successful as the drivers bail out of the init function 
> >>>>because the
> >>>>first call to acpi_processor_register_performance() returns -ENODEV. 
> >>>>There is
> >>>>some frequency scaling when running without Xen, so I need to do some more
> >>>>debugging there.
> 
> I believe this is caused by the somewhat under-enlightened xen_apic_read():
> 
> static u32 xen_apic_read(u32 reg)
> {
>         return 0;
> }
> 
> This results in some data, most importantly
> boot_cpu_physical_apicid, not being set correctly and, in turn,
> causes x86_cpu_to_apicid to be broken.

What is the involvment of x86_cpu_to_apicid to 
acpi_processor_register_performance?
Or is this more of a stab in the dark?

Stefan, one way to debug this is to make the driver be a module and then
configure the /sys/../acpi/debug_level and debug_layer to be 0xffffffff
and try loading the module. It should print out tons of data (And the reason
it returned -Exxx).
> 
> On larger AMD systems boot processor is typically APICID=0x20 (I
> don't have Intel system handy to see how it looks there).
> 
> As a quick and dirty test you can try:
> 
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index edc2448..1f78998 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1781,6 +1781,7 @@ void __init register_lapic_address(unsigned
> long address)
>         }
>         if (boot_cpu_physical_apicid == -1U) {
>                 boot_cpu_physical_apicid  = read_apic_id();
> +               boot_cpu_physical_apicid = 32;
>                 apic_version[boot_cpu_physical_apicid] =
>                          GET_APIC_VERSION(apic_read(APIC_LVR));
>         }
> 
> 
> (Set it to whatever APICID on core0 is, I suspect it won't be zero).
> 
> -boris
> 
> 
> >>>
> >>>Did you back-port the other components - the ones that turn off the native
> >>>frequency scalling?
> >>>
> >>>       provide disable_cpufreq() function to disable the API.
> >>>   xen/acpi-processor: Do not depend on CPU frequency scaling drivers.
> >>>       xen/cpufreq: Disable the cpu frequency scaling drivers from loading
> >>>>
> >>
> >>Yes, here is the full set for reference:
> >>
> >>* xen/cpufreq: Disable the cpu frequency scaling drivers from loading.
> >>* xen/acpi: Remove the WARN's as they just create noise.
> >>* xen/acpi: Fix Kconfig dependency on CPU_FREQ
> >>* xen/acpi-processor: Do not depend on CPU frequency scaling drivers.
> >>* xen/acpi-processor: C and P-state driver that uploads said data to hyper
> >>* provide disable_cpufreq() function to disable the API.
> >
> >And (Linus just pulled it), you also need this one:
> >  df88b2d96e36d9a9e325bfcd12eb45671cbbc937 (xen/enlighten: Disable 
> > MWAIT_LEAF so that acpi-pad won't be loaded.)
> >
> >>
> >>>>The second system was an Intel one (4 core i7 920@xxxxxxx) which was
> >>>>successfully loading the driver. Via xenpm I can see the various 
> >>>>frequencies and
> >>>>also see them being changed. However the cpuidle data out of xenpm looks 
> >>>>a bit odd:
> >>>>
> >>>>#>  xenpm get-cpuidle-states 0
> >>>>Max C-state: C7
> >>>>
> >>>>cpu id               : 0
> >>>>total C-states       : 2
> >>>>idle time(ms)        : 10819311
> >>>>C0                   : transition [00000000000000000001]
> >>>>                        residency  [00000000000000005398 ms]
> >>>>C1                   : transition [00000000000000000001]
> >>>>                        residency  [00000000000010819311 ms]
> >>>>pc3                  : [00000000000000000000 ms]
> >>>>pc6                  : [00000000000000000000 ms]
> >>>>pc7                  : [00000000000000000000 ms]
> >>>>cc3                  : [00000000000000000000 ms]
> >>>>cc6                  : [00000000000000000000 ms]
> >>>>
> >>>>Also gathering samples over 30s does look like only C0 and C1 are used. 
> >>>>This
> >>>
> >>>Yes.
> >>>>might be because C1E support is enabled in BIOS but when looking at the
> >>>>intel_idle data in sysfs when running without a hypervisor will show C3 
> >>>>and C6
> >>>>for the cores. That could have been just a wrong output, so I plugged in 
> >>>>a power
> >>>>meter and compared a kernel running natively and running as dom0 (with and
> >>>>without the acpi-processor driver).
> >>>>
> >>>>Native: 175W
> >>>>dom0:   183W (with only marginal difference between with or without the
> >>>>               processor driver)
> >>>>[yes, the system has a somewhat high base consumption which I attribute 
> >>>>to a
> >>>>ridiculously dimensioned graphics subsystem to be running a text console]
> >>>>
> >>>>This I would take as C3 and C6 really not being used and the frequency 
> >>>>scaling
> >
> >So the other thing I forgot to note is that C3->C6 have a detrimental
> >effect on some Intel boxes with Xen. We haven't figured out exactly which 
> >ones
> >and the bug is definitly in the hypervisor. The bug is that when the CPU 
> >goes in
> >those states the NIC ends up being unresponsive. Its like the interrupts 
> >stopped
> >being ACKed. If I run 'xenpm set-max-cstate 2' the issue disappears.
> >
> >>>
> >>>To go in deeper modes there is also a need to backport a Xen unstable
> >>>hypercall which will allow the kernel to detect the other states besides
> >>>C0-C2.
> >>>
> >>>"XEN_SET_PDC query was implemented in c/s 23783:
> >>>     "ACPI: add _PDC input override mechanism".
> >>>
> >>
> >>I see. There is a kernel patch about enabling MWAIT that refers to that...
> >
> >Were there any special things you ran when checking the output? Just plugging
> >and looking at the results?
> >>
> >>>
> >>>>having no impact on the idle system is not that much surprising. But if 
> >>>>that was
> >>>>true it would also limit the usefulness of the turbo mode which I 
> >>>>understand
> >>>>would also be limited by the c-state of the other cores.
> >>>
> >>>Hm, I should double-check that - but somehow I thought that Xen 
> >>>independetly
> >>>checks for TurboMode and if the P-states are in, then they are activated.
> >
> >I did a bit of checking around and it does seem that is the case. From what
> >I have gathered the TurboMode kicks in when the CPU is C0 mode (which should
> >be obvious), and when the other cores are in anything but C0 mode. And sure
> >enough that seems to be the case. But I can't get the concrete details 
> >whether
> >the "but C0 mode" means that TurboMode will work better if the C mode is 
> >legacy
> >C1, C2, C3 or the CPU C-states (so MWAIT enabled). Trying to find out from
> >Len Brown more details..
> >>>
> >>Turbo mode should be enabled. I had been only looking at a generic overview
> >>about it on Intel site which sounded like it  would make more of a 
> >>difference on
> >>how much one core could get overclocked related to how many cores are active
> >>(and I translated active or not into deeper c-states or not).
> >>Looking at the verbose output of turbostat it seems not to make that much
> >>difference whether 2-4 cores are running. A single core alone could get one 
> >>more
> >>increment in clock stepping. That does not immediately sound a lot. And of
> >>course how much or long the higher clock is used depends on other factors as
> >>well and is not under OS control.
> >>
> >>In the end it is probably quite dynamic and hard to come up with hard facts 
> >>to
> >>prove its value. Though if I can lower the idle power usage by reaching a 
> >>bit
> >>further, that would greatly help to justify the effort and potential risk of
> >>backporting...
> >
> >I understand. I wish I could give you the exact percentage points by which
> >the power usage will drop. But I think the more substantial reason benefit of
> >these patches is performance gains. The ones that Ian Campbell ran and were
> >posted on Phorenix site paint that they are beneficial.
> >
> >>
> >>>>
> >>>>Do I misread the data I see? Or maybe its a known limitation? In case it 
> >>>>is
> >>>>worth doing more research I'll gladly try things and gather more data.
> >>>
> >>>Just missing some patches.
> >>>
> >>>Oh, and this one:
> >>>       xen/acpi: Fix Kconfig dependency on CPU_FREQ
> >>>
> >>>Hmm.. I think a patch disappeared somewhere.
> >
> >That was the one I referenced at the beginning of this email.
> >
> >_______________________________________________
> >Xen-devel mailing list
> >Xen-devel@xxxxxxxxxxxxx
> >http://lists.xen.org/xen-devel
> >
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.