
Re: [Xen-devel] [RFC PATCH 00/31] CPUFreq on ARM



On Mon, Nov 13, 2017 at 5:21 PM, Andre Przywara
<andre.przywara@xxxxxxxxxx> wrote:
> Hi,
Hi Andre

>
> thanks very much for your work on this!
Thank you for your comments.

>
> On 09/11/17 17:09, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>
>>
>> Hi, all.
>>
>> The purpose of this RFC patch series is to add CPUFreq support to Xen on ARM.
>> Motivation of hypervisor based CPUFreq is to enable one of the main PM 
>> use-cases in virtualized system powered by Xen hypervisor. Rationale behind 
>> this activity is that CPU virtualization is done by hypervisor and the guest 
>> OS doesn't actually know anything about physical CPUs because it is running 
>> on virtual CPUs. It is quite clear that a decision about frequency change 
>> should be taken by hypervisor as only it has information about actual CPU 
>> load.
>
> Can you please sketch your usage scenario or workloads here? I can think
> of quite different scenarios (oversubscribed server vs. partitioning
> RTOS guests, for instance). The usefulness of CPUFreq and the trade-offs
> in the design are quite different between those.
We have embedded use-cases in mind. For example, a system with
several domains, where one domain runs the most critical SW and the
other domain(s) are, let's say, for entertainment purposes.
I think CPUFreq is useful wherever power consumption matters.

>
> In general I doubt that a hypervisor scheduling vCPUs is in a good
> position to make a decision on the proper frequency physical CPUs should
> run with. From all I know it's already hard for an OS kernel to make
> that call. So I would actually expect that guests provide some input,
> for instance by signalling OPP change request up to the hypervisor. This
> could then decide to act on it - or not.
Each running guest sees only part of the picture, but the hypervisor
has the whole picture: it knows all about the CPUs, measures CPU load
and is able to choose the required CPU frequency to run at. I am
wondering, does Xen need additional input from guests to make a decision?
BTW, currently a guest domain on ARM doesn't even know how many physical
CPUs the system has or what their OPPs are. When creating a guest
domain, Xen inserts only dummy CPU nodes. CPU info such as clocks,
OPPs, thermal, etc. is not passed to the guest.
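
To illustrate the kind of decision meant by "measures CPU load and
chooses the frequency", here is a minimal sketch of an ondemand-style
policy. It is not the Xen governor code. The function name, the 80%
threshold and the OPP table format (ascending, in kHz) are assumptions
made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed tunable: above this load (in percent) jump to the top OPP. */
#define DEMO_UP_THRESHOLD 80U

/*
 * Hypothetical helper: pick a target frequency (kHz) from an ascending
 * OPP table, given the current frequency and the measured load in percent.
 */
uint32_t demo_pick_freq(const uint32_t *opp_khz, size_t n,
                        uint32_t cur_khz, uint32_t load_pct)
{
    assert(opp_khz != NULL && n > 0 && load_pct <= 100);

    /* Heavy load: go straight to the highest available frequency. */
    if (load_pct > DEMO_UP_THRESHOLD)
        return opp_khz[n - 1];

    /* Frequency at which the same work would sit at the threshold load. */
    uint64_t wanted = (uint64_t)cur_khz * load_pct / DEMO_UP_THRESHOLD;

    /* Lowest OPP that still satisfies the wanted frequency. */
    for (size_t i = 0; i < n; i++)
        if (opp_khz[i] >= wanted)
            return opp_khz[i];

    return opp_khz[n - 1];
}
```

The only input here is the load of the physical CPU, which is exactly
the information the hypervisor has and a single guest does not.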

>
>> Although these required components (CPUFreq core, governors, etc) already 
>> exist in Xen, it is worth to mention that they are ACPI specific. So, a part 
>> of the current patch series makes them more generic in order to make 
>> possible a CPUFreq usage on architectures without ACPI support in.
>
> Have you looked at how this is used on x86 these days? Can you briefly
> describe how this works and it's used there?
Xen supports the CPUFreq feature on x86 [1]. I don't know how widely it
is used at the moment, but that is another question. There are two
possible modes: Domain0 based CPUFreq and Hypervisor based CPUFreq
[2]. As I understand, the second option is more popular.
Two different implementations of "Hypervisor based CPUFreq" are
present: ACPI Processor P-States Driver and AMD Architectural P-state
Driver. You can find both of them in the xen/arch/x86/acpi/cpufreq/ dir.

[1] 
https://wiki.xenproject.org/wiki/Xen_power_management#CPU_P-states_.28cpufreq.29
[2] 
https://wiki.xenproject.org/wiki/Xen_power_management#Hypervisor_based_cpufreq

>
>> But, the main question we have to answer is about frequency changing 
>> interface in virtualized system. The frequency changing interface and all 
>> dependent components which needed CPUFreq to be functional on ARM are not 
>> present in Xen these days. The list of required components is quite big and 
>> may change across different ARM SoC vendors. As an example, the following 
>> components are involved in DVFS on Renesas Salvator-X board which has R-Car 
>> Gen3 SoC installed: generic clock, regulator and thermal frameworks, 
>> Vendor’s CPG, PMIC, AVS, THS drivers, i2c support, etc.
>>
>> We were considering a few possible approaches of hypervisor based CPUFreqs 
>> on ARM and came to conclusion to base this solution on popular at the 
>> moment, already upstreamed to Linux, ARM System Control and Power 
>> Interface(SCPI) protocol [1]. We chose SCPI protocol instead of newer ARM 
>> System Control and Management Interface (SCMI) protocol [2] since it is 
>> widely spread in Linux, there are good examples how to use it, the range of 
>> capabilities it has is enough for implementing hypervisor based CPUFreq and, 
>> what is more, upstream Linux support for SCMI is missed so far, but SCMI 
>> could be used as well.
>>
>> Briefly speaking, the SCPI protocol is used between the System Control 
>> Processor(SCP) and the Application Processors(AP). The mailbox feature 
>> provides a mechanism for inter-processor communication between SCP and AP. 
>> The main purpose of SCP is to offload different PM related tasks from AP and 
>> one of the services that SCP provides is Dynamic voltage and frequency 
>> scaling (DVFS), it is what we actually need for CPUFreq. I will describe 
>> this approach in details down the text.
>>
>> Let me explain a bit more what these possible approaches are:
>>
>> 1. “Xen+hwdom” solution.
>> GlobalLogic team proposed split model [3], where “hwdom-cpufreq” frontend 
>> driver in Xen interacts with the “xen-cpufreq” backend driver in Linux hwdom 
>> (possibly dom0) in order to scale physical CPUs. This solution hasn’t been 
>> accepted by Xen community yet and seems it is not going to be accepted 
>> without taking into the account still unanswered major questions and proving 
>> that “all-in-Xen” solution, which Xen community considered as more 
>> architecturally cleaner option, would be unworkable in practice.
>> The other reasons why we decided not to stick to this approach are complex 
>> communication interface between Xen and hwdom: event channel, hypercalls, 
>> syscalls, passing CPU info via DT, etc and possible synchronization issues 
>> with a proposed solution.
>> Although it is worth to mention that the beauty of this approach was that 
>> there wouldn’t be a need to port a lot of things to Xen. All frequency 
>> changing interface and all dependent components which needed CPUFreq to be 
>> functional were already in place.
>
> Stefano, Julien and I were thinking about this: Wouldn't it be possible
> to come up with some hardware domain, solely dealing with CPUFreq
> changes? This could run a Linux kernel, but no or very little userland.
> All its vCPUs would be pinned to pCPUs and would normally not be
> scheduled by Xen. If Xen wants to change the frequency, it schedules the
> respective vCPU to the right pCPU and passes down the frequency change
> request. Sounds a bit involved, though, and probably doesn't solve the
> problem where this domain needs to share access to hardware with Dom0
> (clocks come to mind).
Yes, another question is how to get this Linux kernel stuff (backend,
top level driver, etc) upstreamed.

>
>> Although this approach is not used, still I picked a few already acked 
>> patches which made ACPI specific CPUFreq stuff more generic.
>>
>> 2. “all-in-Xen” solution.
>> This implies that all CPUFreq related stuff should be located in Xen.
>> Community considered this solution as more architecturally cleaner option 
>> than “Xen+hwdom” one. No layering violation comparing with the previous 
>> approach (letting guest OS manage one or more physical CPUs is more of a 
>> layering violation).
>> This solution looks better, but to be honest, we are not in favor of this 
>> solution as well. We expect enormous developing effort to get this support 
>> in (the scope of required components looks unreal) and maintain it. So, we 
>> decided not to stick to this approach as well.
>
> Yes, I even think it's not feasible to implement this. With a modern
> clock implementation there is one driver to control *all* clocks of an
> SoC, so you can't single out the CPU clock easily, for instance. One
> would probably run into synchronisation issues, at best.
>
>> 3. “Xen+SCP(ARM TF)” solution.
>> It is yet another solution based on ARM SCPI protocol. The generic idea here 
>> is that there is a firmware, which being a server runs on some dedicated IP 
>> core (server), provides different PM services (DVFS, sensors, etc). On the 
>> other side there is a CPUFreq driver in Xen, which is running on the AP 
>> (client), consumes these services. CPUFreq driver neither changes the CPU 
>> frequency/voltage by itself nor cooperates with Linux in order to do such 
>> job. It just communicates with SCP directly using SCPI protocol. As I said 
>> before, some integrated into a SoC mailbox IP need to be used for IPC 
>> (doorbell for triggering action and shared memory region for commands). 
>> CPUFreq driver doesn’t even need to know what should be physically changed 
>> for the new frequency to take effect. It is a certainly SCP’s 
>> responsibility. This all avoid CPUFreq infrastructure in Xen on ARM from 
>> diving into each supported SoC internals and as the result having a lot of 
>> code.
>>
>> The possible issue here could be in SCP, the problem is that some dedicated 
>> IP core may be absent at all or performs other than PM tasks. Fortunately, 
>> there is a brilliant solution to teach firmware running in the EL3 exception 
>> level (ARM TF) to perform SCP functions and use SMC calls for communications 
>> [4]. Exactly this transport implementation I want to bring to Xen the first. 
>> Such solution is going to be generic across all ARM platforms that do have 
>> firmware running in the EL3 exception level and don’t have candidate for 
>> being SCP.
>
> While I feel flattered that you like that idea as well ;-), you should
> mention that this requires actual firmware providing those services.
Yes, some firmware which provides these services must be present
on the other end.
In the common case it is firmware which runs on the dedicated IP core(s).
In this particular case it is firmware which runs on the same core(s)
as the hypervisor.

> I
> am not sure there is actually *any* implementation of this at the
> moment, apart from my PoC code for Allwinner.
Your PoC is a good example for writing the firmware side. So why not
use it as a base for other platforms?

> And from a Xen point of view I am not sure we are in the position to
> force users to use this firmware. This may be feasible in a classic
> embedded scenario, where both firmware and software are provided by the
> same entity, but that should be clearly noted as a restriction.
Agree.

>
>> Here we have completely synchronous case because of SMC calls nature. SMC 
>> triggered mailbox driver emulates a mailbox which signals transmitted data 
>> via Secure Monitor Call (SMC) instruction [5]. The mailbox receiver is 
>> implemented in firmware and synchronously returns data when it returns 
>> execution to the non-secure world again. This would allow us both to trigger 
>> a request and transfer execution to the firmware code in a safe and 
>> architected way. Like PSCI requests.
>> As you can see this method is free from synchronization issues. What is 
>> more, this solution is more architecturally cleaner solution than split 
>> model “Xen+hwdom” one. From the security point of view, I hope, everything 
>> will be much more correct since the ARM TF, which we want to see in charge 
>> of controlling CPU frequency/voltage, is a trusted SW layer. Moreover, ARM 
>> TF is responsible for enabling/disabling CPU (PSCI) and nobody complains 
>> about it, so let it do DVFS too.
>
> It should be noted that this synchronous nature of the communication can
> actually be a problem: a DVFS request usually involves regulator and PLL
> changes, which could take some time to settle in. Blocking all of this
> time (milliseconds?) in EL3 (probably busy-waiting) might not be desirable.
Agree. I haven't measured the time yet, since I don't have working
firmware at the moment, just an emulator, but yes, it will definitely
take some time. The whole system won't be blocked, only the CPU which
performs the SMC call.
But if we ask hwdom to change the frequency, won't we have to wait too?
And if Xen manages the PLL/regulator by itself, won't it wait anyway?

>
>> I have to admit that I have checked this solution only due to a lack of 
>> candidate for being SCP. But, I hope, that other ARM SoCs where dedicated 
>> SCP is present (asynchronous case) will work too, but with some limitations. 
>> The mailbox IPs for these ARM SoCs must have TX/RX-done irqs. I have 
>> described in the corresponding patches why this limitation is present.
>>
>> To be honest I have Renesas R-Car Gen3 SoCs in mind as our nearest target, 
>> but I would like to make this solution as generic as possible. I don’t treat 
>> proposed solution as world-wide generic, but I hope, this solution may be 
>> suitable for other ARM SoCs which meet such requirements. Anyway, having 
>> something which works, but doesn’t cover all cases is better than having 
>> nothing.
>>
>> I would like to notice that the patches are POC state and I post them just 
>> to illustrate in more detail of what I am talking about. Patch series 
>> consist of the following parts:
>> 1. GL’s patches which make ACPI specific CPUFreq stuff more generic. 
>> Although these patches has been already acked by Xen community and the 
>> CPUFreq code base hasn’t changed in a last few years I drop all A-b.
>> 2. A bunch of device-tree helpers and macros.
>> 3. Direct ported SCPI protocol, mailbox infrastructure and the ARM SMC 
>> triggered mailbox driver. All components except mailbox driver are in 
>> mainline Linux.
>
> Why do you actually need this mailbox framework? Actually I just
> proposed the SMC driver the make it fit into the Linux framework. All we
> actually need for SCPI is to write a simple command into some memory and
> "press a button". I don't see a need to import the whole Linux
> framework, especially as our mailbox usage is actually just a corner
> case of the mailbox's capability (namely a "single-bit" doorbell).
> The SMC use case is trivial to implement, and I believe using the Juno
> mailbox is similarly simple, for instance.
I did a direct port of the SCPI protocol. I think it is something that
should be retained as much as possible.
The protocol relies on the mailbox feature, so I ported the mailbox too.
It would have been much easier for me to just add handling for the few
required commands, issuing an SMC call directly, without any mailbox
infrastructure involved.
But I wanted to show what is going on and where these things come from.

What is more, I don't want to restrict the usage of this CPUFreq to
the single scenario where the firmware which provides the DVFS service
is in ARM TF. I hope that this solution will also be suitable for ARM
SoCs where a standalone SCP is present and a real mailbox IP, which is
asynchronous by nature, is used for IPC. Of course, this mailbox must
have TX/RX-done irqs.
This is a limitation at the moment.
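
For reference, the "write a command into some memory and press a
button" variant without any mailbox framework could look roughly like
the sketch below. It is only an illustration, not code from the series.
The shared memory layout, the DEMO_* constants and all function names
are hypothetical, and real command IDs and field layouts must be taken
from the SCPI specification. The doorbell is a callback so that the
sketch can run against a fake firmware; on real hardware it would issue
the SMC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values for illustration; take real ones from the SCPI spec. */
#define DEMO_CMD_SET_DVFS  0x0aU
#define DEMO_CMD_ID_MASK   0x7fU
#define DEMO_SIZE_SHIFT    16
#define DEMO_SIZE_MASK     0x1ffU
#define DEMO_NR_DOMAINS    4

/* Assumed shared memory layout, visible to both Xen and the firmware. */
struct demo_shmem {
    uint32_t command;     /* packed command ID and payload size */
    uint32_t status;      /* written by the firmware, 0 means success */
    uint8_t  payload[8];
};

/* The "button": would be an SMC on real hardware. */
typedef void (*demo_doorbell_fn)(struct demo_shmem *mem);

uint32_t demo_pack_cmd(uint32_t id, uint32_t size)
{
    return (id & DEMO_CMD_ID_MASK) |
           ((size & DEMO_SIZE_MASK) << DEMO_SIZE_SHIFT);
}

/* Write a "set DVFS" request, ring the doorbell, return firmware status. */
uint32_t demo_set_dvfs(struct demo_shmem *mem, demo_doorbell_fn ring,
                       uint8_t domain, uint8_t opp_index)
{
    assert(mem != NULL && ring != NULL);

    mem->payload[0] = domain;
    mem->payload[1] = opp_index;
    mem->command = demo_pack_cmd(DEMO_CMD_SET_DVFS, 2);
    mem->status = 0xffffffffU;  /* poison, so a missing reply is visible */

    ring(mem);                  /* synchronous: reply is ready on return */
    return mem->status;
}

/* Fake firmware standing in for the EL3 side, for testing only. */
uint8_t demo_fw_current_opp[DEMO_NR_DOMAINS];

void demo_fake_firmware(struct demo_shmem *mem)
{
    uint32_t id = mem->command & DEMO_CMD_ID_MASK;
    uint32_t size = (mem->command >> DEMO_SIZE_SHIFT) & DEMO_SIZE_MASK;

    if (id != DEMO_CMD_SET_DVFS || size != 2 ||
        mem->payload[0] >= DEMO_NR_DOMAINS) {
        mem->status = 1;        /* generic error */
        return;
    }

    demo_fw_current_opp[mem->payload[0]] = mem->payload[1];
    mem->status = 0;
}
```

This covers only the synchronous SMC case. An asynchronous SCP behind a
real mailbox IP would additionally need the TX/RX-done handling, which
is the part the mailbox infrastructure provides.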

>
>
> So to summarize I think we need to agree on those general questions:
> 1) Shall the Xen hypervisor actually be involved in CPUFreq at all? Can
> this be left to corner-cases like pinned CPUs/guests, where guests
> requests are passed on to the hardware?
> 2) Is EL3/ATF providing SCPI services something we can build on?
> Normally I would expect we write drivers to match existing firmware.
> 3) When we go this way, do we really need to port all of the Linux
> drivers and its framework to Xen? Can't we get away with much simpler
> solutions? In the end all the SMC mailbox driver does it to trigger an
> single SMC call, embedded in a lot of glorious Linux boiler plate code.
>
> What I was *actually* thinking of when using the SMC mailbox approach is
> the ability to provide *virtual* SCPI services to guest, in a generic,
> not-SoC-specific way. The proposed SMC mailbox binding allows using
> *hvc* calls to trigger services, so Xen could pick up DVFS requests from
> guests in a generic way and act upon them.
>
> Cheers,
> Andre.
>
>> 4. Xen changes to direct ported code for making it compilable. These changes 
>> don’t change functionality.
>> 5. Some modification to direct ported code which slightly change 
>> functionality, I would say to restrict it.
>> 6. SCPI based CPUFreq driver and CPUFreq interface component.
>> 7. Misc patches mostly to ARM subsystem.
>> 8. Patch from Volodymyr Babchuk which adds SMC wrapper.
>>
>> Most important TODOs regarding the whole patch series:
>> 1. Handle devm in the direct ported code. Currently, in case of any errors 
>> previously allocated resources are left unfreed.
>> 2. Thermal management integration.
>> 3. Don't pass CPUFreq related nodes to dom0. Xen owns SCPI completely.
>> 4. Handle CPU_TURBO frequencies if they are supported by HW.
>>
>> You can find the whole patch series here:
>> repo: https://github.com/otyshchenko1/xen.git branch: cpufreq-devel1
>>
>> P.S. There is no need to modify xenpm tool. It works out of the box on ARM.
>>
>> [1]
>> Linux code:
>> http://elixir.free-electrons.com/linux/v4.14-rc6/source/drivers/firmware/arm_scpi.c
>> http://elixir.free-electrons.com/linux/v4.14-rc6/source/include/linux/scpi_protocol.h
>> http://elixir.free-electrons.com/linux/v4.14-rc6/source/Documentation/devicetree/bindings/arm/arm,scpi.txt
>>
>> Recent protocol version:
>> http://infocenter.arm.com/help/topic/com.arm.doc.dui0922g/scp_message_interface_v1_2_DUI0922G_en.pdf
>>
>> [2]
>> Xen part:
>> https://lists.xen.org/archives/html/xen-devel/2014-11/msg00940.html
>> Linux part:
>> https://lists.xen.org/archives/html/xen-devel/2014-11/msg00944.html
>>
>> [3]
>> http://infocenter.arm.com/help/topic/com.arm.doc.den0056a/DEN0056A_System_Control_and_Management_Interface.pdf
>>
>> [4]
>> http://linux-sunxi.narkive.com/qYWJqjXU/patch-v2-0-3-mailbox-arm-introduce-smc-triggered-mailbox
>>
>> [5]
>> http://infocenter.arm.com/help/topic/com.arm.doc.den0028b/ARM_DEN0028B_SMC_Calling_Convention.pdf
>>
>> Oleksandr Dmytryshyn (6):
>>   cpufreq: move cpufreq.h file to the xen/include/xen location
>>   pm: move processor_perf.h file to the xen/include/xen location
>>   pmstat: move pmstat.c file to the xen/drivers/pm/stat.c location
>>   cpufreq: make turbo settings to be configurable
>>   pmstat: make pmstat functions more generalizable
>>   cpufreq: make cpufreq driver more generalizable
>>
>> Oleksandr Tyshchenko (24):
>>   xenpm: Clarify xenpm usage
>>   xen/device-tree: Add dt_count_phandle_with_args helper
>>   xen/device-tree: Add dt_property_for_each_string macros
>>   xen/device-tree: Add dt_property_read_u32_index helper
>>   xen/device-tree: Add dt_property_count_elems_of_size helper
>>   xen/device-tree: Add dt_property_read_string_helper and friends
>>   xen/arm: Add driver_data field to struct device
>>   xen/arm: Add DEVICE_MAILBOX device class
>>   xen/arm: Store device-tree node per cpu
>>   xen/arm: Add ARM System Control and Power Interface (SCPI) protocol
>>   xen/arm: Add mailbox infrastructure
>>   xen/arm: Introduce ARM SMC based mailbox
>>   xen/arm: Add common header file wrappers.h
>>   xen/arm: Add rxdone_auto flag to mbox_controller structure
>>   xen/arm: Add Xen changes to SCPI protocol
>>   xen/arm: Add Xen changes to mailbox infrastructure
>>   xen/arm: Add Xen changes to ARM SMC based mailbox
>>   xen/arm: Use non-blocking mode for SCPI protocol
>>   xen/arm: Don't set txdone_poll flag for ARM SMC mailbox
>>   cpufreq: hack: perf->states isn't a real guest handle on ARM
>>   xen/arm: Introduce SCPI based CPUFreq driver
>>   xen/arm: Introduce CPUFreq Interface component
>>   xen/arm: Build CPUFreq components
>>   xen/arm: Enable CPUFreq on ARM
>>
>> Volodymyr Babchuk (1):
>>   arm: add SMC wrapper that is compatible with SMCCC
>>
>>  MAINTAINERS                                  |    4 +-
>>  tools/misc/xenpm.c                           |    6 +-
>>  xen/arch/arm/Kconfig                         |    2 +
>>  xen/arch/arm/Makefile                        |    1 +
>>  xen/arch/arm/arm32/Makefile                  |    1 +
>>  xen/arch/arm/arm32/smc.S                     |   32 +
>>  xen/arch/arm/arm64/Makefile                  |    1 +
>>  xen/arch/arm/arm64/smc.S                     |   29 +
>>  xen/arch/arm/cpufreq/Makefile                |    5 +
>>  xen/arch/arm/cpufreq/arm-smc-mailbox.c       |  248 ++++++
>>  xen/arch/arm/cpufreq/arm_scpi.c              | 1191 ++++++++++++++++++++++++++
>>  xen/arch/arm/cpufreq/cpufreq_if.c            |  522 +++++++++++
>>  xen/arch/arm/cpufreq/mailbox.c               |  562 ++++++++++++
>>  xen/arch/arm/cpufreq/mailbox.h               |   28 +
>>  xen/arch/arm/cpufreq/mailbox_client.h        |   69 ++
>>  xen/arch/arm/cpufreq/mailbox_controller.h    |  161 ++++
>>  xen/arch/arm/cpufreq/scpi_cpufreq.c          |  328 +++++++
>>  xen/arch/arm/cpufreq/scpi_protocol.h         |  116 +++
>>  xen/arch/arm/cpufreq/wrappers.h              |  239 ++++++
>>  xen/arch/arm/smpboot.c                       |    5 +
>>  xen/arch/x86/Kconfig                         |    2 +
>>  xen/arch/x86/acpi/cpu_idle.c                 |    2 +-
>>  xen/arch/x86/acpi/cpufreq/cpufreq.c          |    2 +-
>>  xen/arch/x86/acpi/cpufreq/powernow.c         |    2 +-
>>  xen/arch/x86/acpi/power.c                    |    2 +-
>>  xen/arch/x86/cpu/mwait-idle.c                |    2 +-
>>  xen/arch/x86/platform_hypercall.c            |    2 +-
>>  xen/common/device_tree.c                     |  124 +++
>>  xen/common/sysctl.c                          |    2 +-
>>  xen/drivers/Kconfig                          |    2 +
>>  xen/drivers/Makefile                         |    1 +
>>  xen/drivers/acpi/Makefile                    |    1 -
>>  xen/drivers/acpi/pmstat.c                    |  526 ------------
>>  xen/drivers/cpufreq/Kconfig                  |    3 +
>>  xen/drivers/cpufreq/cpufreq.c                |  102 ++-
>>  xen/drivers/cpufreq/cpufreq_misc_governors.c |    2 +-
>>  xen/drivers/cpufreq/cpufreq_ondemand.c       |    4 +-
>>  xen/drivers/cpufreq/utility.c                |   13 +-
>>  xen/drivers/pm/Kconfig                       |    3 +
>>  xen/drivers/pm/Makefile                      |    1 +
>>  xen/drivers/pm/stat.c                        |  538 ++++++++++++
>>  xen/include/acpi/cpufreq/cpufreq.h           |  245 ------
>>  xen/include/acpi/cpufreq/processor_perf.h    |   63 --
>>  xen/include/asm-arm/device.h                 |    2 +
>>  xen/include/asm-arm/processor.h              |    4 +
>>  xen/include/public/platform.h                |    1 +
>>  xen/include/xen/cpufreq.h                    |  254 ++++++
>>  xen/include/xen/device_tree.h                |  158 ++++
>>  xen/include/xen/pmstat.h                     |    2 +
>>  xen/include/xen/processor_perf.h             |   69 ++
>>  50 files changed, 4822 insertions(+), 862 deletions(-)
>>  create mode 100644 xen/arch/arm/arm32/smc.S
>>  create mode 100644 xen/arch/arm/arm64/smc.S
>>  create mode 100644 xen/arch/arm/cpufreq/Makefile
>>  create mode 100644 xen/arch/arm/cpufreq/arm-smc-mailbox.c
>>  create mode 100644 xen/arch/arm/cpufreq/arm_scpi.c
>>  create mode 100644 xen/arch/arm/cpufreq/cpufreq_if.c
>>  create mode 100644 xen/arch/arm/cpufreq/mailbox.c
>>  create mode 100644 xen/arch/arm/cpufreq/mailbox.h
>>  create mode 100644 xen/arch/arm/cpufreq/mailbox_client.h
>>  create mode 100644 xen/arch/arm/cpufreq/mailbox_controller.h
>>  create mode 100644 xen/arch/arm/cpufreq/scpi_cpufreq.c
>>  create mode 100644 xen/arch/arm/cpufreq/scpi_protocol.h
>>  create mode 100644 xen/arch/arm/cpufreq/wrappers.h
>>  delete mode 100644 xen/drivers/acpi/pmstat.c
>>  create mode 100644 xen/drivers/pm/Kconfig
>>  create mode 100644 xen/drivers/pm/Makefile
>>  create mode 100644 xen/drivers/pm/stat.c
>>  delete mode 100644 xen/include/acpi/cpufreq/cpufreq.h
>>  delete mode 100644 xen/include/acpi/cpufreq/processor_perf.h
>>  create mode 100644 xen/include/xen/cpufreq.h
>>  create mode 100644 xen/include/xen/processor_perf.h
>>


-- 
Regards,

Oleksandr Tyshchenko

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

