Re: [Xen-devel] cpufreq implementation for OMAP under xen hypervisor.

I've discussed my suggestions here at length, and it is now clear to me that we should not try to use cpufreq on ARM SoCs without a direct 1:1 pcpu:vcpu mapping in dom0. So anyone who wants to break the 1:1 mapping should forget about cpufreq.
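
To make the 1:1 condition concrete, here is a minimal sketch of the check I have in mind. It is only a toy model: the function name and the dictionary representation of the pinning are hypothetical, not an existing Xen or Linux interface.

```python
def is_one_to_one(vcpu_to_pcpu, num_pcpus):
    """Return True if every pcpu is backed by exactly one pinned dom0 vcpu.

    vcpu_to_pcpu maps a dom0 vcpu index to the pcpu it is pinned to
    (hypothetical representation; None means the vcpu is not pinned).
    """
    pinned = list(vcpu_to_pcpu.values())
    if any(p is None for p in pinned):
        return False
    # Exactly one vcpu per pcpu, and no pcpu left uncovered.
    return sorted(pinned) == list(range(num_pcpus))


# OMAP-like case: 2 pcpus, dom0 has 2 vcpus pinned 1:1.
omap_ok = is_one_to_one({0: 0, 1: 1}, num_pcpus=2)

# Example from earlier in the thread: 8 cores, dom0 has only 2 vcpus.
partial = is_one_to_one({0: 0, 1: 1}, num_pcpus=8)

# More vcpus than pcpus: two vcpus share pcpu 0.
oversubscribed = is_one_to_one({0: 0, 1: 1, 2: 0}, num_pcpus=2)
```

Only the first case passes. In the other two, dom0 either cannot see some physical cpus or sees more cpus than exist.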

With best regards,

On Wed, Sep 10, 2014 at 12:58 AM, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx> wrote:
On Tue, 9 Sep 2014, Vitaly Chernooky wrote:
> Hi All!
> On Fri, Sep 5, 2014 at 12:56 AM, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>    On Thu, 4 Sep 2014, Oleksandr Dmytryshyn wrote:
>    > Hi to all.
>    >
>    > I want to implement the cpufreq driver in the following way:
>    > 1. The cpufreq governor will be implemented in Xen.
>    > 2. dom0 will only change the cpu frequency and voltage of the physical cpus.
>    >
>    > But there are some nuances:
>    > 1. The dom0 driver should read information about the operating points
>    > (frequencies and voltages) and the cpu supply source from the device tree for each
>    > physical cpu. In the OMAP processor case this driver expects those
>    > settings to be located in the /cpus/cpu@0/ node. But the hypervisor
>    > creates a cpu node for each vcpu for the dom0 kernel in the device
>    > tree, so that information is lost in dom0.
>    > 2. What about the case where we have some physical cpus with different
>    > operating points (for example 2 cpus) and we give only one cpu to dom0?
>    >
>    > How should I transfer all the information from the original cpu@xxxxxx@n nodes
>    > about all physical cpus to the dom0 kernel driver? Maybe additional
>    > nodes should be created by the hypervisor in the device tree for dom0
>    > and named pcpu@xxxxxxx@n?
>    If we do that, wouldn't we require changes to the core OMAP drivers or
>    cpu initialization code in Linux (to parse "pcpu" instead of "cpu"
>    nodes)? I don't expect they would be easy to upstream or maintain going
>    forward.
>    I am trying to think of an alternative, such as passing the real cpu
>    nodes to dom0 but then adding status = "disabled", but I am not sure
>    whether Linux checks the status for cpu nodes. In addition this scheme
>    wouldn't support the case where dom0 has more vcpus than pcpus on the
>    system. Granted it is not very common and might even be detrimental to
>    performance, but we should be able to support it.
> In the case where dom0 has more vcpus than pcpus on the system, the dom0
> kernel is the most bug-prone place for a pcpu cpufreq governor. So I still
> believe that a separate driver domain with a direct 1:1 vcpu:pcpu mapping
> is the best place for the cpufreq governor. But it is also reasonable to
> run the cpufreq governor as a userspace daemon in dom0.
> Also, what do you think about PM QoS support? On bare metal, cpufreq is
> tightly integrated with PM QoS, and the two cooperate closely in frequency
> scaling.

Device PM needs to be done in Dom0.
CPU and platform-level PM architecturally belongs to Xen, but I do
understand that to do that in Xen we would need to add lots of code to
the hypervisor. There is no silver bullet here.

A driver domain with 1:1 vcpu:pcpu mapping could work, but what kernel
are you going to use for that? Linux? Wouldn't Linux be too big for a
cpufreq driver domain, especially in embedded deployments? I think it
would need at least 32MB to run.
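
To illustrate the split under discussion (decision in Xen, hardware changes in dom0), here is a rough Python sketch of the scale-up sequence quoted further down: governor decision, voltage change through the regulator, then the clock change. All names, thresholds and operating-point values are made up for illustration. The scale-down ordering (clock first, then voltage) is the usual convention and my assumption, since the thread only describes scaling up.

```python
# Hypothetical operating points (frequency in kHz -> voltage in microvolts).
# Values are illustrative only, not taken from any OMAP data sheet.
OPPS = {1_000_000: 1_060_000, 1_500_000: 1_250_000}


class Dom0CpufreqDriver:
    """Toy stand-in for the dom0 side: regulator plus clock framework."""

    def __init__(self, freq_khz):
        self.freq_khz = freq_khz
        self.volt_uv = OPPS[freq_khz]
        self.log = []  # records the order of hardware operations

    def _set_voltage(self, uv):
        self.volt_uv = uv
        self.log.append(("voltage", uv))

    def _set_clock(self, khz):
        self.freq_khz = khz
        self.log.append(("clock", khz))

    def set_target(self, target_khz):
        if target_khz not in OPPS:
            raise ValueError("unsupported operating point")
        if target_khz == self.freq_khz:
            return
        if target_khz > self.freq_khz:
            # Scaling up: raise the voltage first, then the clock.
            self._set_voltage(OPPS[target_khz])
            self._set_clock(target_khz)
        else:
            # Scaling down (assumed convention): clock first, then voltage.
            self._set_clock(target_khz)
            self._set_voltage(OPPS[target_khz])


def xen_governor_decide(load_percent, current_khz):
    """Toy governor policy with assumed thresholds: pick the next frequency."""
    if load_percent > 80:
        return max(OPPS)
    if load_percent < 20:
        return min(OPPS)
    return current_khz


driver = Dom0CpufreqDriver(1_000_000)
driver.set_target(xen_governor_decide(95, driver.freq_khz))
```

The point of the sketch is that only `xen_governor_decide` would live in the hypervisor. The regulator and clock handling stays in dom0, where the platform frameworks already exist.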

> With best regards,
>
>    Ian, what do you think about this?
>    > Oleksandr Dmytryshyn | Product Engineering and Development
>    > GlobalLogic
>    > M +38.067.382.2525
>    > www.globallogic.com
>    >
>    > http://www.globallogic.com/email_disclaimer.txt
>    >
>    >
>    > On Tue, Sep 2, 2014 at 9:46 PM, Andrii Tseglytskyi
>    > <andrii.tseglytskyi@xxxxxxxxxxxxxxx> wrote:
>    > >
>    > > Hi Stefano,
>    > >
>    > > Thank you for explanation.
>    > > I think this requires more and deeper investigation, but for sure dom0
>    > > must be able to do this.
>    > > Let us investigate this.
>    > >
>    > > Thank you,
>    > >
>    > > Regards,
>    > > Andrii
>    > >
>    > > On Tue, Sep 2, 2014 at 9:39 PM, Stefano Stabellini
>    > > <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>    > > > On Tue, 2 Sep 2014, Andrii Tseglytskyi wrote:
>    > > >> On Tue, Sep 2, 2014 at 4:00 AM, Stefano Stabellini
>    > > >> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>    > > >> > On Fri, 29 Aug 2014, Andrii Tseglytskyi wrote:
>    > > >> >> Hi,
>    > > >> >>
>    > > >> >> Stefano, Ian,
>    > > >> >>
>    > > >> >> Could you please clarify the following point:
>    > > >> >>
>    > > >> >> I agree that the decision about a frequency change should be taken by the Xen
>    > > >> >> hypervisor. But what about changing the hardware frequency?
>    > > >> >> In general, when the frequency is raised (for example from 1
>    > > >> >> GHz to 1.5 GHz), the sequence for ARM kernels looks like the following:
>    > > >> >>
>    > > >> >> 1) The cpufreq governor decides that the frequency should be changed. This
>    > > >> >> decision is taken after analysing CPU performance data, taking into
>    > > >> >> account the governor policy.
>    > > >> >> 2) The cpufreq governor asks the cpufreq driver for the new frequency.
>    > > >> >> 3) The cpufreq driver compares the current and target frequencies and asks
>    > > >> >> the cpufreq regulator for a voltage change.
>    > > >> >> 4) The cpufreq regulator sends an i2c command to a standalone microchip, which
>    > > >> >> is responsible for changing the voltage.
>    > > >> >> 5) The cpufreq driver asks the clock framework for the new CPU clock frequency.
>    > > >> >> 6) The clock framework performs frequency sanity checks, taking into account
>    > > >> >> clock parents and clock divider settings, and calls the platform specific
>    > > >> >> "set_frequency" callback.
>    > > >> >> 7) The platform specific callback performs the proper HW register
>    > > >> >> configuration for the newly selected frequency.
>    > > >> >>
>    > > >> >> Also there are some special cases - for example for OMAP5+, when the
>    > > >> >> frequency is changed to 1.5 GHz+, two additional HW IPs should be
>    > > >> >> triggered (ABB and DCC, if someone is familiar with OMAP5+).
>    > > >> >>
>    > > >> >> So, for a generic ARM kernel we have 3 entities to change the frequency:
>    > > >> >>
>    > > >> >> - cpufreq governor
>    > > >> >> - cpufreq driver
>    > > >> >> - cpufreq regulator
>    > > >> >>
>    > > >> >> + 2 additional IPs for OMAP5+
>    > > >> >> - ABB
>    > > >> >> - DCC
>    > > >> >>
>    > > >> >> Taking into account all of the above, it looks like it would be better to
>    > > >> >> implement only the Xen cpufreq governor. Xen will take the decision about the
>    > > >> >> new frequency, and the dom0 kernel will perform the other steps. Dom0 contains
>    > > >> >> all the generic and platform specific frameworks needed for changing the
>    > > >> >> frequency.
>    > > >> >>
>    > > >> >> What do you think?
>    > > >> >
>    > > >> > Keep in mind that the architecture must be able to handle the case where
>    > > >> > dom0 has only 1 or 2 vcpus on a 4 or 8 cores system with multiple
>    > > >> > physical cpus.
>    > > >> > Could dom0 change the frequency of a physical core or a physical cpu it is
>    > > >> > not even running on? If that is not a problem, because cpus and
>    > > >> > frequency changing are decoupled enough in Linux to allow it, then I am
>    > > >> > OK with it. But I suspect they are not.
>    > > >> >
>    > > >>
>    > > >> Not sure that I got your point correctly - dom0 will change the frequency
>    > > >> on the physical CPU.
>    > > >> And in the case of OMAP, this change affects both ARM physical cpus
>    > > >> - the change is coupled.
>    > > >> In the case of other ARM platforms, the change may not be coupled (I've
>    > > >> heard that Snapdragon can change cpu freqs independently on each
>    > > >> physical cpu)
>    > > >
>    > > > Let me explain with a concrete example.
>    > > >
>    > > > Let's suppose that the platform has 2 physical cpus, each cpu has 4
>    > > > cores. Let's also suppose that dom0 has only 2 vcpus, currently
>    > > > running on core0 and core1 of cpu0.
>    > > >
>    > > > In this case would dom0 be able to change the frequency of core3 of
>    > > > cpu1, given that it is not even running on it?
>    > > > If it can be done without any hacks, then we can go ahead with this
>    > > > approach.
>    > >
>    > >
>    > >
>    > > --
>    > >
>    > > Andrii Tseglytskyi | Embedded Dev
>    > > GlobalLogic
>    > > www.globallogic.com
>    >
> --
> Vitaly Chernooky | Senior Developer - Product Engineering and Development
> GlobalLogic
> P +380.44.4929695 ext.1136 M +380.98.7920568 S cvv_2k
> www.globallogic.com
> http://www.globallogic.com/email_disclaimer.txt

Vitaly Chernooky | Senior Developer - Product Engineering and Development
P +380.44.4929695 ext.1136 M +380.98.7920568 S cvv_2k
