
Re: [Xen-devel] [PATCH] xen/arm: introduce vwfi parameter



On 20/02/17 18:43, Stefano Stabellini wrote:
> On Mon, 20 Feb 2017, Dario Faggioli wrote:
>> On Sun, 2017-02-19 at 21:34 +0000, Julien Grall wrote:
>>> Hi Stefano,
>>>
>>> I have CCed another ARM person who has more knowledge than me on 
>>> scheduling/power.
>>>
>> Ah, when I saw this, I thought you were Cc-ing my friend Juri, who
>> also works there and does that stuff. :-)
>>
>>>> In both cases the vcpu is not run until the next slot, so I don't
>>>> think it should make the performance worse in multi-vcpu scenarios.
>>>> But I can do some tests to double check.
>>>
>>> Looking at your answer, I think it is important that everyone in
>>> this thread understands the purpose of WFI and how it differs from
>>> WFE.
>>>
>>> The two instructions provide a way to tell the processor to go into
>>> a low-power state. It means the processor can turn off power to some
>>> parts (e.g. units, pipeline...) to save energy.
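>>>
>>> As a concrete (hypothetical) illustration, a guest OS idle loop built
>>> around WFI typically looks something like the sketch below;
>>> need_resched() stands in for whatever "is there work to do?" check
>>> the OS provides:
>>>
>>>     /* Sketch of a guest idle loop: sleep until an interrupt. */
>>>     static void idle_loop(void)
>>>     {
>>>         while ( !need_resched() )
>>>         {
>>>             /* Complete outstanding memory accesses, then enter a
>>>              * low-power state until an interrupt (or other wakeup
>>>              * event) arrives. */
>>>             asm volatile("dsb sy; wfi" ::: "memory");
>>>         }
>>>     }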
>>>
>> [snip]
>>>
>>> For both instructions it is normal to see a higher latency when
>>> receiving an interrupt. Software using them knows there will be an
>>> impact, but overall it expects some power to be saved. Whether the
>>> current numbers are acceptable is another question.
>>>
>> Ok, thanks for this useful information. I think I understand the idea
>> behind these two instructions/mechanisms now.
>>
>> What (I think) Stefano is proposing is providing the user (of Xen on
>> ARM) with a way of making them behave differently.
> 
> That's right. It's not always feasible to change the code of the guest
> the user is running. Maybe she cannot, or maybe she doesn't want to for
> other reasons. Keep in mind that the developer of the operating system
> in this example might have had very different expectations of irq
> latency, given that, even with wfi, it is much lower on native.
> 
> When irq latency is way more important than power consumption to the
> user (think of a train, or an industrial machine that needs to move
> something in a given amount of time), this option provides value to her
> at very little maintenance cost on our side.
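>
> To make that concrete, the sketch below shows roughly what I have in
> mind; the parameter name matches the patch, but treat the details as
> illustrative rather than final:
>
>     /* Parse "vwfi=native" from the Xen command line (sketch). */
>     static bool __read_mostly vwfi_native;
>
>     static void __init parse_vwfi(const char *s)
>     {
>         vwfi_native = !strcmp(s, "native");
>     }
>     custom_param("vwfi", parse_vwfi);
>
>     /* When setting up HCR_EL2 for a vcpu: only trap WFI/WFE if the
>      * user has not asked for native behavior. */
>     if ( !vwfi_native )
>         hcr |= HCR_TWI | HCR_TWE;
>
> With vwfi=native, WFI and WFE execute directly in the guest, removing
> the trap cost entirely, at the price of the pCPU actually idling.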
> 
> Of course, even if we introduce this option, by no means should we
> stop improving the irq latency in the normal cases.
> 
> 
>> Whether good or bad, I've expressed my thoughts, and it's your call in
>> the end. :-)
>> George also has a fair point, though. Using yield is a quick and *most
>> likely* effective way of achieving Linux's "idle=poll", but at the
>> same time a rather risky one, as it basically means the final behavior
>> would rely on how yield() behaves on the specific scheduler the user
>> is using, which may vary.
>>
>>> Now, regarding what you said: let's imagine the scheduler deschedules
>>> the vCPU until the next slot; it will then run the vCPU even if no
>>> interrupt has been received.
>>>
>> There really are no slots. There sort of are in Credit1, but preemption
>> can happen inside a "slot", so I wouldn't call them that even there.
>>
>>> This is a real waste of power, and it becomes worse if no interrupt
>>> arrives for multiple slots.
>>>
>> Undeniable. :-)
> 
> Of course. But if your app needs less than 3000ns of latency, then it's
> the only choice.
> 
> 
>>> In the case of multi-vcpu, a guest using wfi will use more slots than
>>> it did before. This means fewer slots for vCPUs that actually have
>>> real work to do.
>>>
>> No, because it continuously yields. So, yes indeed there will be higher
>> scheduling overhead, but no stealing of otherwise useful computation
>> time. Not with the yield() implementations we have right now in the
>> code.
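>>
>> To be concrete, the yield-based variant boils down to a small change
>> in the WFI trap handler, something like the sketch below;
>> idle_poll_mode is a hypothetical knob, while the two helpers are the
>> ones the ARM traps code already uses for WFI and WFE:
>>
>>     /* On a trapped guest WFI: */
>>     if ( idle_poll_mode )
>>         /* Stay runnable, just give up the pCPU for now. */
>>         vcpu_yield();
>>     else
>>         /* Default: block until an interrupt is pending. */
>>         vcpu_block_unless_event_pending(current);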
>>
>> But I'm starting to think that we had better take a step back from
>> deep within the scheduler and think, first, about whether or not
>> having something similar to Linux's idle=poll is something we want, if
>> only for testing, debugging, or very specific use cases.
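>>
>> (For reference, Linux's idle=poll makes the idle loop spin instead of
>> sleeping, roughly:
>>
>>     while ( !need_resched() )
>>         cpu_relax();    /* busy-wait; never enter a low-power state */
>>
>> trading power for wakeup latency, which is exactly the trade-off being
>> discussed here.)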
>>
>> And only then, if the answer is yes, decide how to actually implement
>> it, whether or not to use yield, etc.
> 
> I think we want it, if the implementation is small and unintrusive.

But surely we want it to be per-domain, not system-wide?

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel