Re: [Xen-devel] CONFIG_NO_HZ + CONFIG_CPU_IDLE freeze the system (Was Re: [PATCH] acpi : remove power from acpi_processor_cx structure)



On 09/06/2012 11:18 PM, Rafael J. Wysocki wrote:
> On Thursday, September 06, 2012, Daniel Lezcano wrote:
>> On 09/06/2012 10:04 PM, Rafael J. Wysocki wrote:
>>> On Thursday, September 06, 2012, Daniel Lezcano wrote:
>>>> On 09/06/2012 09:54 AM, Daniel Lezcano wrote:
>>>>> On 09/05/2012 03:41 PM, Rafael J. Wysocki wrote:
>>>>>> On Saturday, September 01, 2012, Rafael J. Wysocki wrote:
>>>>>>> On Friday, August 31, 2012, Daniel Lezcano wrote:
>>>>>>>> On 07/24/2012 11:06 PM, Konrad Rzeszutek Wilk wrote:
>>>>>>>>> On Tue, Jul 24, 2012 at 11:12:29PM +0200, Daniel Lezcano wrote:
>>>>>>>>>> Remove the power field as it is not used.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
>>>>>>>>>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
>>>>>>>>> Acked.
>>>>>>>> Hi Rafael,
>>>>>>>>
>>>>>>>> I did not see this patch going in. Is it possible to merge it ?
>>>>>>> I think so.  I'll take care of it when I get back from 
>>>>>>> LinuxCon/Plumbers Conf.
>>>>>>> (early next week).
>>>>>> Applied to the linux-next branch of the linux-pm.git tree as v3.7 
>>>>>> material.
>>>>> Thanks Rafael.
>>>>>
>>>>>> Are there any other patches you want me to consider for v3.7?
>>>>> Yes please, I have the per cpu latencies ready to be submitted but I
>>>>> want to do extra testing before. Unfortunately, the linux-pm-next hangs
>>>>> at boot time on my intel dual core (not related to the patchset).
>>>>>
>>>>> I am git bisecting right now.
>>>>
>>>> I found the culprit. This is not related to the linux-pm tree but with
>>>> net-next.
>>>> The following patch introduced the issue.
>>>>
>>>> commit 6bdb7fe31046ac50b47e83c35cd6c6b6160a475d
>>>> Author: Amerigo Wang <amwang@xxxxxxxxxx>
>>>> Date:   Fri Aug 10 01:24:50 2012 +0000
>>>>
>>>>     netpoll: re-enable irq in poll_napi()
>>>>    
>>>>     napi->poll() needs IRQ enabled, so we have to re-enable IRQ before
>>>>     calling it.
>>>>    
>>>>     Cc: David Miller <davem@xxxxxxxxxxxxx>
>>>>     Signed-off-by: Cong Wang <amwang@xxxxxxxxxx>
>>>>     Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
>>>>
>>>> AFAICS, it has been fixed by commit
>>>> 072a9c48600409d72aeb0d5b29fbb75861a06631 which is not yet in linux-pm-next.
>>>
>>> If it is present in the current Linus' tree, you can just pull this one
>>> and merge linux-pm-next into it.  It should merge without conflicts.
>>
>> Ok, thanks.
>>
>>>> I ran into this issue because NETCONSOLE is set; disabling it allowed
>>>> me to get further.
>>>>
>>>> Unfortunately, I am now seeing random freezes on the system, which
>>>> seem to be related to CONFIG_NO_HZ=y and CONFIG_CPU_IDLE=y.
>>>>
>>>> Disabling either one makes the freezes disappear.
>>>>
>>>> Is it a known issue ?
>>>
>>> Well, there are systems having problems with this configuration, but they
>>> should be exceptional.  What system is that?
>>
>> It is a laptop T61p with a Core 2 Duo T9500. Nothing exceptional I
>> believe. Maybe someone got the same issue ?
> 
> Is it a regression for you?

Yes, I think so. The issue appeared between v3.5 and v3.6-rc1.

It is not easy to reproduce, but after taking some time to dig, git
bisect points to this commit:

1e75fa8be9fb61e1af46b5b3b176347a4c958ca1 is the first bad commit
commit 1e75fa8be9fb61e1af46b5b3b176347a4c958ca1
Author: John Stultz <john.stultz@xxxxxxxxxx>
Date:   Fri Jul 13 01:21:53 2012 -0400

    time: Condense timekeeper.xtime into xtime_sec

    The timekeeper struct has a xtime_nsec, which keeps the
    sub-nanosecond remainder.  This ends up being somewhat
    duplicative of the timekeeper.xtime.tv_nsec value, and we
    have to do extra work to keep them apart, copying the full
    nsec portion out and back in over and over.

    This patch simplifies some of the logic by taking the timekeeper
    xtime value and splitting it into timekeeper.xtime_sec and
    reuses the timekeeper.xtime_nsec for the sub-second portion
    (stored in higher res shifted nanoseconds).

    This simplifies some of the accumulation logic. And will
    allow for more accurate timekeeping once the vsyscall code
    is updated to use the shifted nanosecond remainder.

    Signed-off-by: John Stultz <john.stultz@xxxxxxxxxx>
    Reviewed-by: Ingo Molnar <mingo@xxxxxxxxxx>
    Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
    Cc: Richard Cochran <richardcochran@xxxxxxxxx>
    Cc: Prarit Bhargava <prarit@xxxxxxxxxx>
    Link: http://lkml.kernel.org/r/1342156917-25092-5-git-send-email-john.stultz@xxxxxxxxxx
    Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>

:040000 040000 4d6541ac1f6075d7adee1eef494b31a0cbda0934 dc5708bc738af695f092bf822809b13a1da104b6 M      kernel
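
For readers unfamiliar with the "shifted nanoseconds" the commit message
mentions, here is a rough sketch of the idea in Python. It is only an
illustration, not the kernel code: the class name, the method names and
the mult/shift values are assumptions chosen for the example. The point
is that xtime_nsec holds nanoseconds scaled by 2^shift, so fractional
nanoseconds survive between updates and whole seconds are carried into
xtime_sec.

```python
NSEC_PER_SEC = 1_000_000_000


class Timekeeper:
    """Toy model: seconds plus a sub-second part kept in shifted nanoseconds."""

    def __init__(self, shift):
        self.shift = shift
        self.xtime_sec = 0
        self.xtime_nsec = 0  # nanoseconds << shift, always below one second

    def accumulate(self, cycles, mult):
        # cycles * mult is already expressed in shifted nanoseconds.
        self.xtime_nsec += cycles * mult
        one_sec_shifted = NSEC_PER_SEC << self.shift
        # Carry whole seconds out of the sub-second accumulator.
        while self.xtime_nsec >= one_sec_shifted:
            self.xtime_nsec -= one_sec_shifted
            self.xtime_sec += 1

    def read(self):
        # Drop the fractional bits only when a value is handed out.
        return self.xtime_sec, self.xtime_nsec >> self.shift


# Hypothetical clocksource: 3.5 ns per cycle with shift = 8 (mult = 3.5 * 256).
tk = Timekeeper(shift=8)
for _ in range(3):
    tk.accumulate(cycles=1, mult=896)

shifted_result = tk.read()   # fraction kept across the three updates
naive_ns = 3 * (896 >> 8)    # fraction thrown away on every update
print(shifted_result, naive_ns)
```

With these made-up numbers the shifted accumulator reports 10 ns after
three cycles, while truncating on every update would report 9 ns.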

How to reproduce: on a T61p laptop with a Core 2 Duo, I boot the kernel
into a busybox environment and wait a few minutes before typing
something on the console. At that point nothing appears on the console,
but the characters are echoed several seconds later (it could be 1, 5,
10 seconds or more).

This happens when both CONFIG_CPU_IDLE and CONFIG_NO_HZ are set. With
either one disabled, the issue does not appear.
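
To check whether a given kernel configuration has the combination
described above, something like the following sketch could be used. The
function name and the sample text are made up for illustration; in
practice the text would come from the kernel's .config file.

```python
def enabled_options(config_text, names=("CONFIG_NO_HZ", "CONFIG_CPU_IDLE")):
    """Return the subset of `names` that are set to 'y' in .config text."""
    found = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blanks and comments such as "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key in names and value == "y":
            found.add(key)
    return found


# Illustrative sample, not taken from the affected machine.
sample = """\
CONFIG_NO_HZ=y
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_CPU_IDLE=y
"""

both = enabled_options(sample) == {"CONFIG_NO_HZ", "CONFIG_CPU_IDLE"}
print("affected combination enabled:", both)
```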

Thanks
  -- Daniel

-- 
 <http://www.linaro.org/> Linaro.org - Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog



 

