
Re: [Xen-devel] [RFC PATCH V2 11/11] x86: tsc: avoid system instability in hibernation



On Mon, 2020-01-13 at 11:16 +0100, Peter Zijlstra wrote:
> On Fri, Jan 10, 2020 at 07:35:20AM -0800, Eduardo Valentin wrote:
> > Hey Peter,
> > 
> > On Wed, Jan 08, 2020 at 11:50:11AM +0100, Peter Zijlstra wrote:
> > > On Tue, Jan 07, 2020 at 11:45:26PM +0000, Anchal Agarwal wrote:
> > > > From: Eduardo Valentin <eduval@xxxxxxxxxx>
> > > > 
> > > > System instability is seen during resume from hibernation when the
> > > > system is under heavy CPU load. This is due to sched clock data not
> > > > being updated: the scheduler then thinks that heavy CPU hog tasks
> > > > need more CPU time, causing the system to freeze while tasks are
> > > > being unfrozen. For example, threaded irqs and kernel processes
> > > > servicing the network interface may be delayed for several tens of
> > > > seconds, making the system unreachable. The fix for this situation
> > > > is to mark the sched clock as unstable as early as possible in the
> > > > resume path, leaving it unstable for the duration of the resume
> > > > process. This forces the scheduler to align the sched clock across
> > > > CPUs using the delta with time of day, updating sched clock data.
> > > > On the post-hibernation event, we can then mark the sched clock as
> > > > stable again, avoiding unnecessary syncs with time of day on
> > > > systems where the TSC is reliable.
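
To make the mechanism described above concrete, it boils down to roughly
the following sketch. clear_sched_clock_stable(), register_syscore_ops(),
register_pm_notifier(), check_tsc_unstable() and PM_POST_HIBERNATION are
existing kernel interfaces; set_sched_clock_stable_again() is a stand-in
for whatever helper the patch adds to re-mark stability, not an existing
API.

#include <linux/init.h>
#include <linux/notifier.h>
#include <linux/suspend.h>
#include <linux/syscore_ops.h>
#include <linux/sched/clock.h>
#include <asm/tsc.h>

static void sched_clock_resume_unstable(void)
{
	/* Runs early in the restore path, before tasks are thawed:
	 * drop to unstable so the scheduler keeps resyncing
	 * sched_clock_data against time of day during resume. */
	clear_sched_clock_stable();
}

static struct syscore_ops sched_clock_syscore_ops = {
	.resume	= sched_clock_resume_unstable,
};

static int sched_clock_pm_notify(struct notifier_block *nb,
				 unsigned long event, void *unused)
{
	/* Once thawing is done, re-mark stable only if the TSC on
	 * the (possibly different) host checks out. */
	if (event == PM_POST_HIBERNATION && !check_tsc_unstable())
		set_sched_clock_stable_again();	/* hypothetical helper */
	return NOTIFY_OK;
}

static struct notifier_block sched_clock_pm_nb = {
	.notifier_call = sched_clock_pm_notify,
};

static int __init sched_clock_hibernate_init(void)
{
	register_syscore_ops(&sched_clock_syscore_ops);
	return register_pm_notifier(&sched_clock_pm_nb);
}
late_initcall(sched_clock_hibernate_init);
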
> > > 
> > > This makes no frigging sense what so bloody ever. If the clock is
> > > stable, we don't care about sched_clock_data. When it is stable you get
> > > a linear function of the TSC without complicated bits on.
> > > 
> > > When it is unstable, only then do we care about the sched_clock_data.
> > > 
> > 
> > Yeah, maybe what is not clear here is that we are covering for a situation
> > where clock stability changes over time, e.g. at regular boot the clock is
> > stable, hibernation happens, then restore happens on a non-stable clock.
> 
> Still confused, who marks the thing unstable? The patch seems to suggest
> you do it yourself, but it is not at all clear why.
> 
> If the TSC really is unstable, then it needs to remain unstable. If the TSC
> really is stable then there is no point in marking it unstable.
> 
> Either way something is off, and you're not telling me what.
> 

Hi, Peter

For your original comment, just wanted to clarify the following:

1. After hibernation, the machine can be resumed on a different but compatible
host (these are hibernated VM images)
2. This means the clock on host1 and host2 can/will be different (see the
sketch below)
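
The reason this matters: when the clock is stable, sched_clock() on x86
reduces to a scaled read of the raw TSC, very roughly as below (illustrative
only; the real code is in arch/x86/kernel/tsc.c, and mult/shift here are
stand-ins for the per-boot calibration):

#include <linux/types.h>
#include <asm/msr.h>	/* rdtsc() */

static u32 mult, shift;	/* cycles->ns calibration, set at boot (illustrative) */

static u64 stable_sched_clock_sketch(void)
{
	/* ns is a linear function of the raw counter
	 * (overflow ignored for illustration) */
	return (rdtsc() * mult) >> shift;
}

After the image is restored on host2, rdtsc() reads a counter with a
different base (and possibly a different rate) from the one the image was
saved against, so the "stable, nothing to sync" assumption carried over
from host1 no longer holds.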

In your comments, are you assuming that the host stays the same? Just
checking the assumptions being made so that we are on the same page.

Balbir Singh.

> > > > Reviewed-by: Erik Quanstrom <quanstro@xxxxxxxxxx>
> > > > Reviewed-by: Frank van der Linden <fllinden@xxxxxxxxxx>
> > > > Reviewed-by: Balbir Singh <sblbir@xxxxxxxxxx>
> > > > Reviewed-by: Munehisa Kamata <kamatam@xxxxxxxxxx>
> > > > Tested-by: Anchal Agarwal <anchalag@xxxxxxxxxx>
> > > > Signed-off-by: Eduardo Valentin <eduval@xxxxxxxxxx>
> > > > ---
> > > 
> > > NAK, the code very much relies on never getting marked stable again
> > > after it gets set to unstable.
> > > 
> > 
> > Well actually, at PM_POST_HIBERNATION we do the check and mark the clock
> > stable again if it is known to be stable.
> > 
> > The issue only really happens during the restore path under scheduling
> > pressure, which takes forever to finish, as described in the commit.
> > 
> > Do you see a better solution for this issue?
> 
> I still have no clue what your actual problem is. You say scheduling
> goes wobbly because sched_clock_data is stale, but when stable that
> doesn't matter.
> 
> So what is the actual problem?
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

