Re: Slow (higher scheduling latency?) system after S3 - regression 4.17 -> 4.19 ?
On 27.03.2026 13:47, Marek Marczykowski-Górecki wrote:
> Some further observation, this time regarding timers:
>
> Just before S3:
>
> (XEN) [ 150.264757] Dumping timer queues:
> (XEN) [ 150.265181] CPU00:
> (XEN) [ 150.265497] ex= 736us timer=ffff82d040643540 cb=drivers/char/xhci-dbc.c#dbc_uart_poll(ffff82d0404e5e78)
> (XEN) [ 150.266590] ex= 460377us timer=ffff82d04065d2c0 cb=arch/x86/time.c#time_calibration(0000000000000000)
> (XEN) [ 150.267636] ex= 1377460us timer=ffff82d04065d220 cb=arch/x86/time.c#plt_overflow(0000000000000000)
> (XEN) [ 150.268651] ex= 13525100us timer=ffff82d040644c20 cb=arch/x86/cpu/mcheck/intel-nonfatal.c#mce_work_fn(0000000000000000)
> (XEN) [ 150.269820] CPU02:
> (XEN) [ 150.270126] ex= 616653us timer=ffff830461931070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461931000)
> (XEN) [ 150.271276] CPU04:
> (XEN) [ 150.271577] ex= 630577us timer=ffff830461935070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461935000)
> (XEN) [ 150.272734] ex= 634586us timer=ffff830461921070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461921000)
> (XEN) [ 150.273884] CPU06:
> (XEN) [ 150.274184] ex= 44474us timer=ffff830461905070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461905000)
> (XEN) [ 150.275334] CPU08:
> (XEN) [ 150.275633] ex= 11465us timer=ffff83046190d070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff83046190d000)
> (XEN) [ 150.276783] CPU09:
> (XEN) [ 150.277083] ex= 12465us timer=ffff830461909070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461909000)
> (XEN) [ 150.278236] CPU10:
> (XEN) [ 150.278536] ex= 14464us timer=ffff830461901070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461901000)
> (XEN) [ 150.279686] CPU11:
> (XEN) [ 150.279986] ex= 15464us timer=ffff830461911070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461911000)
> (XEN) [ 150.281139] CPU12:
> (XEN) [ 150.281438] ex= 17463us timer=ffff830461919070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461919000)
> (XEN) [ 150.282592] CPU13:
> (XEN) [ 150.282890] ex= 347090us timer=ffff830461941070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461941000)
> (XEN) [ 150.284040] CPU14:
> (XEN) [ 150.284341] ex= 20462us timer=ffff83046191d070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff83046191d000)
> (XEN) [ 150.285496] CPU15:
> (XEN) [ 150.285806] ex= 470us timer=ffff83046192d070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff83046192d000)
> (XEN) [ 150.286972] ex= 15316us timer=ffff8304619cdaa0 cb=drivers/cpufreq/cpufreq_ondemand.c#do_dbs_timer(ffff8304619cdae0)
> (XEN) [ 150.288156] ex= 6230us timer=ffff8304619e8ed0 cb=common/sched/core.c#s_timer_fn(0000000000000000)
> (XEN) [ 150.289192] CPU16:
> (XEN) [ 150.289492] ex= 633594us timer=ffff830461929070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461929000)
> (XEN) [ 150.290642] CPU18:
> (XEN) [ 150.290942] ex= 614601us timer=ffff830461939070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461939000)
> (XEN) [ 150.292094] CPU20:
> (XEN) [ 150.292394] ex= 27459us timer=ffff830461915070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461915000)
> (XEN) [ 150.293553] ex= 55316us timer=ffff830461979aa0 cb=drivers/cpufreq/cpufreq_ondemand.c#do_dbs_timer(ffff830461979ae0)
> (XEN) [ 150.294744] ex= 35413us timer=ffff83046198fdc0 cb=common/sched/core.c#s_timer_fn(0000000000000000)
> (XEN) [ 150.295776] CPU21:
> (XEN) [ 150.296076] ex= 617580us timer=ffff830461925070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461925000)
>
> Just after S3:
>
> (XEN) [ 164.462198] Dumping timer queues:
> (XEN) [ 164.462630] CPU00:
> (XEN) [ 164.462949] ex= 768us timer=ffff82d040643540 cb=drivers/char/xhci-dbc.c#dbc_uart_poll(ffff82d0404e5e78)
> (XEN) [ 164.464064] ex= 1286532us timer=ffff82d04065d220 cb=arch/x86/time.c#plt_overflow(0000000000000000)
> (XEN) [ 164.465099] ex= 395997us timer=ffff830461921070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461921000)
> (XEN) [ 164.466262] ex= 15327843us timer=ffff82d040644c20 cb=arch/x86/cpu/mcheck/intel-nonfatal.c#mce_work_fn(0000000000000000)
> (XEN) [ 164.467461] CPU02:
> (XEN) [ 164.467773] ex= 37756us timer=ffff830461931070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461931000)
> (XEN) [ 164.468953] CPU04:
> (XEN) [ 164.469267] ex= 940042us timer=ffff830461935070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461935000)
> (XEN) [ 164.470444] CPU06:
> (XEN) [ 164.470759] ex= 485896us timer=ffff830461905070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461905000)
> (XEN) [ 164.471937] CPU08:
> (XEN) [ 164.472251] ex= 459720us timer=ffff83046190d070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff83046190d000)
> (XEN) [ 164.473430] CPU09:
> (XEN) [ 164.473745] ex= 1563081us timer=ffff830461909070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461909000)
> (XEN) [ 164.474922] CPU10:
> (XEN) [ 164.475237] ex= 948204us timer=ffff830461901070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461901000)
> (XEN) [ 164.476417] CPU11:
> (XEN) [ 164.476732] ex= 948071us timer=ffff830461911070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461911000)
> (XEN) [ 164.477912] CPU12:
> (XEN) [ 164.478219] ex= 892163us timer=ffff830461919070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461919000)
> (XEN) [ 164.479398] CPU13:
> (XEN) [ 164.479713] ex= 308429us timer=ffff830461941070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461941000)
> (XEN) [ 164.480893] CPU14:
> (XEN) [ 164.481208] ex= 954493us timer=ffff83046191d070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff83046191d000)
> (XEN) [ 164.482386] CPU15:
> (XEN) [ 164.482700] ex= 1487692us timer=ffff83046192d070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff83046192d000)
> (XEN) [ 164.483881] CPU16:
> (XEN) [ 164.484195] ex= 22890us timer=ffff830461929070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461929000)
> (XEN) [ 164.485373] CPU18:
> (XEN) [ 164.485688] ex= 85902us timer=ffff830461939070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461939000)
> (XEN) [ 164.486864] CPU20:
> (XEN) [ 164.487171] ex= 895us timer=ffff830461915070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461915000)
> (XEN) [ 164.488358] ex= 17895us timer=ffff830461979aa0 cb=drivers/cpufreq/cpufreq_ondemand.c#do_dbs_timer(ffff830461979ae0)
> (XEN) [ 164.489569] ex= 2914us timer=ffff83046198fdc0 cb=common/sched/core.c#s_timer_fn(0000000000000000)
> (XEN) [ 164.490633] CPU21:
> (XEN) [ 164.490947] ex= 461154us timer=ffff830461925070 cb=common/sched/core.c#vcpu_singleshot_timer_fn(ffff830461925000)
>
> In the latter case, I see only one cb=common/sched/core.c#s_timer_fn
> (two of them before). And also only one
> cb=drivers/cpufreq/cpufreq_ondemand.c#do_dbs_timer.

Let's start with this one. It's active when the ondemand governor is in
use, and when a CPU isn't (in the process of being put into) a C-state.
There being only two instances means the system was reasonably idle at
the time you dumped the timer queues ahead of entering S3, and just
slightly more idle after getting back out of S3. I've just checked my
Skylake: Without it ever having gone through an S3 cycle, there's only
one instance there when the system is running nothing but an "xl dmesg".
If you place enough load on the system to keep all CPUs busy, more
instances of the timer should appear. How many there will be does,
however, also depend on the scope of the policy (supplied by firmware).
The topology ...

> Is the s_timer_fn related to runqueues? I have two of them:
>
> (XEN) [ 11.751480] Adding cpu 0 to runqueue 0
> (XEN) [ 11.753580] First cpu on runqueue, activating
> (XEN) [ 11.755731] Adding cpu 2 to runqueue 0
> (XEN) [ 11.757818] Adding cpu 4 to runqueue 0
> (XEN) [ 11.759891] Adding cpu 6 to runqueue 0
> (XEN) [ 11.761955] Adding cpu 8 to runqueue 0
> (XEN) [ 11.764001] Adding cpu 9 to runqueue 0
> (XEN) [ 11.766029] Adding cpu 10 to runqueue 0
> (XEN) [ 11.768045] Adding cpu 11 to runqueue 0
> (XEN) [ 11.770050] Adding cpu 12 to runqueue 1
> (XEN) [ 11.772033] First cpu on runqueue, activating
> (XEN) [ 11.774051] Adding cpu 13 to runqueue 1
> (XEN) [ 11.776047] Adding cpu 14 to runqueue 1
> (XEN) [ 11.777993] Adding cpu 15 to runqueue 1
> (XEN) [ 11.779913] Adding cpu 16 to runqueue 1
> (XEN) [ 11.781872] Adding cpu 18 to runqueue 1
> (XEN) [ 11.783912] Adding cpu 20 to runqueue 1
> (XEN) [ 11.785855] Adding cpu 21 to runqueue 1

... of this system looks somewhat "interesting", judging from these CPU
numbers.

Assuming the system was (supposedly) equally idle before and after S3,
the absence of the 2nd timer(s?) may indicate that some (background?)
load that was there before S3 didn't re-appear afterwards. Such
background load could be something which monitors and tunes the system.
(Yet as indicated, I'm looking at the CPUfreq part only right now. I'm
kind of hoping that a scheduler maintainer may look at the other
aspect.)

> Could the issue be related to those timer differences?

Possibly. For the CPUfreq aspect you could simply turn it off, to see
whether that makes a difference. Or you could switch to a governor
different from the ondemand one. With the ondemand governor, does the
pattern that "xenpm get-cpufreq-states" produces change across S3 (this
will require multiple runs)? For example, does load move towards CPU0?
(Thinking of it, did you check whether interrupt affinities change
across S3? More going to CPU0 could also be part of the problem, for
example.)

Jan
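[Editorial note, not part of the original thread: the per-callback counting done by eye above can be scripted. A minimal shell sketch, assuming the two timer-queue dumps have been saved to files; the file names are hypothetical, and the inlined samples are trimmed from the dumps quoted in the mail.]

```shell
# Tally timer callbacks per dump so the before/after difference
# (two s_timer_fn instances vs. one) stands out at a glance.
# "before.txt"/"after.txt" are placeholder names for saved dumps;
# here they are populated with trimmed sample lines for illustration.
cat > before.txt <<'EOF'
(XEN) ex= 6230us timer=ffff8304619e8ed0 cb=common/sched/core.c#s_timer_fn(0000000000000000)
(XEN) ex= 35413us timer=ffff83046198fdc0 cb=common/sched/core.c#s_timer_fn(0000000000000000)
EOF
cat > after.txt <<'EOF'
(XEN) ex= 2914us timer=ffff83046198fdc0 cb=common/sched/core.c#s_timer_fn(0000000000000000)
EOF

for f in before.txt after.txt; do
  echo "== $f =="
  # Extract just the callback symbol (drop the "(arg)" part), then count
  # how many queued timer instances each callback has in this dump.
  grep -o 'cb=[^(]*' "$f" | sort | uniq -c
done
```

On a real host the dump text would come out of `xl dmesg` after triggering the hypervisor's timer-queue dump; for the governor experiment suggested above, `xenpm set-scaling-governor` can switch away from ondemand between runs.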