[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] xen/vcpu: remove vcpu_set_singleshot_timer flags field


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 19 Apr 2023 12:26:09 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=bdY0eKT7j39fP0NB5timUkRgE5tyBNYd1dMHw+IPysc=; b=ZOZzybQI+pUQ21Tue3lLrnTlPi1QjcBimA79Sh/jORrq0n+VF7wIO83B8tAGecK7Id9S1FKTXA9y5Zb7ME65fHJJTX8IHfszlvEA3rWteNBms3/vop8mrTxE9rURCnXQuqxysQj1oNqNuswAupsWr3M5whC6BTiCtKoUtxd0VLBujlO74bawnTt6AOptrkqLATNbyhY0e9qC/o4KqrGUxWakvmCkV9I/5nGOuxSo4eqILomQkLzkOCSEMJ41KGlcIoFGRuHGSA8S7kj26bQCHruXQdYY7r8SHTQWqJCTOTwvy0Fu/4Bf26nnYe1U1H+kz6FfY1LzqyTpeu+kO62Dag==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=DQ2/0qhIuDSEBTsEExeKhojDQMvhEv00q04JlV1CpAMMIgiYyfV9i/1doNFtar6+SzmSyt7z3vhracfuq6UoLEAcu9OM7CeHWPEnbznUXWm5Wk2gi9iCXNE6Ct20ojci+WMwwzGN4bjqpPWdcZD2fN1TajgPNZHL+rhi3a3cTOPWIHbipTnUrB2d6WlPaHI+lxIrvYP7AYPtlXat/xSPRlLPKjmj11iDc4YqokJ3+DUs5XEiZFL7dU/KBrHDqoKhisB4DLR8OuduNSBylVxjJcN8i6/b0clO0oZ833hL0dOF/Iy6rE0xhI06zDHiEh/3CR4GekAkXC45+k7sl7x5qQ==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Henry Wang <Henry.Wang@xxxxxxx>, Community Manager <community.manager@xxxxxxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 19 Apr 2023 10:26:53 +0000
  • Ironport-data: A9a23:xcatzq8IHu6PCN3YArIDDrUDBH+TJUtcMsCJ2f8bNWPcYEJGY0x3z WYaCjqHOaqIZGT0fIpyYY/k90IGvJeDx98wT1ForiE8E34SpcT7XtnIdU2Y0wF+jCHgZBk+s 5hBMImowOQcFCK0SsKFa+C5xZVE/fjUAOG6UKicYXoZqTZMEE8JkQhkl/MynrlmiN24BxLlk d7pqojUNUTNNwRcawr40Ire7kI+1BjOkGlA5AdmOakb5AS2e0Q9V/rzG4ngdxMUfaEMdgKKb 76r5K20+Grf4yAsBruN+losWhRXKlJ6FVHmZkt+A8BOsDAbzsAB+v9T2M4nQVVWk120c+VZk 72hg3ASpTABZcUgkMxFO/VR/roX0aduoNcrKlDn2SCfItGvn9IBDJyCAWlvVbD09NqbDklKy OERIz8pdyvavOLonJeqUPRtjPo8eZyD0IM34hmMzBn/JNN/GNXoZPyP4tVVmjAtmspJAPDSI dIDbiZiZwjBZBsJPUoLDJU5n6GjgXyXnz9w8QrJ4/ZopTWNilUujdABM/KMEjCObd9SkUuC4 HrP4kzyAw0ANczZwj2Amp6prraXwHKkCNJNSdVU8NZ6hVG13F1MEyYEFkqRkeiJtmm/atlAf hl8Fi0G6PJaGFaQZtv3UgC8oXWElgUBQNcWGOo/gCmtx6zO8kCmD24LZjdbbZots8pebTct0 1qUmdL1FHpqubucRn+H3qeZqyuoPioYJnNEYjULJSMH/t3irYcbnh/JCNF5H8adlcbpEDv9x zSLqikWhLgJi8MPkaKh8jjvnDaEtpXPCAkv6W3/Tm+jqw90eoOhT4ip8kTAq+ZNKp6DSVuMt 2RCnNKRhN3iFrmInS2JBe4KRbeg4q/cNCWG2AEyWZ486z6q5nivO5hK5y1zL1toNcBCfiL1Z EjUukVa45o70GaWUJKbqrmZU6wCpZUM3/y8PhwIRrKiuqRMSTI=
  • Ironport-hdrordr: A9a23:dSBMSqn1lK3qzCxsZUdRxjNct4npDfIK3DAbv31ZSRFFG/Fwwf re5sjz8SWE8Qr5P0tQ/+xoWZPwJk80kKQe3WB/B8bAYOCLgguVxeJZnO/fKl/bak/DH7VmpN 9dmsFFYbWaMbEQt7ee3ODXKbcd6ejC2Ly0g/zT1nJ8JDsaEJ2ILD0UNu9YKCBLrcV9aqbR3a Dz2vZ6
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Apr 19, 2023 at 11:18:38AM +0200, Jan Beulich wrote:
> On 19.04.2023 11:02, Roger Pau Monné wrote:
> > On Wed, Apr 19, 2023 at 09:07:41AM +0200, Jan Beulich wrote:
> >> On 18.04.2023 17:54, Andrew Cooper wrote:
> >>> On 18/04/2023 4:42 pm, Roger Pau Monne wrote:
> >>>> The addition of the flags field in the vcpu_set_singleshot_timer in
> >>>> 505ef3ea8687 is an ABI breakage, as the size of the structure is
> >>>> increased.
> >>>>
> >>>> Remove such field addition and drop the implementation of the
> >>>> VCPU_SSHOTTMR_future flag.  If a timer provides an expired timeout
> >>>> value just inject the timer interrupt.
> >>>>
> >>>> Bump the Xen interface version, and keep the flags field and
> >>>> VCPU_SSHOTTMR_future available for guests using the old interface.
> >>>>
> >>>> Note the removal of the field from the vcpu_set_singleshot_timer
> >>>> struct allows removing the compat translation of the struct.
> >>>>
> >>>> Fixes: 505ef3ea8687 ('Add flags field to VCPUOP_set_singlsehot_timer.')
> >>>> Reported-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> >>>> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> >>>
> >>> While everything said is true, this isn't the reason to to get rid of
> >>> VCPU_SSHOTTMR_future
> >>>
> >>> It 505ef3ea8687 does appear to have been an ABI break, that's
> >>> incidental.  It dates from 2007 so whatever we have now is the de-facto
> >>> ABI, whether we like it or not.
> >>>
> >>> The reason to delete this is because it is a monumentality and entirely
> >>> stupid idea which should have been rejected outright at the time, and
> >>> the only guest we can find which uses it also BUG_ON()'s in response to
> >>> -ETIME.
> >>
> >> The instance in Linux (up to 4.6) that I could find was
> >>
> >>    BUG_ON(ret != 0 && ret != -ETIME);
> >>
> >> i.e. not really dying when getting back -ETIME. (And if there really was
> >> a BUG_ON(ret) somewhere despite setting the flag, it would be a bug there,
> >> not something to "fix" in Xen.) I'm afraid I also disagree on "stupid
> >> idea" as well as ...
> > 
> > The logic in old Linux is indeed 'fine' in the sense that it doesn't
> > hit a BUG_ON.
> > 
> > The problem we are seeing is that when logdirty is enabled on a guest
> > with 32vCPUs (and without any kind of logdirty hardware assistance)
> > the contention on the p2m lock is so high that by the time
> > VCPUOP_set_singleshot_timer has copied the hypercall data from HVM
> > context the provided timeout has already expired, leading to:
> > 
> > [   65.543736] hrtimer: interrupt took 10817714 ns
> > [   65.514171] CE: xen increased min_delta_ns to 150000 nsec
> > [   65.514171] CE: xen increased min_delta_ns to 225000 nsec
> > [   65.514171] CE: xen increased min_delta_ns to 337500 nsec
> > [   65.566495] CE: xen increased min_delta_ns to 150000 nsec
> > [   65.514171] CE: xen increased min_delta_ns to 506250 nsec
> > [   65.573088] CE: xen increased min_delta_ns to 150000 nsec
> > [   65.572884] CE: xen increased min_delta_ns to 150000 nsec
> > [   65.514171] CE: xen increased min_delta_ns to 759375 nsec
> > [   65.638644] CE: xen increased min_delta_ns to 150000 nsec
> > [   65.566495] CE: xen increased min_delta_ns to 225000 nsec
> > [   65.514171] CE: xen increased min_delta_ns to 1000000 nsec
> > [   65.572884] CE: xen increased min_delta_ns to 225000 nsec
> > [   65.573088] CE: xen increased min_delta_ns to 225000 nsec
> > [   65.630224] CE: xen increased min_delta_ns to 150000 nsec
> > ...
> > 
> > xenrt1062821 login: [   82.752788] CE: Reprogramming failure. Giving up
> > [   82.779470] CE: xen increased min_delta_ns to 1000000 nsec
> > [   82.793075] CE: Reprogramming failure. Giving up
> > [   82.779470] CE: Reprogramming failure. Giving up
> > [   82.821864] CE: xen increased min_delta_ns to 506250 nsec
> > [   82.821864] CE: xen increased min_delta_ns to 759375 nsec
> > [   82.821864] CE: xen increased min_delta_ns to 1000000 nsec
> > [   82.821864] CE: Reprogramming failure. Giving up
> > [   82.856256] CE: Reprogramming failure. Giving up
> > [   84.566279] CE: Reprogramming failure. Giving up
> > [   84.649493] Freezing user space processes ... 
> > [  130.604032] INFO: rcu_sched detected stalls on CPUs/tasks: { 14} 
> > (detected by 10, t=60002 jiffies, g=4006, c=4005, q=14130)
> > [  130.604032] Task dump for CPU 14:
> > [  130.604032] swapper/14      R  running task        0     0      1 
> > 0x00000000
> > [  130.604032] Call Trace:
> > [  130.604032]  [<ffffffff90160f5d>] ? 
> > rcu_eqs_enter_common.isra.30+0x3d/0xf0
> > [  130.604032]  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
> > [  130.604032]  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
> > [  130.604032]  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
> > [  130.604032]  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
> > [  130.604032]  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> > [  549.654536] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} 
> > (detected by 24, t=60002 jiffies, g=6922, c=6921, q=7013)
> > [  549.655463] Task dump for CPU 26:
> > [  549.655463] swapper/26      R  running task        0     0      1 
> > 0x00000000
> > [  549.655463] Call Trace:
> > [  549.655463]  [<ffffffff90160f5d>] ? 
> > rcu_eqs_enter_common.isra.30+0x3d/0xf0
> > [  549.655463]  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
> > [  549.655463]  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
> > [  549.655463]  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
> > [  549.655463]  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
> > [  549.655463]  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> > [  821.888478] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} 
> > (detected by 24, t=60002 jiffies, g=8499, c=8498, q=7664)
> > [  821.888596] Task dump for CPU 26:
> > [  821.888622] swapper/26      R  running task        0     0      1 
> > 0x00000000
> > [  821.888677] Call Trace:
> > [  821.888712]  [<ffffffff90160f5d>] ? 
> > rcu_eqs_enter_common.isra.30+0x3d/0xf0
> > [  821.888771]  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
> > [  821.888818]  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
> > [  821.888865]  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
> > [  821.888917]  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
> > [  821.888966]  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> > 
> > At some point Linux simply gives up trying to reprogram the timer, and
> > that obviously lead to CPU stalls.
> 
> And that's all with old enough Linux then, I suppose?

That's Linux 3.10.

> > Ignoring the VCPU_SSHOTTMR_future flag allows the guest to survive, by
> > not returning ETIME and just injecting the expired interrupt.
> > 
> > Overall I'm not sure how useful VCPU_SSHOTTMR_future is at least when
> > implemented as done currently in Linux.
> > 
> > Instead of trying to reprogram the timer Linux should do the
> > equivalent of self-inject a timer interrupt in order to cope with the
> > fact that the selected timeout has already expired.
> 
> Indeed - that's what I was expecting would be happening. But I didn't
> go check their code ... Yet them getting it wrong still isn't a reason
> to ignore the request, at least not unconditionally. OSes could be
> getting it right, and they could then benefit from the avoided event.

Well, the reason to ignore would be because the introduction of the
flags field and the VCPU_SSHOTTMR_future option did break the ABI.

If we care about that behavior we should introduce a new hypercall,
either that behaves in such way, or that has a flags field in order to
implement it.

> As to "unconditionally": Introducing a per-guest control is likely too
> much overhead for something that, aiui, isn't commonly used (anymore).

No, I don't think any guest I've looked at (Linux, NetBSD, FreeBSD)
use the VCPU_SSHOTTMR_future flag.

> But tying this to a command line option might make sense - engaging it
> shouldn't (hopefully) lead to misbehavior in guests properly using the
> flag, so ought to be okay to enable in a system-wide manner.

I personally don't think we should go to those lengths in order to
keep this behavior, because ignoring VCPU_SSHOTTMR_future (and thus
never returning -ENOTIME) is compatible with the current
implementation.  The Linux implementation shows that even
it's current/past user didn't know how the flag should be used anyway.

As said above, if there's a willingness to have this behavior (which
based on the current public implementations it seems it's not) it can
always be implemented as a separate hypercall.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.