[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] xen/vcpu: remove vcpu_set_singleshot_timer flags field


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 19 Apr 2023 11:02:53 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=OwvsNZ5/ZkN+OMBd1mx4Zzs6EhYGMZawkAZsJAaZYQY=; b=H4jpzKhCZfDR2nfzCbzysO1fhw0BMxVTp3fP+hrBy9Z9gOyqmbFZyy89DUUqNn8lzcigHPresMCpocrCEVdqK6wDu0k31P9XRzWvVN2R81c5kd3i7J8QquCFtW30DKZb7IonhuSbaNWcxxdPeLiCoTdd2+it4WNqHZcK39Q91XibLI6QpKipp/R0LgfXAqun3tl9GRXMb5UkR/7lFPX1V5v2Cn5RNfcIVKfazdHuJxYY8QA2pejZhuyuKTGvr5ETN1YI5yfvgO0Th2/Mp9BCgTc4fyMxKGYk1UB4C3aIuDE0e9jxkh30YBxmwIay2ip0ak7XLP1528WFYfJPcRodSA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=XqrLH3mLaVC/lmc9z5JQEeBNY0t2dO111OkQWB4aa3bFkK2JNWYMslHlCSS+zB1HHaB8WI6dz3V+rukf+lRCaWirF9hLWpjNL1usRDyYMBCyCANrlsP9IIY0cmUED1WkEmNX+oroqxHOxBqaTKZpkGbRCarXT7H9sPzuc9n1rYqi10kaYmc4kQXN7DiUgyOu7AFBu02zGb9wOxmG1dZNDRYmRVOxPSP1SgF5zUrNz61+rZEOjjZGu17XM68/dL4ppjkoSk0CQs4K1oo/DCoe/JuXNdl3o+BOHPgPyQRhRgKb3feMAjDH0y+IAnKFtopdc3fkkGaCx0OHlgtS6GUZtg==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Henry Wang <Henry.Wang@xxxxxxx>, Community Manager <community.manager@xxxxxxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 19 Apr 2023 09:03:23 +0000
  • Ironport-data: A9a23:qcxRZaJg9PZUoio7FE+RVZQlxSXFcZb7ZxGr2PjKsXjdYENSgWAEx jZMDz/VbvyOZzT0LYolO9yzo0kB78DWm9BhG1BlqX01Q3x08seUXt7xwmUcnc+xBpaaEB84t ZV2hv3odp1coqr0/0/1WlTZhSAgk/rOHvykU7Ss1hlZHWdMUD0mhQ9oh9k3i4tphcnRKw6Ws Jb5rta31GWNglaYCUpJrfPSwP9TlK6q4mhA4gVuPasjUGL2zBH5MrpOfcldEFOgKmVkNrbSb /rOyri/4lTY838FYj9yuu+mGqGiaue60Tmm0hK6aYD76vRxjnVaPpIAHOgdcS9qZwChxLid/ jnvWauYEm/FNoWU8AgUvoIx/ytWZcWq85efSZSzXFD6I+QrvBIAzt03ZHzaM7H09c52KEMXy KUjMgsEVR263tCwnKyDENNj05FLwMnDZOvzu1lG5BSAVbMKZM6GRK/Ho9hFwD03m8ZCW+7EY NYUYiZuaxKGZABTPlAQC9Q1m+LAanvXKmUE7g7K4/dnpTGLlGSd05C0WDbRUsaNSshP2F6Ru 0rN/njjAwFcP9uaodaA2iv02LOfxXylB+r+EpW287lgmFKI2FYVBR5MUVSfsMSig2OhDoc3x 0s8v3BGQbIJ3E6hQ8T5Xha4iGWZpRNaUN1Ve8Ua5QeX2+zr6gCWLmEeS3hKb9lOnMQxQDk30 F6VjpXsDDpmv7CPYWKQ8K+OqjG/MjRTKnUNDQcGUA8E7t/LsIw1yBXVQb5LC7Wph9f4HTXxx TGiryUkgbgXy8kR2M2T4lTvkz+q4J/TQWYd9gjRG26o8A59TIqkfJCzr0jW6+5aK4SURUXHu 2IL8/Vy98gLBJCJ0SCIHuMEGejx4+7faWWEx1lyA5Mm6jKhvWa5epxd6y1/I0EvNdsYfTjuY wnYvgY5CIJvAUZGpJRfO+qZY/nGB4C6fTg5fpg4tuZzX6U=
  • Ironport-hdrordr: A9a23:AxV+Z6PXBORv6cBcTuGjsMiBIKoaSvp037BL7TESdfUxSKalfq +V8cjzuSWZtN9pYgBFpTniAtjifZq/z/9ICOAqVN+ftW/d11dAR7sD0WKN+VPd8hrFh4tgPP dbGJSW0OeAdmSSV/yKmTVQzuxQp+VvLJrY/ds2EU0dNz2DBMlbnmFENjo=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Apr 19, 2023 at 09:07:41AM +0200, Jan Beulich wrote:
> On 18.04.2023 17:54, Andrew Cooper wrote:
> > On 18/04/2023 4:42 pm, Roger Pau Monne wrote:
> >> The addition of the flags field in the vcpu_set_singleshot_timer in
> >> 505ef3ea8687 is an ABI breakage, as the size of the structure is
> >> increased.
> >>
> >> Remove such field addition and drop the implementation of the
> >> VCPU_SSHOTTMR_future flag.  If a timer provides an expired timeout
> >> value just inject the timer interrupt.
> >>
> >> Bump the Xen interface version, and keep the flags field and
> >> VCPU_SSHOTTMR_future available for guests using the old interface.
> >>
> >> Note the removal of the field from the vcpu_set_singleshot_timer
> >> struct allows removing the compat translation of the struct.
> >>
> >> Fixes: 505ef3ea8687 ('Add flags field to VCPUOP_set_singlsehot_timer.')
> >> Reported-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> >> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > 
> > While everything said is true, this isn't the reason to to get rid of
> > VCPU_SSHOTTMR_future
> > 
> > It 505ef3ea8687 does appear to have been an ABI break, that's
> > incidental.  It dates from 2007 so whatever we have now is the de-facto
> > ABI, whether we like it or not.
> > 
> > The reason to delete this is because it is a monumentality and entirely
> > stupid idea which should have been rejected outright at the time, and
> > the only guest we can find which uses it also BUG_ON()'s in response to
> > -ETIME.
> 
> The instance in Linux (up to 4.6) that I could find was
> 
>       BUG_ON(ret != 0 && ret != -ETIME);
> 
> i.e. not really dying when getting back -ETIME. (And if there really was
> a BUG_ON(ret) somewhere despite setting the flag, it would be a bug there,
> not something to "fix" in Xen.) I'm afraid I also disagree on "stupid
> idea" as well as ...

The logic in old Linux is indeed 'fine' in the sense that it doesn't
hit a BUG_ON.

The problem we are seeing is that when logdirty is enabled on a guest
with 32vCPUs (and without any kind of logdirty hardware assistance)
the contention on the p2m lock is so high that by the time
VCPUOP_set_singleshot_timer has copied the hypercall data from HVM
context the provided timeout has already expired, leading to:

[   65.543736] hrtimer: interrupt took 10817714 ns
[   65.514171] CE: xen increased min_delta_ns to 150000 nsec
[   65.514171] CE: xen increased min_delta_ns to 225000 nsec
[   65.514171] CE: xen increased min_delta_ns to 337500 nsec
[   65.566495] CE: xen increased min_delta_ns to 150000 nsec
[   65.514171] CE: xen increased min_delta_ns to 506250 nsec
[   65.573088] CE: xen increased min_delta_ns to 150000 nsec
[   65.572884] CE: xen increased min_delta_ns to 150000 nsec
[   65.514171] CE: xen increased min_delta_ns to 759375 nsec
[   65.638644] CE: xen increased min_delta_ns to 150000 nsec
[   65.566495] CE: xen increased min_delta_ns to 225000 nsec
[   65.514171] CE: xen increased min_delta_ns to 1000000 nsec
[   65.572884] CE: xen increased min_delta_ns to 225000 nsec
[   65.573088] CE: xen increased min_delta_ns to 225000 nsec
[   65.630224] CE: xen increased min_delta_ns to 150000 nsec
...

xenrt1062821 login: [   82.752788] CE: Reprogramming failure. Giving up
[   82.779470] CE: xen increased min_delta_ns to 1000000 nsec
[   82.793075] CE: Reprogramming failure. Giving up
[   82.779470] CE: Reprogramming failure. Giving up
[   82.821864] CE: xen increased min_delta_ns to 506250 nsec
[   82.821864] CE: xen increased min_delta_ns to 759375 nsec
[   82.821864] CE: xen increased min_delta_ns to 1000000 nsec
[   82.821864] CE: Reprogramming failure. Giving up
[   82.856256] CE: Reprogramming failure. Giving up
[   84.566279] CE: Reprogramming failure. Giving up
[   84.649493] Freezing user space processes ... 
[  130.604032] INFO: rcu_sched detected stalls on CPUs/tasks: { 14} (detected 
by 10, t=60002 jiffies, g=4006, c=4005, q=14130)
[  130.604032] Task dump for CPU 14:
[  130.604032] swapper/14      R  running task        0     0      1 0x00000000
[  130.604032] Call Trace:
[  130.604032]  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
[  130.604032]  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
[  130.604032]  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
[  130.604032]  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
[  130.604032]  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
[  130.604032]  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
[  549.654536] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected 
by 24, t=60002 jiffies, g=6922, c=6921, q=7013)
[  549.655463] Task dump for CPU 26:
[  549.655463] swapper/26      R  running task        0     0      1 0x00000000
[  549.655463] Call Trace:
[  549.655463]  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
[  549.655463]  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
[  549.655463]  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
[  549.655463]  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
[  549.655463]  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
[  549.655463]  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
[  821.888478] INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected 
by 24, t=60002 jiffies, g=8499, c=8498, q=7664)
[  821.888596] Task dump for CPU 26:
[  821.888622] swapper/26      R  running task        0     0      1 0x00000000
[  821.888677] Call Trace:
[  821.888712]  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
[  821.888771]  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
[  821.888818]  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
[  821.888865]  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
[  821.888917]  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
[  821.888966]  [<ffffffff900000d5>] ? start_cpu+0x5/0x14

At some point Linux simply gives up trying to reprogram the timer, and
that obviously lead to CPU stalls.

Ignoring the VCPU_SSHOTTMR_future flag allows the guest to survive, by
not returning ETIME and just injecting the expired interrupt.

Overall I'm not sure how useful VCPU_SSHOTTMR_future is at least when
implemented as done currently in Linux.

Instead of trying to reprogram the timer Linux should do the
equivalent of self-inject a timer interrupt in order to cope with the
fact that the selected timeout has already expired.

> > It can literally only be used to shoot yourself in the foot with, and
> > more recent Linuxes have dropped their use of it.
> 
> ... this: If used correctly, it can avoid injection of a pointless event.
> Clearly the Linux change dropping use of the flag indicates that its use
> wasn't correct (anymore?), likely because of not properly dealing with
> -ETIME up the call stack. I'm willing to trust Jeremy / Keir that at the
> time of its introduction such a problem didn't exist.

I'm afraid Linux didn't implement this properly originally, as it
attempted to reprogram the timer with a bigger timeout rather than
just doing the equivalent of self-injecting a timer interrupt.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.