
Re: [PATCH v3] xen/vcpu: ignore VCPU_SSHOTTMR_future


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 3 May 2023 12:59:32 +0200
  • Cc: Henry Wang <Henry.Wang@xxxxxxx>, Community Manager <community.manager@xxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 03 May 2023 10:59:54 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, May 03, 2023 at 12:38:59PM +0200, Jan Beulich wrote:
> On 19.04.2023 16:31, Roger Pau Monne wrote:
> > The usage of VCPU_SSHOTTMR_future in Linux prior to 4.7 is bogus.
> > When the hypervisor returns -ETIME (timeout in the past) Linux keeps
> > retrying to set up the timer with a higher timeout instead of
> > self-injecting a timer interrupt.
> > 
> > On boxes without any hardware assistance for logdirty we have seen HVM
> > Linux guests < 4.7 with 32 vCPUs give up trying to set up the timer when
> > logdirty is enabled:
> > 
> > CE: Reprogramming failure. Giving up
> > CE: xen increased min_delta_ns to 1000000 nsec
> > CE: Reprogramming failure. Giving up
> > CE: Reprogramming failure. Giving up
> > CE: xen increased min_delta_ns to 506250 nsec
> > CE: xen increased min_delta_ns to 759375 nsec
> > CE: xen increased min_delta_ns to 1000000 nsec
> > CE: Reprogramming failure. Giving up
> > CE: Reprogramming failure. Giving up
> > CE: Reprogramming failure. Giving up
> > Freezing user space processes ...
> > INFO: rcu_sched detected stalls on CPUs/tasks: { 14} (detected by 10, t=60002 jiffies, g=4006, c=4005, q=14130)
> > Task dump for CPU 14:
> > swapper/14      R  running task        0     0      1 0x00000000
> > Call Trace:
> >  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
> >  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
> >  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
> >  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
> >  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
> >  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> > INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=6922, c=6921, q=7013)
> > Task dump for CPU 26:
> > swapper/26      R  running task        0     0      1 0x00000000
> > Call Trace:
> >  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
> >  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
> >  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
> >  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
> >  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
> >  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> > INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=8499, c=8498, q=7664)
> > Task dump for CPU 26:
> > swapper/26      R  running task        0     0      1 0x00000000
> > Call Trace:
> >  [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
> >  [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
> >  [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
> >  [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
> >  [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
> >  [<ffffffff900000d5>] ? start_cpu+0x5/0x14
> > 
> > This leads to CPU stalls and, as a result, a broken system.
> > 
> > Work around this bogus usage by ignoring the VCPU_SSHOTTMR_future in
> > the hypervisor.  Old Linux versions are the only ones known to have
> > (wrongly) attempted to use the flag, and ignoring it is compatible
> > with the behavior expected by any guests setting that flag.
> > 
> > Note the usage of the flag has been removed from Linux by commit:
> > 
> > c06b6d70feb3 xen/x86: don't lose event interrupts
> > 
> > Which landed in Linux 4.7.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > Acked-by: Henry Wang <Henry.Wang@xxxxxxx> # CHANGELOG
> 
> A little hesitantly, but since no-one else appears to show any interest:
> Acked-by: Jan Beulich <jbeulich@xxxxxxxx>

Thanks.

Roger.
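
For readers of the archive, the behaviour change described in the quoted
commit message can be modelled roughly as below. This is a minimal sketch
under stated assumptions, not the Xen implementation: struct model_timer,
both set_timer_* helpers, timer_due() and the flag's bit value are
hypothetical names chosen for illustration. It only contrasts "refuse a past
deadline with -ETIME when the flag is set" with "ignore the flag and arm the
timer anyway, so that it is due immediately".

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for VCPU_SSHOTTMR_future; the bit value is an assumption. */
#define SSHOTTMR_FUTURE (1u << 0)

/* Hypothetical model of a per-vCPU single-shot timer. */
struct model_timer {
    bool armed;
    uint64_t expires_ns;
};

/*
 * Behaviour when the flag is honoured: a deadline already in the past is
 * rejected with -ETIME and the timer is left untouched.
 */
static int set_timer_honouring_flag(struct model_timer *t, uint64_t now_ns,
                                    uint64_t timeout_abs_ns, uint32_t flags)
{
    if ((flags & SSHOTTMR_FUTURE) && timeout_abs_ns < now_ns)
        return -ETIME;

    t->armed = true;
    t->expires_ns = timeout_abs_ns;
    return 0;
}

/*
 * Behaviour when the flag is ignored: the timer is always armed, so a
 * deadline in the past simply means it is due right away.
 */
static int set_timer_ignoring_flag(struct model_timer *t, uint64_t now_ns,
                                   uint64_t timeout_abs_ns, uint32_t flags)
{
    (void)now_ns;
    (void)flags;

    t->armed = true;
    t->expires_ns = timeout_abs_ns;
    return 0;
}

/* True when an armed timer's deadline has been reached. */
static bool timer_due(const struct model_timer *t, uint64_t now_ns)
{
    return t->armed && t->expires_ns <= now_ns;
}
```

In this model, a guest that reacts to -ETIME by retrying with a larger
timeout never sees an error once the flag is ignored; it gets a timer that
is due at once, which matches the expectation stated in the commit message.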



 

