
Re: [RFC] Xen crashes on ASSERT on suspend/resume, suggested fix


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Stefano Stabellini <stefano.stabellini@xxxxxxx>
  • Date: Wed, 24 May 2023 16:51:03 -0700
  • Cc: Stefano Stabellini <stefano.stabellini@xxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>, <jbeulich@xxxxxxxx>, <andrew.cooper3@xxxxxxxxxx>, <xenia.ragiadakou@xxxxxxx>
  • Delivery-date: Wed, 24 May 2023 23:51:28 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, 23 May 2023, Roger Pau Monné wrote:
> On Tue, May 23, 2023 at 03:54:36PM +0200, Roger Pau Monné wrote:
> > On Thu, May 18, 2023 at 04:44:53PM -0700, Stefano Stabellini wrote:
> > > Hi all,
> > > 
> > > After many PVH Dom0 suspend/resume cycles we are seeing the following
> > > Xen crash (it is random and doesn't reproduce reliably):
> > > 
> > > (XEN) [555.042981][<ffff82d04032a137>] R 
> > > arch/x86/irq.c#_clear_irq_vector+0x214/0x2bd
> > > (XEN) [555.042986][<ffff82d04032a74c>] F destroy_irq+0xe2/0x1b8
> > > (XEN) [555.042991][<ffff82d0403276db>] F msi_free_irq+0x5e/0x1a7
> > > (XEN) [555.042995][<ffff82d04032da2d>] F unmap_domain_pirq+0x441/0x4b4
> > > (XEN) [555.043001][<ffff82d0402d29b9>] F 
> > > arch/x86/hvm/vmsi.c#vpci_msi_disable+0xc0/0x155
> > > (XEN) [555.043006][<ffff82d0402d39fc>] F vpci_msi_arch_disable+0x1e/0x2b
> > > (XEN) [555.043013][<ffff82d04026561c>] F 
> > > drivers/vpci/msi.c#control_write+0x109/0x10e
> > > (XEN) [555.043018][<ffff82d0402640c3>] F vpci_write+0x11f/0x268
> > > (XEN) [555.043024][<ffff82d0402c726a>] F 
> > > arch/x86/hvm/io.c#vpci_portio_write+0xa0/0xa7
> > > (XEN) [555.043029][<ffff82d0402c6682>] F 
> > > hvm_process_io_intercept+0x203/0x26f
> > > (XEN) [555.043034][<ffff82d0402c6718>] F hvm_io_intercept+0x2a/0x4c
> > > (XEN) [555.043039][<ffff82d0402b6353>] F 
> > > arch/x86/hvm/emulate.c#hvmemul_do_io+0x29b/0x5f6
> > > (XEN) [555.043043][<ffff82d0402b66dd>] F 
> > > arch/x86/hvm/emulate.c#hvmemul_do_io_buffer+0x2f/0x6a
> > > (XEN) [555.043048][<ffff82d0402b7bde>] F hvmemul_do_pio_buffer+0x33/0x35
> > > (XEN) [555.043053][<ffff82d0402c7042>] F handle_pio+0x6d/0x1b4
> > > (XEN) [555.043059][<ffff82d04029ec20>] F svm_vmexit_handler+0x10bf/0x18b0
> > > (XEN) [555.043064][<ffff82d0402034e5>] F svm_stgi_label+0x8/0x18
> > > (XEN) [555.043067]
> > > (XEN) [555.469861]
> > > (XEN) [555.471855] ****************************************
> > > (XEN) [555.477315] Panic on CPU 9:
> > > (XEN) [555.480608] Assertion 'per_cpu(vector_irq, cpu)[old_vector] == 
> > > irq' failed at arch/x86/irq.c:233
> > > (XEN) [555.489882] ****************************************
> > > 
> > > Looking at the code in question, the ASSERT looks wrong to me.
> > > 
> > > Specifically, if you look at send_cleanup_vector and
> > > irq_move_cleanup_interrupt, it is entirely possible to have old_vector
> > > still valid and move_in_progress still set, but only some of the
> > > per_cpu(vector_irq, me)[vector] entries cleared. It seems to me that
> > > this could happen especially when an MSI has a large old_cpu_mask.
> > 
> > I guess the only way to get into such a situation would be if you happen
> > to execute _clear_irq_vector() with a cpu_online_map smaller than the
> > one in old_cpu_mask, at which point you will leave old_vector fields
> > not updated.
> > 
> > Maybe somehow you get into this situation when doing suspend/resume?
> > 
> > Could you try to add a:
> > 
> > ASSERT(cpumask_equal(tmp_mask, desc->arch.old_cpu_mask));
> > 
> > Before the `for_each_cpu(cpu, tmp_mask)` loop?
> 
> I see that the old_cpu_mask is cleared in release_old_vec(), so that
> suggestion is not very useful.
> 
> Does the crash happen at specific points, for example just after
> resume or before suspend?

Hi Roger, Jan,

Unfortunately we are all completely blind on this issue: it is not
reproducible and has only happened once, so we only have our wits to go
on. However, looking at the codebase I think there are a few possible
races. Here is my analysis and suggested fix.
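
For reference, here is a heavily abridged view of the two paths involved,
written out from arch/x86/irq.c by hand: locking, the IRR re-check,
tracing and error handling are all dropped, so please treat it as a
sketch of the flow rather than the exact code.

/* IRQ_MOVE_CLEANUP_VECTOR handler: runs separately on each CPU that was
 * in old_cpu_mask. */
static void irq_move_cleanup_interrupt(void)
{
    unsigned int me = smp_processor_id(), vector;

    for ( vector = FIRST_DYNAMIC_VECTOR; ...; vector++ )
    {
        int irq = per_cpu(vector_irq, me)[vector];
        ...
        /* Each CPU only clears its own vector_irq[] slot ... */
        per_cpu(vector_irq, me)[vector] = ~irq;
        desc->arch.move_cleanup_count--;

        if ( desc->arch.move_cleanup_count == 0 )
            /* ... and only the last CPU to get here releases old_vector. */
            release_old_vec(desc);
    }
}

/* Reached from destroy_irq() (see the stack trace above); walks the whole
 * old_cpu_mask in one go. */
static void _clear_irq_vector(struct irq_desc *desc)
{
    ...
    old_vector = desc->arch.old_vector;
    cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);

    for_each_cpu(cpu, tmp_mask)
    {
        /* Fires for any CPU that already cleared its slot in the handler
         * above. */
        ASSERT(per_cpu(vector_irq, cpu)[old_vector] == irq);
        per_cpu(vector_irq, cpu)[old_vector] = ~irq;
    }

    release_old_vec(desc);
}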


---
xen/irq: fix races between send_cleanup_vector and _clear_irq_vector

It is possible for send_cleanup_vector and _clear_irq_vector to run at
the same time on different CPUs. In that case we have a race, as both
_clear_irq_vector and the irq_move_cleanup_interrupt handlers triggered
by send_cleanup_vector try to clear old_vector.

This patch fixes 3 races:

1) With irq_move_cleanup_interrupt running on multiple CPUs at the same
time as _clear_irq_vector, it is possible that only some of the
per_cpu(vector_irq, cpu)[old_vector] entries still hold irq, but not all
of them (see the sketch after this list). So, turn the ASSERT in
_clear_irq_vector into an if.

2) It is possible that _clear_irq_vector runs at the same time as
release_old_vec, called from irq_move_cleanup_interrupt. At the moment
_clear_irq_vector can read a valid old_cpu_mask but an already-invalid
old_vector (because release_old_vec is in the middle of invalidating
it); see the interleaving sketch after this list. To avoid this, in
release_old_vec clear old_cpu_mask before setting old_vector to invalid.
This way, if _clear_irq_vector sees an invalid old_vector, old_cpu_mask
is already zero and we don't enter the loop.

3) It is possible that release_old_vec runs twice at the same time for
the same old_vector. Change release_old_vec so that calling it twice is
OK: remove both ASSERTs. With those gone, it should now be possible to
call release_old_vec twice in a row for the same old_vector.
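
To make races 1) and 2) easier to follow, here is a rough sketch of the
interleavings I have in mind. The CPU numbers and letters are made up
and locking is not shown; this is only meant as an illustration, not as
a proof that the interleavings can actually occur.

Race 1, for an IRQ whose old_cpu_mask used to span CPU0 and CPU1:

  per_cpu(vector_irq, 0)[old_vector] == ~irq   /* CPU0 already handled
                                                  the cleanup IPI */
  per_cpu(vector_irq, 1)[old_vector] == irq    /* CPU1 has not yet */

If _clear_irq_vector runs at this point, the current ASSERT fires on
CPU0's entry; with the if() that entry is skipped and only CPU1's entry
is cleared.

Race 2, with the current ordering in release_old_vec:

  CPU A: _clear_irq_vector()               CPU B: release_old_vec()

                                           desc->arch.old_vector =
                                               IRQ_VECTOR_UNASSIGNED;
  old_vector = desc->arch.old_vector;
      /* reads IRQ_VECTOR_UNASSIGNED */
  cpumask_and(tmp_mask,
              desc->arch.old_cpu_mask,
              &cpu_online_map);
      /* old_cpu_mask not cleared yet,
         so the loop runs with an
         invalid old_vector as index */
                                           cpumask_clear(
                                               desc->arch.old_cpu_mask);

With the order swapped, an invalid old_vector implies that old_cpu_mask
has already been cleared, so the loop body is never reached.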

Signed-off-by: Stefano Stabellini <stefano.stabellini@xxxxxxx>
---
 xen/arch/x86/irq.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 20150b1c7f..d9c139cf1b 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -112,16 +112,11 @@ static void release_old_vec(struct irq_desc *desc)
 {
     unsigned int vector = desc->arch.old_vector;
 
-    desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
     cpumask_clear(desc->arch.old_cpu_mask);
+    desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
 
-    if ( !valid_irq_vector(vector) )
-        ASSERT_UNREACHABLE();
-    else if ( desc->arch.used_vectors )
-    {
-        ASSERT(test_bit(vector, desc->arch.used_vectors));
+    if ( desc->arch.used_vectors )
         clear_bit(vector, desc->arch.used_vectors);
-    }
 }
 
 static void _trace_irq_mask(uint32_t event, int irq, int vector,
@@ -230,9 +225,11 @@ static void _clear_irq_vector(struct irq_desc *desc)
 
         for_each_cpu(cpu, tmp_mask)
         {
-            ASSERT(per_cpu(vector_irq, cpu)[old_vector] == irq);
-            TRACE_3D(TRC_HW_IRQ_MOVE_FINISH, irq, old_vector, cpu);
-            per_cpu(vector_irq, cpu)[old_vector] = ~irq;
+            if ( per_cpu(vector_irq, cpu)[old_vector] == irq )
+            {
+                TRACE_3D(TRC_HW_IRQ_MOVE_FINISH, irq, old_vector, cpu);
+                per_cpu(vector_irq, cpu)[old_vector] = ~irq;
+            }
         }
 
         release_old_vec(desc);
-- 
2.25.1

 

