
Re: [RFC] Xen crashes on ASSERT on suspend/resume, suggested fix


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Stefano Stabellini <stefano.stabellini@xxxxxxx>
  • Date: Wed, 24 May 2023 16:51:03 -0700
  • Cc: Stefano Stabellini <stefano.stabellini@xxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>, <jbeulich@xxxxxxxx>, <andrew.cooper3@xxxxxxxxxx>, <xenia.ragiadakou@xxxxxxx>
  • Delivery-date: Wed, 24 May 2023 23:51:28 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, 23 May 2023, Roger Pau Monné wrote:
> On Tue, May 23, 2023 at 03:54:36PM +0200, Roger Pau Monné wrote:
> > On Thu, May 18, 2023 at 04:44:53PM -0700, Stefano Stabellini wrote:
> > > Hi all,
> > > 
> > > After many PVH Dom0 suspend/resume cycles we are seeing the following
> > > Xen crash (it is random and doesn't reproduce reliably):
> > > 
> > > (XEN) [555.042981][<ffff82d04032a137>] R 
> > > arch/x86/irq.c#_clear_irq_vector+0x214/0x2bd
> > > (XEN) [555.042986][<ffff82d04032a74c>] F destroy_irq+0xe2/0x1b8
> > > (XEN) [555.042991][<ffff82d0403276db>] F msi_free_irq+0x5e/0x1a7
> > > (XEN) [555.042995][<ffff82d04032da2d>] F unmap_domain_pirq+0x441/0x4b4
> > > (XEN) [555.043001][<ffff82d0402d29b9>] F 
> > > arch/x86/hvm/vmsi.c#vpci_msi_disable+0xc0/0x155
> > > (XEN) [555.043006][<ffff82d0402d39fc>] F vpci_msi_arch_disable+0x1e/0x2b
> > > (XEN) [555.043013][<ffff82d04026561c>] F 
> > > drivers/vpci/msi.c#control_write+0x109/0x10e
> > > (XEN) [555.043018][<ffff82d0402640c3>] F vpci_write+0x11f/0x268
> > > (XEN) [555.043024][<ffff82d0402c726a>] F 
> > > arch/x86/hvm/io.c#vpci_portio_write+0xa0/0xa7
> > > (XEN) [555.043029][<ffff82d0402c6682>] F 
> > > hvm_process_io_intercept+0x203/0x26f
> > > (XEN) [555.043034][<ffff82d0402c6718>] F hvm_io_intercept+0x2a/0x4c
> > > (XEN) [555.043039][<ffff82d0402b6353>] F 
> > > arch/x86/hvm/emulate.c#hvmemul_do_io+0x29b/0x5f6
> > > (XEN) [555.043043][<ffff82d0402b66dd>] F 
> > > arch/x86/hvm/emulate.c#hvmemul_do_io_buffer+0x2f/0x6a
> > > (XEN) [555.043048][<ffff82d0402b7bde>] F hvmemul_do_pio_buffer+0x33/0x35
> > > (XEN) [555.043053][<ffff82d0402c7042>] F handle_pio+0x6d/0x1b4
> > > (XEN) [555.043059][<ffff82d04029ec20>] F svm_vmexit_handler+0x10bf/0x18b0
> > > (XEN) [555.043064][<ffff82d0402034e5>] F svm_stgi_label+0x8/0x18
> > > (XEN) [555.043067]
> > > (XEN) [555.469861]
> > > (XEN) [555.471855] ****************************************
> > > (XEN) [555.477315] Panic on CPU 9:
> > > (XEN) [555.480608] Assertion 'per_cpu(vector_irq, cpu)[old_vector] == 
> > > irq' failed at arch/x86/irq.c:233
> > > (XEN) [555.489882] ****************************************
> > > 
> > > Looking at the code in question, the ASSERT looks wrong to me.
> > > 
> > > Specifically, if you look at send_cleanup_vector and
> > > irq_move_cleanup_interrupt, it is entirely possible to have old_vector
> > > still valid and move_in_progress still set, but only some of the
> > > per_cpu(vector_irq, me)[vector] entries cleared. It seems to me that
> > > this could happen especially when an MSI has a large old_cpu_mask.
> > 
> > I guess the only way to get into such a situation would be if you happen
> > to execute _clear_irq_vector() with a cpu_online_map smaller than the
> > one in old_cpu_mask, at which point you will leave old_vector fields
> > not updated.
> > 
> > Maybe somehow you get into this situation when doing suspend/resume?
> > 
> > Could you try to add a:
> > 
> > ASSERT(cpumask_equal(tmp_mask, desc->arch.old_cpu_mask));
> > 
> > Before the `for_each_cpu(cpu, tmp_mask)` loop?
> 
> I see that the old_cpu_mask is cleared in release_old_vec(), so that
> suggestion is not very useful.
> 
> Does the crash happen at specific points, for example just after
> resume or before suspend?

Hi Roger, Jan,

Unfortunately we are all completely blind on this issue: it is not
reproducible and has only happened once, so we only have our wits to go
on. However, looking at the codebase I think there are a few possible
races. Here is my analysis and suggested fix.
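
For reference, here is a heavily abridged view of the two paths involved,
written out from arch/x86/irq.c by hand: locking, the IRR re-check,
tracing and error handling are all dropped, so please treat it as a
sketch of the flow rather than the exact code.

/* IRQ_MOVE_CLEANUP_VECTOR handler: runs separately on each CPU that was
 * in old_cpu_mask. */
static void irq_move_cleanup_interrupt(void)
{
    unsigned int me = smp_processor_id(), vector;

    for ( vector = FIRST_DYNAMIC_VECTOR; ...; vector++ )
    {
        int irq = per_cpu(vector_irq, me)[vector];
        ...
        /* Each CPU only clears its own vector_irq[] slot ... */
        per_cpu(vector_irq, me)[vector] = ~irq;
        desc->arch.move_cleanup_count--;

        if ( desc->arch.move_cleanup_count == 0 )
            /* ... and only the last CPU to get here releases old_vector. */
            release_old_vec(desc);
    }
}

/* Reached from destroy_irq() (see the stack trace above); walks the whole
 * old_cpu_mask in one go. */
static void _clear_irq_vector(struct irq_desc *desc)
{
    ...
    old_vector = desc->arch.old_vector;
    cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);

    for_each_cpu(cpu, tmp_mask)
    {
        /* Fires for any CPU that already cleared its slot in the handler
         * above. */
        ASSERT(per_cpu(vector_irq, cpu)[old_vector] == irq);
        per_cpu(vector_irq, cpu)[old_vector] = ~irq;
    }

    release_old_vec(desc);
}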


---
xen/irq: fix races between send_cleanup_vector and _clear_irq_vector

It is possible for send_cleanup_vector and _clear_irq_vector to run at
the same time on different CPUs. In that case we have a race, as both
_clear_irq_vector and the irq_move_cleanup_interrupt handlers triggered
by send_cleanup_vector try to clear old_vector.

This patch fixes 3 races:

1) With irq_move_cleanup_interrupt running on multiple CPUs at the same
time as _clear_irq_vector, it is possible that only some of the
per_cpu(vector_irq, cpu)[old_vector] entries still hold irq, but not all
of them (see the sketch after this list). So, turn the ASSERT in
_clear_irq_vector into an if.

2) It is possible that _clear_irq_vector runs at the same time as
release_old_vec, called from irq_move_cleanup_interrupt. At the moment
_clear_irq_vector can read a valid old_cpu_mask but an already-invalid
old_vector (because release_old_vec is in the middle of invalidating
it); see the interleaving sketch after this list. To avoid this, in
release_old_vec clear old_cpu_mask before setting old_vector to invalid.
This way, if _clear_irq_vector sees an invalid old_vector, old_cpu_mask
is already zero and we don't enter the loop.

3) It is possible that release_old_vec runs twice at the same time for
the same old_vector. Change release_old_vec so that calling it twice is
OK: remove both ASSERTs. With those gone, it should now be possible to
call release_old_vec twice in a row for the same old_vector.
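
To make races 1) and 2) easier to follow, here is a rough sketch of the
interleavings I have in mind. The CPU numbers and letters are made up
and locking is not shown; this is only meant as an illustration, not as
a proof that the interleavings can actually occur.

Race 1, for an IRQ whose old_cpu_mask used to span CPU0 and CPU1:

  per_cpu(vector_irq, 0)[old_vector] == ~irq   /* CPU0 already handled
                                                  the cleanup IPI */
  per_cpu(vector_irq, 1)[old_vector] == irq    /* CPU1 has not yet */

If _clear_irq_vector runs at this point, the current ASSERT fires on
CPU0's entry; with the if() that entry is skipped and only CPU1's entry
is cleared.

Race 2, with the current ordering in release_old_vec:

  CPU A: _clear_irq_vector()               CPU B: release_old_vec()

                                           desc->arch.old_vector =
                                               IRQ_VECTOR_UNASSIGNED;
  old_vector = desc->arch.old_vector;
      /* reads IRQ_VECTOR_UNASSIGNED */
  cpumask_and(tmp_mask,
              desc->arch.old_cpu_mask,
              &cpu_online_map);
      /* old_cpu_mask not cleared yet,
         so the loop runs with an
         invalid old_vector as index */
                                           cpumask_clear(
                                               desc->arch.old_cpu_mask);

With the order swapped, an invalid old_vector implies that old_cpu_mask
has already been cleared, so the loop body is never reached.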

Signed-off-by: Stefano Stabellini <stefano.stabellini@xxxxxxx>
---
 xen/arch/x86/irq.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 20150b1c7f..d9c139cf1b 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -112,16 +112,11 @@ static void release_old_vec(struct irq_desc *desc)
 {
     unsigned int vector = desc->arch.old_vector;
 
-    desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
     cpumask_clear(desc->arch.old_cpu_mask);
+    desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
 
-    if ( !valid_irq_vector(vector) )
-        ASSERT_UNREACHABLE();
-    else if ( desc->arch.used_vectors )
-    {
-        ASSERT(test_bit(vector, desc->arch.used_vectors));
+    if ( desc->arch.used_vectors )
         clear_bit(vector, desc->arch.used_vectors);
-    }
 }
 
 static void _trace_irq_mask(uint32_t event, int irq, int vector,
@@ -230,9 +225,11 @@ static void _clear_irq_vector(struct irq_desc *desc)
 
         for_each_cpu(cpu, tmp_mask)
         {
-            ASSERT(per_cpu(vector_irq, cpu)[old_vector] == irq);
-            TRACE_3D(TRC_HW_IRQ_MOVE_FINISH, irq, old_vector, cpu);
-            per_cpu(vector_irq, cpu)[old_vector] = ~irq;
+            if ( per_cpu(vector_irq, cpu)[old_vector] == irq )
+            {
+                TRACE_3D(TRC_HW_IRQ_MOVE_FINISH, irq, old_vector, cpu);
+                per_cpu(vector_irq, cpu)[old_vector] = ~irq;
+            }
         }
 
         release_old_vec(desc);
-- 
2.25.1

 

