Xen project Mailing List

Re: [RFC] Xen crashes on ASSERT on suspend/resume, suggested fix

To: Stefano Stabellini <stefano.stabellini@xxxxxxx>

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Tue, 23 May 2023 16:50:06 +0200

Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, jbeulich@xxxxxxxx, andrew.cooper3@xxxxxxxxxx, xenia.ragiadakou@xxxxxxx

Delivery-date: Tue, 23 May 2023 14:50:39 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, May 23, 2023 at 03:54:36PM +0200, Roger Pau Monné wrote:
> On Thu, May 18, 2023 at 04:44:53PM -0700, Stefano Stabellini wrote:
> > Hi all,
> >
> > After many PVH Dom0 suspend/resume cycles we are seeing the following
> > Xen crash (it is random and doesn't reproduce reliably):
> >
> > (XEN) [555.042981][<ffff82d04032a137>] R arch/x86/irq.c#_clear_irq_vector+0x214/0x2bd
> > (XEN) [555.042986][<ffff82d04032a74c>] F destroy_irq+0xe2/0x1b8
> > (XEN) [555.042991][<ffff82d0403276db>] F msi_free_irq+0x5e/0x1a7
> > (XEN) [555.042995][<ffff82d04032da2d>] F unmap_domain_pirq+0x441/0x4b4
> > (XEN) [555.043001][<ffff82d0402d29b9>] F arch/x86/hvm/vmsi.c#vpci_msi_disable+0xc0/0x155
> > (XEN) [555.043006][<ffff82d0402d39fc>] F vpci_msi_arch_disable+0x1e/0x2b
> > (XEN) [555.043013][<ffff82d04026561c>] F drivers/vpci/msi.c#control_write+0x109/0x10e
> > (XEN) [555.043018][<ffff82d0402640c3>] F vpci_write+0x11f/0x268
> > (XEN) [555.043024][<ffff82d0402c726a>] F arch/x86/hvm/io.c#vpci_portio_write+0xa0/0xa7
> > (XEN) [555.043029][<ffff82d0402c6682>] F hvm_process_io_intercept+0x203/0x26f
> > (XEN) [555.043034][<ffff82d0402c6718>] F hvm_io_intercept+0x2a/0x4c
> > (XEN) [555.043039][<ffff82d0402b6353>] F arch/x86/hvm/emulate.c#hvmemul_do_io+0x29b/0x5f6
> > (XEN) [555.043043][<ffff82d0402b66dd>] F arch/x86/hvm/emulate.c#hvmemul_do_io_buffer+0x2f/0x6a
> > (XEN) [555.043048][<ffff82d0402b7bde>] F hvmemul_do_pio_buffer+0x33/0x35
> > (XEN) [555.043053][<ffff82d0402c7042>] F handle_pio+0x6d/0x1b4
> > (XEN) [555.043059][<ffff82d04029ec20>] F svm_vmexit_handler+0x10bf/0x18b0
> > (XEN) [555.043064][<ffff82d0402034e5>] F svm_stgi_label+0x8/0x18
> > (XEN) [555.043067]
> > (XEN) [555.469861]
> > (XEN) [555.471855] ****************************************
> > (XEN) [555.477315] Panic on CPU 9:
> > (XEN) [555.480608] Assertion 'per_cpu(vector_irq, cpu)[old_vector] == irq' failed at arch/x86/irq.c:233
> > (XEN) [555.489882] ****************************************
> >
> > Looking at the code in question, the ASSERT looks wrong to me.
> >
> > Specifically, if you see send_cleanup_vector and
> > irq_move_cleanup_interrupt, it is entirely possible to have old_vector
> > still valid and also move_in_progress still set, but only some of the
> > per_cpu(vector_irq, me)[vector] cleared. It seems to me that this could
> > happen especially when an MSI has a large old_cpu_mask.
>
> i guess the only way to get into such situation would be if you happen
> to execute _clear_irq_vector() with a cpu_online_map smaller than the
> one in old_cpu_mask, at which point you will leave old_vector fields
> not updated.
>
> Maybe somehow you get into this situation when doing suspend/resume?
>
> Could you try to add a:
>
> ASSERT(cpumask_equal(tmp_mask, desc->arch.old_cpu_mask));
>
> Before the `for_each_cpu(cpu, tmp_mask)` loop?

I see that the old_cpu_mask is cleared in release_old_vec(), so that
suggestion is not very useful.

Does the crash happen at specific points, for example just after
resume or before suspend?

Roger.
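The scenario hypothesised above (cleanup running with an online map smaller than old_cpu_mask) can be sketched with a toy model. This is not code from the Xen tree: the `toy_*` names, the array sizes and the bitmask type are all made up for illustration, and the real `_clear_irq_vector()` does considerably more. The sketch only shows how intersecting the old mask with a reduced online map would leave stale per-CPU entries behind.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS    8
#define NR_VECTORS 16
#define IRQ_NONE   (-1)

/* Hypothetical stand-in for per_cpu(vector_irq, cpu)[vector]. */
static int vector_irq[NR_CPUS][NR_VECTORS];

/* Toy CPU mask: bit n set means CPU n is a member. */
typedef uint32_t toy_cpumask_t;

/* Mark every (cpu, vector) slot as unused. */
static void toy_init(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        for (int v = 0; v < NR_VECTORS; v++)
            vector_irq[cpu][v] = IRQ_NONE;
}

/* Bind irq to vector on every CPU in mask. */
static void toy_bind(int irq, int vector, toy_cpumask_t mask)
{
    assert(vector >= 0 && vector < NR_VECTORS);
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (mask & (1u << cpu))
            vector_irq[cpu][vector] = irq;
}

/*
 * Clear the old vector only on CPUs that are both in old_mask and online,
 * mirroring the "tmp_mask = old_cpu_mask & cpu_online_map" idea.
 * Returns the CPUs in old_mask that were skipped because they are offline.
 */
static toy_cpumask_t toy_clear_old_vector(int old_vector,
                                          toy_cpumask_t old_mask,
                                          toy_cpumask_t online)
{
    toy_cpumask_t tmp_mask = old_mask & online;

    assert(old_vector >= 0 && old_vector < NR_VECTORS);
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (tmp_mask & (1u << cpu))
            vector_irq[cpu][old_vector] = IRQ_NONE;

    return old_mask & ~online;
}

/* Count CPUs whose slot for vector still points at irq. */
static int toy_count_bound(int irq, int vector)
{
    int n = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (vector_irq[cpu][vector] == irq)
            n++;
    return n;
}
```

In this model, binding an IRQ on CPUs 0-3 and then clearing while only CPUs 0-1 are online leaves CPUs 2-3 with stale entries, which is the kind of leftover state that a later consistency check could trip over. Whether suspend/resume actually produces this state in Xen is exactly the open question in this thread.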
