
Re: (Debian) Bug#1051862: server flooded with xen_mc_flush warnings with xen 4.17 + linux 6.1

To: Hans van Kranenburg <hans@xxxxxxxxxxx>, 1051862@xxxxxxxxxxxxxxx, Juergen Gross <jgross@xxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
From: Radoslav Bodó <radoslav.bodo@xxxxxxxxxxx>
Date: Thu, 14 Sep 2023 09:46:00 +0200
Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
Delivery-date: Thu, 14 Sep 2023 07:49:22 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi all,

hopefully it's ok to reply-all at this point


On 9/13/23 23:38, Hans van Kranenburg wrote:

> I have a few quick additional questions already:
>
> 1. For clarification.. From your text, I understand that only this one
> single server is showing the problem after the Debian version upgrade.
> Does this mean that this is the only server you have running with
> exactly this combination of hardware (and BIOS version, CPU microcode
> etc etc)? Or, is there another one with same hardware which does not
> show the problem?

This is the only server with this HW combination: server type Dell R750xs and CPU type 'Intel Xeon Silver 4310'.

> 2. Can you reply with the output of 'xl dmesg' when the problem happens?
> Or, if the system gets unusable too quick, do you have a serial console
> connection to capture the output?


the 'xl dmesg' output is in the attachment

> 3. To confirm... I understand that there are many of these messages.
> Since you pasted only one, does that mean that all of them look exactly
> the same, with "1 of 1 multicall(s) failed: cpu 10" "call  1: op=1
> arg=[ffff8888a1a9eb10] result=-22"? Or are there variations? If so, can
> you reply with a few different ones?


all look exactly the same: only 1 of 1 multicalls failed, with the same result
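
For anyone who wants to verify the "all the same" claim on their own logs, here is a minimal sketch that groups the per-call detail lines by (op, result) and decodes the result. The regular expression is based only on the sample line quoted above, the function names are made up for illustration, and the op-to-name table is an assumption limited to the one hypercall seen here (op=1, which I take to be mmu_update). It also assumes the negative result can be read as a standard errno value, which holds for 22 (EINVAL).

```python
import errno
import re
from collections import Counter

# Matches the detail line as quoted in this thread, e.g.
#   "call  1: op=1 arg=[ffff8888a1a9eb10] result=-22"
# Other kernel versions may format this line differently.
CALL_RE = re.compile(r"call\s+(\d+): op=(\d+) arg=\[([0-9a-f]+)\] result=(-?\d+)")

# Assumption: only the hypercall number seen in this report is named.
HYPERCALL_NAMES = {1: "mmu_update"}


def summarize_multicall_failures(log_lines):
    """Count distinct (op, result) pairs, ignoring the arg address."""
    counts = Counter()
    for line in log_lines:
        match = CALL_RE.search(line)
        if match:
            counts[(int(match.group(2)), int(match.group(4)))] += 1
    return counts


def describe(op, result):
    """Render one (op, result) pair with a symbolic error name if known."""
    op_name = HYPERCALL_NAMES.get(op, f"op {op}")
    if result < 0:
        err_name = errno.errorcode.get(-result, "unknown")
    else:
        err_name = "ok"
    return f"{op_name} -> {result} ({err_name})"
```

Feeding it the lines from the kernel log (for example the output of journalctl -k) and getting a single key back would confirm that there are no variations apart from the arg address.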



On 9/14/23 07:43, Juergen Gross wrote:
>>> kernel: [   99.768181] Call Trace:
>>> kernel: [   99.768436]  <TASK>
>>> kernel: [   99.768691]  ? __warn+0x7d/0xc0
>>> kernel: [   99.768947]  ? xen_mc_flush+0x196/0x220
>>> kernel: [   99.769204]  ? report_bug+0xe6/0x170
>>> kernel: [   99.769460]  ? handle_bug+0x41/0x70
>>> kernel: [   99.769713]  ? exc_invalid_op+0x13/0x60
>>> kernel: [   99.769967]  ? asm_exc_invalid_op+0x16/0x20
>>> kernel: [   99.770223]  ? xen_mc_flush+0x196/0x220
>>> kernel: [   99.770478]  xen_mc_issue+0x6d/0x70
>>> kernel: [   99.770726]  xen_set_pmd_hyper+0x54/0x90
>>> kernel: [   99.770965]  do_set_pmd+0x188/0x2a0
>
> This looks like an attempt to map a hugepage, which isn't supported
> when running as a Xen PV guest (this includes dom0).
>
> Are transparent hugepages enabled somehow? In a Xen PV guest there
> should be no /sys/kernel/mm/transparent_hugepage directory. Depending
> on the presence of that directory either hugepage_init() has a bug, or
> a test for hugepages being supported is missing in filemap_map_pages()
> or do_set_pmd().
>
>>> kernel: [   99.771200]  filemap_map_pages+0x1a9/0x6e0
>>> kernel: [   99.771434]  xfs_filemap_map_pages+0x41/0x60 [xfs]
>>> kernel: [   99.771714]  do_fault+0x1a4/0x410
>>> kernel: [   99.771947]  __handle_mm_fault+0x660/0xfa0

in both the faulty state (linux 6.1) and the good state (linux 5.10), the directory /sys/kernel/mm/transparent_hugepage is not present

we have also tried to boot with 'transparent_hugepage=never', but it made no difference
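
In case it helps others reproduce the check, the two observations above can be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, and it assumes the usual sysfs layout where the "enabled" file marks the active mode in square brackets.

```python
import os

THP_DIR = "/sys/kernel/mm/transparent_hugepage"


def thp_status(thp_dir=THP_DIR):
    """Return None if the THP sysfs directory is absent, else the active mode."""
    if not os.path.isdir(thp_dir):
        return None
    with open(os.path.join(thp_dir, "enabled")) as f:
        content = f.read()
    # The active mode is shown in brackets, e.g. "always madvise [never]".
    start, end = content.find("["), content.find("]")
    if start == -1 or end == -1:
        return content.strip()
    return content[start + 1:end]


def cmdline_has_thp_never(cmdline_path="/proc/cmdline"):
    """Check whether the kernel was booted with transparent_hugepage=never."""
    with open(cmdline_path) as f:
        return "transparent_hugepage=never" in f.read().split()
```

On the affected dom0, thp_status() would be expected to return None under both kernels, since the directory is absent in both cases.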



best regards
bodik

Attachment: xen-mcflush-xldmesg.txt
Description: Text document

