
Re: (Debian) Bug#1051862: server flooded with xen_mc_flush warnings with xen 4.17 + linux 6.1


  • To: Hans van Kranenburg <hans@xxxxxxxxxxx>, 1051862@xxxxxxxxxxxxxxx, Juergen Gross <jgross@xxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
  • From: Radoslav Bodó <radoslav.bodo@xxxxxxxxxxx>
  • Date: Thu, 14 Sep 2023 09:46:00 +0200
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 14 Sep 2023 07:49:22 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi all,

hopefully it's ok to reply-all at this point


On 9/13/23 23:38, Hans van Kranenburg wrote:
> I have a few quick additional questions already:
>
> 1. For clarification.. From your text, I understand that only this one
> single server is showing the problem after the Debian version upgrade.
> Does this mean that this is the only server you have running with
> exactly this combination of hardware (and BIOS version, CPU microcode
> etc etc)? Or, is there another one with same hardware which does not
> show the problem?

This is our only server with this hardware combination: server type Dell R750xs and CPU type 'Intel Xeon Silver 4310'.


> 2. Can you reply with the output of 'xl dmesg' when the problem happens?
> Or, if the system gets unusable too quick, do you have a serial console
> connection to capture the output?

The 'xl dmesg' output is in the attachment.


> 3. To confirm... I understand that there are many of these messages.
> Since you pasted only one, does that mean that all of them look exactly
> the same, with "1 of 1 multicall(s) failed: cpu 10" "call  1: op=1
> arg=[ffff8888a1a9eb10] result=-22"? Or are there variations? If so, can
> you reply with a few different ones?

They all look exactly the same: each time only 1 of 1 multicalls failed, with the same result.
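
For anyone who wants to check their own logs for variations, here is a minimal sketch of a tally over the two message lines quoted above. The function name and the regular expressions are my own assumptions based on those two lines only; real kernel log formatting may differ slightly, so treat this as a starting point rather than a verified parser.

```python
import re
from collections import Counter

# Patterns derived from the two message lines quoted in this thread.
# Whitespace is matched loosely because log spacing may vary.
HEADER_RE = re.compile(r"(\d+) of (\d+) multicall\(s\) failed: cpu (\d+)")
CALL_RE = re.compile(r"call\s+(\d+): op=(\d+) arg=\[([0-9a-f]+)\] result=(-?\d+)")


def summarize_multicall_failures(lines):
    """Count distinct (failed, total) headers and (op, result) pairs.

    The cpu number and the arg pointer are ignored on purpose, since they
    are expected to differ between otherwise identical warnings.
    """
    headers = Counter()
    calls = Counter()
    for line in lines:
        m = HEADER_RE.search(line)
        if m:
            headers[(int(m.group(1)), int(m.group(2)))] += 1
        m = CALL_RE.search(line)
        if m:
            calls[(int(m.group(2)), int(m.group(4)))] += 1
    return headers, calls
```

Feeding it the lines of a saved kernel log (for example, an open file object) returns two counters. If each counter has a single key, all warnings share the same shape.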



On 9/14/23 07:43, Juergen Gross wrote:
>>> kernel: [   99.768181] Call Trace:
>>> kernel: [   99.768436]  <TASK>
>>> kernel: [   99.768691]  ? __warn+0x7d/0xc0
>>> kernel: [   99.768947]  ? xen_mc_flush+0x196/0x220
>>> kernel: [   99.769204]  ? report_bug+0xe6/0x170
>>> kernel: [   99.769460]  ? handle_bug+0x41/0x70
>>> kernel: [   99.769713]  ? exc_invalid_op+0x13/0x60
>>> kernel: [   99.769967]  ? asm_exc_invalid_op+0x16/0x20
>>> kernel: [   99.770223]  ? xen_mc_flush+0x196/0x220
>>> kernel: [   99.770478]  xen_mc_issue+0x6d/0x70
>>> kernel: [   99.770726]  xen_set_pmd_hyper+0x54/0x90
>>> kernel: [   99.770965]  do_set_pmd+0x188/0x2a0
>
> This looks like an attempt to map a hugepage, which isn't supported
> when running as a Xen PV guest (this includes dom0).
>
> Are transparent hugepages enabled somehow? In a Xen PV guest there
> should be no /sys/kernel/mm/transparent_hugepage directory. Depending
> on the presence of that directory either hugepage_init() has a bug, or
> a test for hugepages being supported is missing in filemap_map_pages()
> or do_set_pmd().
>
>>> kernel: [   99.771200]  filemap_map_pages+0x1a9/0x6e0
>>> kernel: [   99.771434]  xfs_filemap_map_pages+0x41/0x60 [xfs]
>>> kernel: [   99.771714]  do_fault+0x1a4/0x410
>>> kernel: [   99.771947]  __handle_mm_fault+0x660/0xfa0

In both the faulty state (Linux 6.1) and the good state (Linux 5.10), the directory /sys/kernel/mm/transparent_hugepage is not present.

We also tried booting with 'transparent_hugepage=never', but it made no difference.
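
In case it helps others reproduce the check, below is a small sketch that reports whether the sysfs directory exists and whether a 'transparent_hugepage=' option is on the running kernel's command line. The helper name thp_status and the returned dict layout are hypothetical. It only reads /sys/kernel/mm/transparent_hugepage and /proc/cmdline and does not prove anything about what the kernel does internally.

```python
from pathlib import Path

THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")
CMDLINE = Path("/proc/cmdline")


def thp_status(thp_dir=THP_DIR, cmdline=CMDLINE):
    """Return a dict describing what the running kernel exposes about THP."""
    status = {
        "sysfs_dir_present": thp_dir.is_dir(),
        "enabled": None,          # content of the 'enabled' file, if any
        "cmdline_option": None,   # value of transparent_hugepage=..., if any
    }
    enabled_file = thp_dir / "enabled"
    if enabled_file.is_file():
        # Typical content looks like "always madvise [never]";
        # the bracketed word is the active mode.
        status["enabled"] = enabled_file.read_text().strip()
    if cmdline.is_file():
        for token in cmdline.read_text().split():
            if token.startswith("transparent_hugepage="):
                status["cmdline_option"] = token.split("=", 1)[1]
    return status
```

On the affected host, the expectation from the description above would be sysfs_dir_present False on both kernels, with cmdline_option set to 'never' only on the boot where that option was passed.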


best regards
bodik

Attachment: xen-mcflush-xldmesg.txt
Description: Text document


 

