Re: User domain starts with a crash loop when memory configured is above 500GB


  • To: Robert Polasek <polasekr@xxxxxxxxx>, xen-users@xxxxxxxxxxxxxxxxxxxx
  • From: Juergen Gross <jgross@xxxxxxxx>
  • Date: Tue, 19 Sep 2023 16:35:24 +0200
  • Delivery-date: Tue, 19 Sep 2023 14:36:13 +0000
  • List-id: Xen user discussion <xen-users.lists.xenproject.org>

On 18.09.23 17:34, Robert Polasek wrote:
Hi everybody,

I have a server with 760GB of RAM. Only domain 0 is running on it, with 16GB of RAM assigned.

Here is a configuration for my user domain:

name = "node01"
kernel = "/boot/vmlinuz-5.15.0-82-generic"
root = "/dev/xvda"
memory = 614400
maxmem = 614400
vcpus = 32
maxvcpus = 32
disk = ['file:/vserver/images/node01.img,xvda,w']
vif = ['bridge=virbr0,mac=00:16:3e:01:01:02']
iommu = "soft"
swiotlb = "force"
pci_permissive = 1
pci = ['0000:3e:00.0','0000:3f:00.0','0000:40:00.0','0000:41:00.0','0000:b1:00.0','0000:b2:00.0']

nics = 1
dhcp = "off"
ip = "192.168.122.15"
netmask = "255.255.255.0"
gateway = "192.168.122.1"
hostname = "node01"

extra="3"

When I try to start the domain, it goes into a crash loop with the following error messages:

[ 6864.140170] WARNING: CPU: 2 PID: 266 at arch/x86/xen/multicalls.c:102 xen_mc_flush+0x197/0x200
[ 6864.140183] Modules linked in:
[ 6864.140190] CPU: 2 PID: 266 Comm: xen-balloon Tainted: G      D W  5.15.0-82-generic #91-Ubuntu
[ 6864.140203] RIP: e030:xen_mc_flush+0x197/0x200
[ 6864.140212] Code: 77 65 89 c0 48 c1 e0 05 48 05 00 20 00 81 ff d0 0f 1f 00 49 89 45 18 48 85 c0 0f 89 17 ff ff ff 45 8b 4d 00 41 bf 01 00 00 00 <0f> 0b 48 c7 c7 f0 8e 5b 82 44 89 ca 44 89 fe 45 31 f6 65 8b 0d e8
[ 6864.140234] RSP: e02b:ffffc90041027b88 EFLAGS: 00010002
[ 6864.140243] RAX: 0000000000000001 RBX: 0000000000000040 RCX: 0000000000000000
[ 6864.140253] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff89009809e310
[ 6864.140264] RBP: ffffc90041027bb8 R08: ffff888168dc0000 R09: 0000000000000002
[ 6864.140275] R10: 0000000000000200 R11: ffff8900980b7690 R12: 0000000000000000
[ 6864.140286] R13: ffff89009809e300 R14: 0000000000000002 R15: 0000000000000001
[ 6864.140303] FS:  0000000000000000(0000) GS:ffff890098080000(0000) knlGS:0000000000000000
[ 6864.140315] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6864.140324] CR2: 0000000000000000 CR3: 0000000002e10000 CR4: 0000000000050660
[ 6864.140339] Call Trace:
[ 6864.140344]  <TASK>
[ 6864.140349]  ? __raw_callee_save_xen_make_pte+0x15/0x27
[ 6864.140359]  xen_mc_issue+0x61/0x80
[ 6864.140367]  xen_alloc_pte+0xd8/0x290
[ 6864.140376]  pmd_populate_kernel.constprop.0+0x4b/0xa0
[ 6864.140387]  vmemmap_pmd_populate+0x69/0x79
[ 6864.140395]  vmemmap_populate_basepages+0x68/0xb3
[ 6864.140405]  vmemmap_populate+0x2a/0xa9
[ 6864.140412]  __populate_section_memmap+0x3c/0x57
[ 6864.140422]  sparse_add_section+0x12b/0x1dc
[ 6864.140431]  __add_pages+0xac/0x150
[ 6864.140440]  add_pages+0x17/0x70
[ 6864.140447]  arch_add_memory+0x45/0x60
[ 6864.140455]  add_memory_resource+0x12c/0x320
[ 6864.140467]  reserve_additional_memory+0x10f/0x160
[ 6864.140476]  balloon_thread+0x337/0x500
[ 6864.140483]  ? wait_woken+0x70/0x70
[ 6864.140492]  ? reserve_additional_memory+0x160/0x160
[ 6864.140501]  kthread+0x127/0x150
[ 6864.140509]  ? set_kthread_struct+0x50/0x50
[ 6864.140518]  ret_from_fork+0x1f/0x30
[ 6864.140528]  </TASK>
[ 6864.140533] ---[ end trace 3bca9737718a46b2 ]---
[ 6864.140541] 1 of 2 multicall(s) failed: cpu 2
[ 6864.140549]   call  2: op=26 arg=[ffff89009809eb10] result=-22

Any suggestions as to what I am doing wrong? There should be plenty of RAM to start a 600GB domain. I can start a user domain with 500GB without problems. Thank you in advance for your help and suggestions.

I think your kernel has been configured with CONFIG_XEN_512GB.

You should try to add "xen_512gb_limit=0" to your guest's command line.
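
As a rough sketch of that change, assuming the guest is still booted directly via the "kernel =" line in the configuration above and that the existing "3" in "extra" should stay, the parameter would be appended to the "extra" string, which xl passes to the guest kernel command line. For reference, xl takes "memory" in MiB, so 614400 is 600 GiB, which is above the 512 GiB limit, while a 500GB domain stays below it:

# Only this line changes; everything else in the node01 config stays as is.
# "3" is kept from the original config, xen_512gb_limit=0 lifts the limit.
extra = "3 xen_512gb_limit=0"

After restarting the domain, /proc/cmdline inside the guest should show whether the parameter was actually passed.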

Even if this fixes your boot issue, the guest shouldn't show the error
you are seeing.


Juergen



 

