
Re: [Xen-users] Xen 4.12 DomU hang / freeze / stall under high network/disk load


  • To: Glen <glenbarney@xxxxxxxxx>
  • From: Sarah Newman <srn@xxxxxxxxx>
  • Date: Thu, 13 Feb 2020 19:06:05 -0800
  • Cc: Xen-users <xen-users@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 14 Feb 2020 03:06:58 +0000
  • List-id: Xen user discussion <xen-users.lists.xenproject.org>

On 2/13/20 6:26 PM, Glen wrote:

I tried both xl network-detach followed by a network-attach (feeding
back in the parameters from my guest machine.)

OK. Were you able to check whether the network device actually went away inside
the domU? It should have, but you won't necessarily see anything in dmesg.
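One way to check this without relying on dmesg is to look for the interface in sysfs from inside the domU. This is only a minimal sketch: the interface name eth0 is an assumption, and NET_ROOT is a hypothetical override that exists purely so the helper can be exercised outside a real guest.

```shell
# Return success if a network interface with the given name is visible
# to the kernel. NET_ROOT is a hypothetical override for testing only;
# by default the standard Linux sysfs path is used.
has_netdev() {
    [ -e "${NET_ROOT:-/sys/class/net}/$1" ]
}

# Intended use inside the domU, after "xl network-detach" on the dom0
# (eth0 is an assumed interface name):
#   has_netdev eth0 && echo "eth0 still present" || echo "eth0 gone"
```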

(XEN) Using scheduler: SMP Credit Scheduler rev2 (credit2)
(XEN) Initializing Credit2 scheduler

The default scheduler changed from credit to credit2 between Xen 4.11 and 4.12:

https://xenproject.org/2019/04/02/whats-new-in-xen-4-12/

You could try the old scheduler:

https://xenbits.xen.org/docs/unstable/features/sched_credit.html

I am skeptical that the scheduler is the problem, but it is worth ruling out.
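A rough sketch of how to confirm which scheduler is active before and after the change. The helper name is made up for illustration. Selecting the scheduler with sched=credit on the Xen command line is a documented hypervisor boot option, but where that command line is set depends on the distribution, so the GRUB variable mentioned in the comments is an assumption.

```shell
# Extract the active scheduler name from "xl dmesg" style output on stdin.
# Matches the boot line "(XEN) Using scheduler: <description> (<name>)".
scheduler_from_dmesg() {
    sed -n 's/^(XEN) Using scheduler: .*(\([a-z0-9]*\))$/\1/p'
}

# Intended use on the dom0 (not run here):
#   xl dmesg | scheduler_from_dmesg    # prints "credit2" for the log quoted above
#
# To boot with the old scheduler, add "sched=credit" to the Xen command line,
# for example via GRUB_CMDLINE_XEN_DEFAULT in /etc/default/grub (assumed
# layout), regenerate the GRUB configuration, and reboot the host.
```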

maxmem=90112
vcpus=26

This is fairly large. Have you tried both fewer CPUs and less memory? If you
can reproduce with iperf, which will probably reproduce more quickly, can you
also reproduce with memory=2048 and vcpus=1 or vcpus=2, for example? FYI, with
some kernel versions the domU might not boot at all with vcpus=1.
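As a sketch of the smaller configuration, the helper below copies an existing xl config and swaps the sizing lines for memory=2048 and vcpus=2. The function name and file paths are hypothetical. It assumes each setting sits on its own line, and for a separate test VM the name, disk, and vif lines would also need to be changed by hand.

```shell
# Write a reduced copy of an xl domain config: drop the existing sizing
# lines, then append the small values suggested above.
# Assumes one "key=value" setting per line in the source config.
make_test_cfg() {
    src=$1   # existing domU config (hypothetical path)
    dst=$2   # reduced config to create
    grep -Ev '^[[:space:]]*(memory|maxmem|vcpus|maxvcpus)[[:space:]]*=' "$src" > "$dst"
    printf 'memory=2048\nvcpus=2\n' >> "$dst"
}

# Intended use on the dom0 (paths are placeholders):
#   make_test_cfg /etc/xen/guest.cfg /etc/xen/guest-small.cfg
```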

I... have not.... and please pardon my ignorance here, but my guest
machine runs a lot of different things for our client, and definitely
needs the RAM (and I *think* needs the CPUs, although I confess that
I'm not sure how vcpus translate to available compute power.)  I can
try the smaller numbers, but have not because to me it's off-point,
since my guest requires the larger number of resources we've
traditionally allocated.

Then set up another VM for testing?

Anything about your setup that's out of the ordinary is a reasonable place to start looking for problems. It may not solve your immediate issue, but if it means a developer can reproduce the problem, that gives you a chance of the bug actually getting fixed.

So, going back to your "ideally by changing one thing at a time"
comment, here's kind of how I'm proceeding:

I'd recommend you start by finding the fastest way to reproduce the problem with the setup as-is, before changing anything. If it takes 4 days to trigger, you can't be certain whether any given change made a difference.

BTW, if it's the domU network load, you would probably reproduce fastest by
running the test between two domUs on the same dom0, if you can.
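A minimal sketch of such a test, assuming the second domU is already running "iperf -s" and that a classic iperf client is installed in the first. The function name, the IPERF_BIN override, and the choice of 4 parallel streams are illustrative assumptions, not values from this thread. A timestamp is logged before each run so that a stall can be dated afterwards.

```shell
# Client binary to use; assumed to be classic iperf on PATH.
# The -c, -t and -P options are also accepted by iperf3.
IPERF_BIN=${IPERF_BIN:-iperf}

# Run back-to-back client runs against a peer domU, logging a UTC
# timestamp before each one. Stops at the first failed run.
run_load() {
    peer=$1     # address of the domU running the iperf server
    rounds=$2   # number of consecutive runs
    secs=$3     # duration of each run in seconds
    i=1
    while [ "$i" -le "$rounds" ]; do
        echo "round $i start $(date -u +%Y-%m-%dT%H:%M:%SZ)"
        # -P 4 (four parallel streams) is an arbitrary illustrative value.
        "$IPERF_BIN" -c "$peer" -t "$secs" -P 4 || { echo "round $i failed"; return 1; }
        i=$((i + 1))
    done
}

# Intended use in the first domU (peer address is a placeholder):
#   run_load <peer-address> 100 600
```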

--Sarah

