Re: [Xen-users] Xen 4.12 DomU hang / freeze / stall under high network/disk load
On Thu, Feb 13, 2020 at 7:06 PM Sarah Newman <srn@xxxxxxxxx> wrote:
> > I tried both xl network-detach followed by a network-attach (feeding
> > back in the parameters from my guest machine.)
>
> OK. Were you able to check if the network device went away in the domU? It
> should have, but you won't see anything in dmesg necessarily.

Alas no. The guest was unresponsive when I did this, and console
functionality was very limited (as in, "sync" might have worked, but
nothing else did.) I can only report that the commands didn't seem to
help, or fix the problem, or have any visible impact on the guest.

> You could try the old scheduler:
> https://xenbits.xen.org/docs/unstable/features/sched_credit.html
> I am skeptical this is the problem, but you could try the old one.

Okay, noted, and added to my list.

> Anything about your setup that's out of the ordinary is a reasonable place to
> start looking for problems. It may not solve your immediate issue but if
> it means a developer can reproduce, that gives you a chance of the bug
> actually getting fixed.

Absolutely, and that's what I want. I obviously want to solve my
immediate problem and just get my setup to be stable, even if that
means running on older Xen... but I take Xen seriously, and even if I
get my situation stabilized, I will still work on this as long as
anyone here wants to listen to me. :-)

> I'd recommend you start by attempting to reproduce the problem as fast as
> possible, with the setup as-is, before changing anything. 4 days is too long
> to have any certainty.

Right. In this case, a failure becomes good (for debugging).

I've got 20 simultaneous tar processes running against my guest right
now (which is way more than I've ever needed or attempted - because
I'm grumpy and trying to do exactly what you say - make the guest
crash as fast as possible so I can eliminate possibilities), with that
tsc_mode="always_emulate" setting. It's survived that for 10 hours so
far, which is far more than I expected. I can't imagine that *that*
might solve this, but... I'll continue to watch it as long as I can,
and report either way in the morning to see how it does after 24
hours. I can leave the host alone and test against that setting more,
to see if I can crash the guest without it faster (again) at the
higher load.

I feel like downgrading to Xen 4.10 will probably fix *my* problem,
but mask *the* problem, and I really want both fixed. :-)

> BTW, if it's the domU network load - you would probably reproduce fastest by
> running testing between 2 domUs on the same dom0, if you can.

Not under my current setup, no. This is a huge guest and it (almost)
maxes the host. I've got a pair of servers committed to this already,
just for testing. But even if I can solve my immediate issue, I'll
still have another pair (the current production group) of servers I
can mess with, and I'll have more flexibility to change things then.

THANK YOU THANK YOU!

Glen

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-users