Re: [Xen-users] Xen 4.12 DomU hang / freeze / stall under high network/disk load
Hi Sarah -

On Fri, Feb 14, 2020 at 6:22 PM Sarah Newman <srn@xxxxxxxxx> wrote:
> I would personally guess it just means that something didn't get to run for a
> long time. It might be worth using xl list / xl vcpu-list <domain> when
> it's hung to see if it's running or blocked and how many cpu times are going
> up or not.

Okay, good. I'm adding that to my list of other things to pull the next time a guest freezes. Thank you for this!

> Well, that gets you security support through December 2020.

With zero sarcasm intended, all I really want is for it to get me the ability to sleep through the night. :-) I absolutely plan to keep pushing on this as long as I can, even if I get stability on 4.10, so I hope that by December 2020 we/they will have figured this out and fixed it. I'll take what I can get! :-)

> I've gotten very useful data from debug builds of both Linux and Xen. It will
> massively slow down your system and you don't want to run them in
> production.
> --Sarah

That also might be beyond my ability, but I will try.

So here's where I am right now. Loosely speaking (and using Xen version numbers since I'm on the Xen list), what I have is a 4.9 production guest, a 4.9 hot backup guest, and a 4.12 test guest. All are running on separate Dell-based 4.12 hosts.

I don't have the luxury of stress-testing the production guest, but I don't need to: it stalls every 3-5 days. When it stalls, it's either during the day, in which case my "test time" is limited, or in the middle of the night, in which case my cognitive ability is limited. :-) The next time it stalls, I'm going to do the sysrq things, xl list / xl vcpu-list, and other similar checks, and try to capture it all. I'll then post it here.

As a kind of informal fallback, today I downgraded the hot backup guest's host to 4.9 again. The downgrade went fine; that guest is now running on 4.9 all the way, so it "should" never stall again.
As a hot backup, its traffic is much lower, so it has only stalled once; this was really just about seeing whether I could successfully downgrade production if I needed to.

Today I also downgraded the test host to 4.10 (my only convenient option below 4.12). I then launched the guest and started the stress testing. This next observation is interesting but otherwise useless: under the same amount of stress testing, the guest's CPU load average is about half what it was (hovering around 2-3, whereas before it hovered around 6). This is for entertainment value only.

If Tomas' experience applies to me, this should mean that my test guest will not stall anymore. I am going to let it run for 7 days under this load and report back then, or sooner if it stalls. By then I'll also know how to proceed. If the guest survives, I'm going to roll my client forward to this configuration, because they need to be on the new OS for a number of other reasons. So if this proves to be "stable enough", we'll go forward. If this guest does NOT survive, I'm going to downgrade the current production host back to 4.9, putting us back where we were before the trouble started.

Either way, I'll then be left with a pair of broken machines (whichever pair my client "leaves behind"), and I can start testing everything you've asked me to much more aggressively, because my client will be happy and I will be able to (literally) sleep at night.

In the meantime, if anything else happens, I will report it, and if you or anyone else has other thoughts, please tell me. I definitely want to get this resolved and fixed for everyone, to the extent I can help; I just have to deal with the paying client first before I can turn back to this completely.

(Test guest load average just touched 1.94 before coming back to 2.36. No idea what this means, but I have hope!)

Thank you thank you!
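Sarah's suggestion (run `xl vcpu-list <domain>` while the guest is hung and see whether the vCPUs are running or blocked, and whether their CPU times are going up) can be scripted so it is quick to grab during a stall. Below is a minimal Python sketch, assuming the usual `xl vcpu-list` column layout (Name, ID, VCPU, CPU, State, Time(s), Affinity) and domain names without spaces. The helper names `parse_vcpu_list`, `compare_snapshots` and `snapshot` are hypothetical and not part of any Xen tooling. It takes two snapshots a few seconds apart and prints, per vCPU, the state flags (r = running, b = blocked, p = paused) and how much CPU time accumulated in between.

```python
import subprocess
import sys
import time


def parse_vcpu_list(text):
    """Parse `xl vcpu-list` output into {(name, vcpu): (state, seconds)}.

    Assumption: columns are Name ID VCPU CPU State Time(s) Affinity...,
    and domain names contain no spaces.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row and anything too short to be a vCPU row.
        if len(fields) < 6 or fields[0] == "Name":
            continue
        name, vcpu, state, seconds = fields[0], fields[2], fields[4], fields[5]
        try:
            result[(name, int(vcpu))] = (state, float(seconds))
        except ValueError:
            continue  # not a parseable data row
    return result


def compare_snapshots(before, after):
    """Return [(name, vcpu, state, delta_seconds)] for vCPUs in both snapshots."""
    rows = []
    for key in sorted(before.keys() & after.keys()):
        state, t1 = after[key]
        _, t0 = before[key]
        rows.append((key[0], key[1], state, round(t1 - t0, 1)))
    return rows


def snapshot(domain):
    """Run `xl vcpu-list <domain>`; needs xl in PATH and root on the Xen host."""
    out = subprocess.run(["xl", "vcpu-list", domain],
                         capture_output=True, text=True, check=True).stdout
    return parse_vcpu_list(out)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: vcpu_check.py <domain> [interval_seconds]")
    domain = sys.argv[1]
    interval = float(sys.argv[2]) if len(sys.argv) > 2 else 10.0
    first = snapshot(domain)
    time.sleep(interval)
    second = snapshot(domain)
    for name, vcpu, state, delta in compare_snapshots(first, second):
        flag = "advancing" if delta > 0 else "NOT advancing"
        print(f"{name} vcpu{vcpu} state={state} +{delta}s ({flag})")
```

Run it as root on the host, e.g. `python3 vcpu_check.py <domain> 10`, and paste the output alongside the sysrq dumps. A vCPU that stays blocked with no time advancing looks different from one stuck running and burning CPU, which is the distinction Sarah was asking about.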
Glen

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-users