Re: [Xen-users] Xen 4.12 DomU hang / freeze / stall under high network/disk load
On 2/14/20 9:00 AM, Glen wrote:
> On Fri, Feb 14, 2020 at 12:07 AM Tomas Mozes <hydrapolic@xxxxxxxxx> wrote:
>> Hello Glen, thanks for your report.
>
> Thank you!
>
>> The symptoms seem similar:
>> - xen 4.12
>> - 2 cpu
>> - high load
>> - dom0 is ok, domU stalls
>
> Yes, same here.
>
>> I've just upgraded one of my machines to xen 4.12 (again) with sched=credit; I'll report back if it helps.
>
> Thanks. Right now I'm focused on tsc_mode, since it's what I was working on before Sarah's responses yesterday. My guest machine survived 24 hours under a very high load test using tsc_mode="always_emulate" in the guest machine config. I realize that 24 hours is hardly conclusive, but given Sarah's suggestion to try to eliminate things quickly,

But how long does it take to reproduce under the original conditions?

If you want confidence that something staying up for 24 hours is a fix, you want a test that reproduces the failure repeatedly, in much less than 24 hours, under the original conditions. If it takes 24 hours on average to reproduce, I would say you want to run for at least 48 hours, if not significantly longer, to have any confidence in a fix. You could use some probability theory to put better numbers on this.

> I've now switched the guest to what I think is the opposite, tsc_mode="native", and I'm trying it there for a little while to see if it crashes. (Neither of these is the default - the default seems to be a hybrid of the two - but I will return and test that once more if I can.)

If you haven't read the entirety of https://xenbits.xen.org/docs/unstable/man/xen-tscmode.7.html, you probably want to, then.

> In addition to my normal test methods, I have at Sarah's suggestion thrown in an iperf3 at maximum speed, continuous repeat from the host to the guest, just to see if it helps stall the machine faster. It's pushing data at 14GBps right now, so we'll see.

If the issue is the vif rate limit, note that it affects data outbound from the guest, not inbound to the guest.
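To put rough numbers on the "probability theory" point: a minimal Python sketch, assuming failures arrive independently at a constant average rate (an exponential/Poisson model, which is an assumption on our part, not something established in this thread). The 24-hour mean time to failure is a hypothetical figure. Under that model, a clean run of t hours happens purely by luck with probability exp(-t/T), even if nothing was actually fixed.

```python
import math


def p_survive(test_hours: float, mean_hours_to_failure: float) -> float:
    """Chance of a failure-free run of test_hours if the bug is still present.

    Assumes failures form a Poisson process with the given mean time to
    failure, so survival time is exponentially distributed.
    """
    return math.exp(-test_hours / mean_hours_to_failure)


def hours_needed(mean_hours_to_failure: float, confidence: float) -> float:
    """Run length such that a clean run would occur by luck with
    probability at most (1 - confidence) if the bug were still present."""
    return -mean_hours_to_failure * math.log(1.0 - confidence)


if __name__ == "__main__":
    mtbf = 24.0  # hypothetical: original setup fails on average once per 24 h
    for t in (24, 48, 72):
        print(f"{t:>3} h clean run: {p_survive(t, mtbf):.3f} chance even if unfixed")
    print(f"95% confidence needs about {hours_needed(mtbf, 0.95):.1f} h")
```

With a 24-hour mean, a clean 24-hour run would still happen about 37% of the time with the bug present, and a clean 48-hour run about 13.5% of the time, which is why 48 hours is a floor rather than a finish line. Reaching 95% needs roughly three mean times to failure (about 72 hours here). A shorter reproduction time shrinks every one of these numbers proportionally.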
It's easy enough to confirm which direction the traffic is actually flowing by checking the interface tx/rx counters. I was thinking that if it was network load, you might be able to reproduce within a few minutes using iperf, which would make subsequent testing much easier.

--Sarah

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-users
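For checking traffic direction from dom0, here is a minimal Python sketch, assuming a Linux dom0 where per-interface byte counters are exposed under /sys/class/net/<iface>/statistics/. The interface name you pass in (for example a backend vif such as vif1.0) is a placeholder. As we understand it, counters on a dom0 backend vif are counted from dom0's side, so traffic going into the guest should show up as tx there; the host-to-guest iperf3 run is a convenient way to confirm which counter climbs before drawing conclusions.

```python
import time
from pathlib import Path
from typing import Tuple


def read_counters(iface: str, root: str = "/sys/class/net") -> Tuple[int, int]:
    """Return (rx_bytes, tx_bytes) for iface, read from Linux sysfs."""
    stats = Path(root) / iface / "statistics"
    rx = int((stats / "rx_bytes").read_text())
    tx = int((stats / "tx_bytes").read_text())
    return rx, tx


def rates(before: Tuple[int, int], after: Tuple[int, int],
          seconds: float) -> Tuple[float, float]:
    """Bytes per second received and transmitted between two samples."""
    return ((after[0] - before[0]) / seconds,
            (after[1] - before[1]) / seconds)


def busier_side(rx_rate: float, tx_rate: float) -> str:
    """Which counter moved faster: 'rx', 'tx', or 'even'.

    On a dom0 backend vif (assumption, verify with a known-direction
    iperf3 run), 'tx' means data flowing into the guest and 'rx' means
    data flowing out of it.
    """
    if rx_rate == tx_rate:
        return "even"
    return "rx" if rx_rate > tx_rate else "tx"


def sample(iface: str, interval: float = 5.0) -> Tuple[float, float]:
    """Read the counters twice, interval seconds apart, and return byte rates."""
    before = read_counters(iface)
    time.sleep(interval)
    after = read_counters(iface)
    return rates(before, after, interval)
```

Usage would be something like `rx, tx = sample("vif1.0")` followed by `busier_side(rx, tx)`, with the interface name adjusted to whatever the guest's backend vif is actually called on your host.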