Re: Debian 10, xen 4.11 reliability
On 7/16/20 2:34 PM, Hans van Kranenburg wrote:

> You're not running Debian Xen packages apparently, so I can't say much
> about that part. But, is that Linux 4.9 in the dom0? Begin by
> eliminating that.

We've been running Linux 4.9 for a long time, though we plan to upgrade soon. The timing does not correlate, and far less than one percent of our users are having issues.

> Our mileage may vary, but at work, we skipped from Jessie to Buster
> (well, actually to our own stretch-backports) because I really could not
> get anything working with Linux 4.9 as dom0 kernel after the whole
> Spectre/Meltdown stuff unfolded. We never got to the bottom of it, due
> to a big lack of time and kernel debugging knowledge/experience, but
> what I have seen is random Oopses, disk corruption and other things.

There were panics in the dom0 which I traced to a network driver, and I fixed it. This is the first time we've had complaints of file system corruption.

> Are you using live migration?

Not so recently that it would have affected the two systems with problems.

> So, why not get those dom0s to latest Xen 4.11 packages from Debian and
> Linux 4.19? It's flying here, with several clusters of dozens of servers
> and a few dozen TiB of mems, running thousands of domUs, without any
> problem.

Are your dom0's running the latest kernel version? Are they running ext3? What uptime have they had? What about the domU's?

> I agree with Ben that using ext3 nowadays should be discouraged because
> of the amount of usage and testing decreasing.

Yes. I think Debian and Ubuntu are the only distributions where we might have users who are using an old file system with a new kernel, which is why I'm focused on ext3. But I can't say for certain.

> But, I might have the luxury of working with a setup where we manage all
> of it and have customers look at some GUI and have no idea about the
> actual underlying systems. Having customers run anything they want is a
> different slice of bread...

It very much is.

> Anyway, the above is just some thinking out loud. I know that it's very
> difficult to debug these kinds of things, because you need more failures
> happening to be able to correlate, and a reliable reproduction scenario
> would be the ultimate thing as a start to figure out what's actually
> going wrong, but these are really difficult time consuming tasks.

We're trying.

Thanks,
Sarah