
Re: [Xen-users] Xen 4.12 DomU hang / freeze / stall under high network/disk load

On Mon, Feb 17, 2020 at 10:49 AM Sarah Newman <srn@xxxxxxxxx> wrote:
> On 2/17/20 10:33 AM, Tomas Mozes wrote:
> > Just a quick note - no stall after switching to credit scheduler on xen  
> > 4.12 after 3 days.
> That's great news. By 4.12 do you mean release 4.12.1, 4.12.2, or something 
> else?
> I'm assuming when "PGNet Dev" reported 4.12 being bad and 4.13 being good, 
> they were using the default scheduler of credit 2.

I hope they respond with details!  :-)

> It's worth asking on xen-devel if there's a known bug in the credit 2 
> scheduler that's been fixed. It looks like there were some significant changes
> to the scheduling code in between Xen 4.12 and Xen 4.13, and if one was a fix 
> I'm not sure it would have been recognized as being so.

Sarah, Tomas -

Is that something one of you wants to do?  If not, I'm happy to take
that task, but don't want to step on toes.

In light of this report, I've added sched=credit to the bootloader on
my 4.12 production host, to take effect at the *next* boot.  The guest
on that host (my production machine, which I am not stress testing)
has now been up for 8 days; when not stress tested, it typically lasts
3-14 days before stalling.  Rather than rebooting into sched=credit
now, I'm still hoping it will stall again so I can run the commands
Sarah asked for, although I wonder if that's still worth it given what
we're finding.
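
For reference, a minimal sketch of confirming which scheduler the host
actually booted with, since the sched=credit change only takes effect
after a reboot. The GRUB variable in the comment is an assumption about
a typical GRUB2 setup, not necessarily how this host is configured, and
scheduler_from_info is a hypothetical helper name:

```shell
#!/bin/sh
# Assumption: on a GRUB2 host the Xen command line is often set in
# /etc/default/grub, for example:
#   GRUB_CMDLINE_XEN_DEFAULT="sched=credit"
# followed by regenerating grub.cfg and rebooting the physical host.

# Hypothetical helper: read `xl info` output on stdin and print the
# value of the xen_scheduler field (e.g. "credit" or "credit2").
scheduler_from_info() {
    awk -F' *: *' '$1 == "xen_scheduler" { print $2 }'
}

# Usage on the host (needs root):
#   xl info | scheduler_from_info
```

Running this before and after the reboot should show whether the host
really switched schedulers.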

Sarah -

If you do feel it's worth it, I'm happy to wait.  Here are the
commands I have lined up to run on the physical host (current guest
id=10) when the guest stalls next:

xl sysrq 10 l
xl sysrq 10 x
xl debug-keys q
xl dmesg
xl info

Is this right?  Are there any other debugging commands I can/should
run on the host or guest when it stalls next?  Anything that might be
useful I'm happy to grab, but since it might be 2AM I want to line
them all up in a file (as I have above) so I don't have to hunt while
trying to stay awake.  :-)
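
One way to line these up for 2AM is a small wrapper that runs the
commands listed above in the same order and saves everything to a
timestamped file. This is only a sketch: collect_stall_debug, the XL
override, and the log file name are hypothetical, and the command list
is exactly the one above, not a confirmed complete set. As far as I
know, the sysrq output ends up in the guest's own kernel log/console
and the debug-keys output in the Xen console ring, which is why
xl dmesg is run after them.

```shell
#!/bin/sh
# XL can be overridden for a dry run (e.g. XL=echo); defaults to real xl.
XL="${XL:-xl}"

# Hypothetical helper: run the debugging commands for one guest and
# write their combined output to a timestamped log file.
collect_stall_debug() {
    domid="$1"
    outdir="${2:-.}"
    if [ -z "$domid" ]; then
        echo "usage: collect_stall_debug DOMID [OUTDIR]" >&2
        return 1
    fi
    out="$outdir/xen-stall-$(date +%Y%m%d-%H%M%S).log"

    {
        for args in "sysrq $domid l" "sysrq $domid x" \
                    "debug-keys q" "dmesg" "info"; do
            echo "=== $XL $args ==="
            # $args is left unquoted on purpose so it splits into words.
            $XL $args 2>&1
        done
    } > "$out"

    # Print the log path so it is easy to find afterwards.
    echo "$out"
}

# Usage on the physical host (needs root), with the current guest id:
#   collect_stall_debug 10 /root
```

The guest id changes whenever the guest is restarted, so it is worth
checking it with xl list before running this.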

After it stalls next and I grab the debugging output suggested, I'll
reboot the physical host into sched=credit for the production guest.

I'm going to leave my test host/guest on 15.0/4.10 for now, since
it's my future production host, until I do more testing on that
configuration and/or until we get this nailed down.


Xen-users mailing list


