On Fri, 2021-01-15 at 15:14 +0000, Lengyel, Tamas wrote:
> > 2) "scheduler broken" bugs.  We've had 4 or 5 reports of Xen not
> > working, and very little investigation on what's going on.  Suspicion
> > is that there might be two bugs, one with smt=0 on recent AMD
> > hardware, and one more general "some workloads cause negative credit"
> > and might or might not be specific to credit2 (debugging feedback
> > differs - also might be 3 underlying issues).
> We've also run into intermittent Xen lockups requiring power-cycling
> servers. We switched back to credit1 and had no issues since.
Ah, that's interesting... Among the issues that I listed in my other
email, when trying to do a quick summary, "only" number 1 is about
Credit working when Credit2 does not. This one you're mentioning here
may be the second... or it may be the same! :-O

As said there, my theory so far is that there's a bug somewhere, not
necessarily in scheduling code, to which the two algorithms react
differently. Of course this is a theory, and I've not been able to
confirm it yet (otherwise I would also have fixed the problem. :-P).

But really, it would be interesting to double check whether at least the
symptoms are the same as those of the issue reported here.

> Hard to tell if it was related to the scheduler or the pile of other
> experimental stuff we are running with but right now we have stable
> systems across the board with credit1.
Well, sure, that's understandable. :-) Which is why it's tricky at
times to debug these issues. In fact, I cannot reproduce them myself,
and users, rightfully, move on once they have found a workaround.

Anyway, if at some point you decide to investigate, I'll be happy to

