
Re: [ANNOUNCE] Xen 4.15 - call for notification/status of significant bugs



Dario Faggioli writes ("Re: [ANNOUNCE] Xen 4.15 - call for notification/status 
of significant bugs"):
> On Thu, 2021-02-04 at 12:12 +0000, Ian Jackson wrote:
> > I reviewed a thread about this and it is not clear to me where we are
> > with this.
> Ok, let me try to summarize the current status.

Thanks.

> - BUG: sched=credit2 machine hang when using DRAKVUF
> 
>   https://lists.xen.org/archives/html/xen-devel/2020-05/msg01985.html
>   https://lists.xenproject.org/archives/html/xen-devel/2020-10/msg01561.html
>   https://bugzilla.opensuse.org/show_bug.cgi?id=1179246
> 
>   99% sure that it's a Credit2 scheduler issue.
>   I'm actively working on it.
>   "Seems a tricky one; I'm still in the analysis phase"
> 
>   Manifests only with certain combinations of hardware and workload.
>   I cannot reproduce it myself, but there are multiple reports of it (see
>   above). I'm investigating and trying to come up with at least some
>   debug patches that one of the reporters should be able and willing to
>   test.

I think this is a clear blocker for 4.15.  I will call it "F".

> - Null scheduler and vwfi=native problem
> 
>   https://lists.xenproject.org/archives/html/xen-devel/2021-01/msg01634.html
> 
>   An RCU issue, but it manifests due to scheduler behavior (especially
>   with the NULL scheduler, especially on ARM).
>   I'm actively working on it.
> 
>   Patches that should solve the issue on ARM have already been posted.
>   They will need to be slightly adjusted to cover x86 as well. Waiting a
>   couple more days for confirmation from the reporter that the
>   patches do help, at least on ARM.

I'm not sure whether this is a blocker but it looks like it is going
to be fixed so I will keep it on my list.  I will call it "G".


> - Xen crash after S3 suspend - Xen 4.13
> 
>   https://lists.xen.org/archives/html/xen-devel/2020-03/msg01251.html
>   https://lists.xen.org/archives/html/xen-devel/2021-01/msg02620.html
> 
>   S3 suspend issue, but root cause seems to be in the scheduler.
> 
>   Marek is, as usual, providing good info and feedback. It comes as 
>   third in my list (below the two above, basically), but I will look
>   into it.

This is not a blocker so I won't track it explicitly but I would
very much welcome a fix if it is simple or comes quickly.


> - Ryzen 4000 (Mobile) Softlocks/Micro-stutters
> 
>   https://lists.xenproject.org/archives/html/xen-devel/2020-10/msg00966.html
> 
>   It seems it could be scheduling, but the amount of info is limited.
> 
>   What we know is that with `dom0_max_vcpus=1 dom0_vcpus_pin`, all 
>   schedulers seem to work fine. Without those params, Credit2 is the 
>   "least bad", although not satisfactory. Other schedulers don't even 
>   boot.
>   The fact is, it is reported to occur on QubesOS, which has its own
>   downstream patches, plus there are no logs.
>   There's a feeling that this (together with others) hints at SMT off
>   having issues on AMD (Ryzen?), but again, it's not crystal clear to
>   me whether this is the issue (or an issue at all) and, if yes, in
>   what subsystem the problem lies.
>   I can try to have a look, mostly to understand whether or
>   not it is really the case that some AMDs have issues with SMT=off.
>   But that will probably be after I'm done with the other issues
>   I've mentioned above.

I'm not sure whether you are saying (a) our current code is not
usable on this hardware because of this issue, or on the other hand
(b) you think the issue is specific to downstream patches?

Do you think I should consider this a blocker for 4.15?


> - Recent upgrade of 4.13 -> 4.14 issue
> 
>   https://lists.xenproject.org/archives/html/xen-devel/2020-10/msg01800.html 
> 
>   In my judgment, it's not at all clear whether or not this is a
>   scheduler issue. And at least with the amount of info that we have
>   so far, I'd lean toward "no, it's not". I'm happy to help with it
>   anyway, of course, but it comes after the others.

Again, I think this is not a regression so not a blocker for 4.15.


> So, Ian, was this of any help?

Yes, very much so, thank you.

Ian.



 

