
Re: [ANNOUNCE] Xen 4.15 release update - still in feature freeze



[Adding George, since it's scheduling]

On Mon, 2021-03-15 at 12:18 +0000, Ian Jackson wrote:
> 
> OPEN ISSUES AND BLOCKERS
> ========================
> 
> [...]
> 
> SCHEDULER ISSUES NOT MAKING PROGRESS?
> -------------------------------------
> 
Yeah... let's try.

> BUG: sched=credit2 machine hang when using DRAKVUF
> 
> Information from
>   Dario Faggioli <dfaggioli@xxxxxxxx>
> References
>   https://lists.xen.org/archives/html/xen-devel/2020-05/msg01985.html
>   https://lists.xenproject.org/archives/html/xen-devel/2020-10/msg01561.html
>   https://bugzilla.opensuse.org/show_bug.cgi?id=1179246
> 
So, this is mostly about the third issue, the one described in the
openSUSE bug, which was however also reported here, by different
people.

As I've just written there (on the bug), I've been trying to reproduce
the problem on a variety of different machines. Since AMD seemed to be
the most impacted, I've lately focused on hardware from that vendor.

I have been, however, unable to re-create a situation where the
symptoms described in the reports occur. I specifically looked for
hardware that was the same, or similar enough, and I replayed the dom0
vcpu pinning configuration and the creation of domUs, both PV and HVM,
but the problem did not show up for me. The only difference between
what I've done so far and what is described, e.g., in the bug is that
I've not been able to check Windows guests yet. (I'll try that as soon
as I can, but if this really were a scheduling issue, which OS runs in
the guest should not matter much, I think.)

Code inspection for something that comes from and/or affects the
scheduler and is both:
- CPU-vendor specific, and
- guest-type specific

also led me pretty much nowhere.

I produced a debug patch (I attach two versions of it, one for staging
and one for v4.13.2) that should help me tell whether or not the
scheduler is being invoked every time it should be, and whether or not
there are vcpus that manage to run for longer than the scheduler wants
them to.
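Purely as an illustration of the two checks described above, and NOT the
contents of the attached patches, the logic could be sketched in C as
below. The record layout, the function names, and the 1 ms slack are all
hypothetical; they are not Xen's internal structures or APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one vcpu run (NOT a Xen data structure). */
struct run_record {
    unsigned int vcpu_id;
    int64_t start_ns;    /* when the vcpu was scheduled in */
    int64_t end_ns;      /* when it was descheduled */
    int64_t granted_ns;  /* timeslice the scheduler asked for */
};

/* Tolerance before a run counts as an overrun (assumed value). */
#define OVERRUN_SLACK_NS 1000000  /* 1 ms */

/* Check 2: did this vcpu run longer than the scheduler wanted? */
static bool run_overran(const struct run_record *r)
{
    int64_t ran = r->end_ns - r->start_ns;
    return ran > r->granted_ns + OVERRUN_SLACK_NS;
}

/* Count the runs that exceeded their granted timeslice plus slack. */
static size_t count_overruns(const struct run_record *recs, size_t n)
{
    size_t overruns = 0;
    for (size_t i = 0; i < n; i++)
        if (run_overran(&recs[i]))
            overruns++;
    return overruns;
}

/*
 * Check 1: was the scheduler invoked as often as it should be?
 * sched_ts holds the (sorted) timestamps of scheduler invocations on
 * one pCPU; count the gaps larger than max_gap_ns.
 */
static size_t count_late_invocations(const int64_t *sched_ts, size_t n,
                                     int64_t max_gap_ns)
{
    size_t late = 0;
    for (size_t i = 1; i < n; i++)
        if (sched_ts[i] - sched_ts[i - 1] > max_gap_ns)
            late++;
    return late;
}
```

In a real hypervisor such checks would have to run inline, on per-pCPU
data, and print a warning rather than collect records; this sketch only
shows the comparisons themselves.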

But as you can imagine, a debug patch is not really helpful if it can't
be used within the scenario it is meant to debug, i.e., without a
reproducer.

I did manage to find an actual bug in Credit2, but that's totally
unrelated to the problem at hand (and that will hence be discussed in
another email).

So, that's the status. I definitely was hoping for things to be better
at this point of the release cycle. Sorry they're not. And of course
I'll keep digging, but unless I find a way to reproduce it, I don't
expect big breakthroughs. :-/

> G. Null scheduler and vwfi native problem
> 
> Information from
>   Dario Faggioli <dfaggioli@xxxxxxxx>
> 
> References
>   https://lists.xenproject.org/archives/html/xen-devel/2021-01/msg01634.html
> 
> Quoting Dario:
> > RCU issues, but manifests due to scheduler behavior (especially   
> > NULL scheduler, especially on ARM).
> > 
> > Patches that should solve the issue for ARM posted already. They
> > will need to be slightly adjusted to cover x86 as well.
> 
> As of last update from Dario 29.1.21:
> waiting for test report from submitter.
> 
For this, I made progress toward making an actual patch that works for
both ARM and x86, but I've been sidetracked by a number of things, and
have not finished it.

The ARM-only fix has been tested successfully and would be ready
already. The full solution may not be ready in time for 4.15.

So, I'd say we can either merge the ARM part now (ARM is where the issue
manifests most of the time, and more severely) or wait for a full
solution during 4.16 development, which we will then backport.

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

Attachment: xen-sched-suspect-debug.patch
Description: Text Data

Attachment: xen-sched-suspect-debug_4.13.2.patch
Description: Text Data

