Re: Recent upgrade of 4.13 -> 4.14 issue

  • To: Juergen Gross <JGross@xxxxxxxx>, "George.Dunlap@xxxxxxxxxx" <George.Dunlap@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Dario Faggioli <dfaggioli@xxxxxxxx>
  • Date: Mon, 26 Oct 2020 16:31:01 +0000
  • Cc: "marmarek@xxxxxxxxxxxxxxxxxxxxxx" <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, "frederic.pierret@xxxxxxxxxxxx" <frederic.pierret@xxxxxxxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>
  • Delivery-date: Mon, 26 Oct 2020 16:31:19 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: Recent upgrade of 4.13 -> 4.14 issue

On Mon, 2020-10-26 at 15:30 +0100, Jürgen Groß wrote:
> On 26.10.20 14:54, Andrew Cooper wrote:
> > On 26/10/2020 13:37, Frédéric Pierret wrote:
> > > 
> > > If anyone would have any idea of what's going on, that would be
> > > very
> > > appreciated. Thank you.
> > 
> > Does booting Xen with `sched=credit` make a difference?
> Hmm, I think I have spotted a problem in credit2 which could explain
> the
> hang:
> csched2_unit_wake() will NOT put the sched unit on a runqueue in case
> it
> has CSFLAG_scheduled set. This bit will be reset only in
> csched2_context_saved().
Exactly, it does not put it back there. However, if it finds a vCPU
with the CSFLAG_scheduled flag set, it should set the
CSFLAG_delayed_runq_add flag.

Unless curr_on_cpu(cpu)==unit or unit_on_runq(svc)==true... which
should not be the case. Or were you saying that we actually are in one
of these situations?
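
To make the three cases above concrete, here is a minimal sketch of the
wake-up decision as I read it. It is a toy model, not the Xen code: the
struct, the enum and toy_unit_wake() are hypothetical names, and the two
booleans merely stand in for curr_on_cpu(cpu)==unit and
unit_on_runq(svc).

```c
#include <stdbool.h>

/* Toy flags, named after the ones discussed in this thread. */
#define CSFLAG_scheduled        (1u << 0)
#define CSFLAG_delayed_runq_add (1u << 1)

/* Hypothetical stand-in for the per-unit scheduler data. */
struct toy_unit {
    unsigned int flags;
    bool on_runq;     /* stands in for unit_on_runq(svc) */
    bool is_current;  /* stands in for curr_on_cpu(cpu) == unit */
};

enum wake_result { WAKE_NOTHING, WAKE_DELAYED, WAKE_QUEUED };

/* Toy model of the wake-up decision, not the real csched2_unit_wake(). */
static enum wake_result toy_unit_wake(struct toy_unit *u)
{
    /* Already running or already queued: nothing to do. */
    if (u->is_current || u->on_runq)
        return WAKE_NOTHING;

    /* Context not saved yet: leave the queueing to the context-saved hook. */
    if (u->flags & CSFLAG_scheduled) {
        u->flags |= CSFLAG_delayed_runq_add;
        return WAKE_DELAYED;
    }

    /* Normal case: the unit goes straight onto the runqueue. */
    u->on_runq = true;
    return WAKE_QUEUED;
}
```

In this model a unit that is woken while CSFLAG_scheduled is set is not
queued, but it is not forgotten either: it leaves with
CSFLAG_delayed_runq_add set.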

In fact...

> So in case a vcpu (and its unit, of course) is blocked and there has
> been no other vcpu active on its physical cpu but the idle vcpu,
> there
> will be no call of csched2_context_saved(). This will block the vcpu
> to become active in theory for eternity, in case there is no need to
> run another vcpu on the physical cpu.
...I may not be seeing exactly which situation and sequence of events
you have in mind. What I see is this: [*]

- vCPU V is running, i.e., CSFLAG_scheduled is set
- vCPU V blocks
- we enter schedule()
  - schedule calls do_schedule() --> csched2_schedule()
    - we pick idle, so CSFLAG_delayed_runq_add is set for V
  - schedule calls sched_context_switch()
    - sched_context_switch() calls context_switch()
      - context_switch() calls sched_context_switched()
        - sched_context_switched() calls:
          - vcpu_context_saved()
          - unit_context_saved()
            - unit_context_saved() calls sched_context_saved() -->
              - csched2_context_saved():
                - clears CSFLAG_scheduled
                - checks (and clears) CSFLAG_delayed_runq_add

[*] this assumes granularity 1, i.e., no core-scheduling and no 
    rendezvous. Or was core-scheduling actually enabled?

And if CSFLAG_delayed_runq_add is set **and** the vCPU is runnable, the
vCPU is added back to the runqueue.

So, even if we don't do the actual context switch (i.e., we don't call
__context_switch()) when the next vCPU that we pick after vCPU V blocks
is the idle one, it looks to me that we do get to call

And it also looks to me that, when we get to that, if the vCPU is
runnable, even if it still has CSFLAG_scheduled set, we do put it
back on the runqueue.

And if the vCPU blocked, but csched2_unit_wake() ran while
CSFLAG_scheduled was still set, it indeed should mean that the vCPU
itself will be runnable again when we get to csched2_context_saved().
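
The race described in the last paragraph can be replayed with a small
toy model. Again, this is only a sketch under my reading of the flow,
not the Xen implementation: struct toy_vcpu and the toy_*() functions
are made-up names, locking is ignored, and granularity 1 is assumed.

```c
#include <stdbool.h>

/* Toy flags, named after the ones discussed in this thread. */
#define CSFLAG_scheduled        (1u << 0)
#define CSFLAG_delayed_runq_add (1u << 1)

/* Hypothetical stand-in for a vCPU and its scheduler state. */
struct toy_vcpu {
    unsigned int flags;
    bool runnable;
    bool on_runq;
};

/* Toy wake-up: defer the queueing while the context is not saved yet. */
static void toy_wake(struct toy_vcpu *v)
{
    v->runnable = true;
    if (v->on_runq)
        return;
    if (v->flags & CSFLAG_scheduled)
        v->flags |= CSFLAG_delayed_runq_add;
    else
        v->on_runq = true;
}

/* Toy context-saved hook: clear both flags, honour a deferred add. */
static void toy_context_saved(struct toy_vcpu *v)
{
    bool delayed = (v->flags & CSFLAG_delayed_runq_add) != 0;

    v->flags &= ~(CSFLAG_scheduled | CSFLAG_delayed_runq_add);
    if (delayed && v->runnable)
        v->on_runq = true;
}

/* Replays: V running -> V blocks -> wake races in -> context saved. */
static bool toy_replay_race(void)
{
    struct toy_vcpu v = { CSFLAG_scheduled, true, false };

    v.runnable = false;      /* V blocks, the idle vCPU is picked */
    toy_wake(&v);            /* wake-up while CSFLAG_scheduled is set */
    toy_context_saved(&v);   /* reached even without __context_switch() */

    return v.on_runq && !(v.flags & CSFLAG_scheduled);
}
```

In this model toy_replay_race() returns true, i.e. the vCPU ends up on
the runqueue with CSFLAG_scheduled cleared. If the real code behaves
differently, the difference would have to be in whether the
context-saved step is reached at all.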

Or did you have something completely different in mind, and I'm missing

Dario Faggioli, Ph.D
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
