
Re: [Xen-devel] [PATCH v1 3/5] xen: sched: null: deassign offline vcpus from pcpus


  • To: "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Dario Faggioli <dfaggioli@xxxxxxxx>
  • Date: Wed, 17 Jul 2019 18:39:07 +0000
  • Accept-language: en-US
  • Cc: "george.dunlap@xxxxxxxxxxxxx" <george.dunlap@xxxxxxxxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, "roger.pau@xxxxxxxxxx" <roger.pau@xxxxxxxxxx>
  • Delivery-date: Wed, 17 Jul 2019 18:40:56 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHVPLldBnQTmgAVAEKMCBR/SPQnEabPJF+A
  • Thread-topic: [Xen-devel] [PATCH v1 3/5] xen: sched: null: deassign offline vcpus from pcpus

On Wed, 2019-07-17 at 17:04 +0100, George Dunlap wrote:
> On 8/25/18 1:21 AM, Dario Faggioli wrote:
> > If a vCPU has been, or is being, offlined, we want it to be neither
> > assigned to any pCPU, nor in the wait list.
> > 
> > Therefore, when we detect that a vcpu is going offline, remove it
> > from both places.
> 
> Hmm, this commit message wasn't very informative.
> 
Ok, we can certainly improve that. :-)

> It looks like what you really mean to do is:
> 

> > @@ -518,6 +521,14 @@ static void null_vcpu_remove(const struct
> > scheduler *ops, struct vcpu *v)
> >  
> >      lock = vcpu_schedule_lock_irq(v);
> >  
> > +    /* If offline, the vcpu shouldn't be assigned, nor in the
> > waitqueue */
> > +    if ( unlikely(!is_vcpu_online(v)) )
> > +    {
> > +        ASSERT(per_cpu(npc, v->processor).vcpu != v);
> > +        ASSERT(list_empty(&nvc->waitq_elem));
> > +        goto out;
> > +    }
> 
> * Handle the case of an offline vcpu being removed (ASSERTing that
> it's
> neither on a processor nor on the waitqueue)
> 
So, IIRC (sorry, it's been a while :-D ), this is for dealing with
remove_vcpu() being called on a vcpu which is offline. So, yes,
basically what you said. :-)

The point is that the work of removing such a vCPU from any pCPU and
from the wait list has already been done, in null_vcpu_sleep(), while
the vCPU was going offline. So, here, we only need to make sure that we
don't do anything, i.e., that we don't call _vcpu_remove().

> But wait, isn't this fixing an important regression in patch 2?  If
> after patch 2 but before patch 3, a VM is created with offline vcpus,
> and then destroyed, won't the offline vcpus reach here neither on the
> waitlist nor assigned to a pcpu?
> 
I'm not sure I understand the point you're trying to make here, sorry.

In general, considering what we've said in other mails, if you think
that patch 2 and 3 should be merged, we can do that.

My reasoning, when putting together the series, was the one I already
stated: this is broken already, so no big deal breaking it "more", and
I continue to see it that way.

But I appreciate that you see it differently and, since I don't have a
very strong opinion myself, I'd be fine with merging the patches (or
with other rearrangements of the series, if you feel they're necessary).

Or is it something completely different that you meant?

> > @@ -567,11 +578,31 @@ static void null_vcpu_wake(const struct
> > scheduler *ops, struct vcpu *v)
> >  
> >  static void null_vcpu_sleep(const struct scheduler *ops, struct
> > vcpu *v)
> >  {
> > +    struct null_private *prv = null_priv(ops);
> > +    unsigned int cpu = v->processor;
> > +    bool tickled = false;
> > +
> >      ASSERT(!is_idle_vcpu(v));
> >  
> > +    /* We need to special case the handling of the vcpu being
> > offlined */
> > +    if ( unlikely(!is_vcpu_online(v)) )
> > +    {
> > +        struct null_vcpu *nvc = null_vcpu(v);
> > +
> > +        printk("YYY d%dv%d going down?\n", v->domain->domain_id,
> > v->vcpu_id);
> > +        if ( unlikely(!list_empty(&nvc->waitq_elem)) )
> > +        {
> > +            spin_lock(&prv->waitq_lock);
> > +            list_del_init(&nvc->waitq_elem);
> > +            spin_unlock(&prv->waitq_lock);
> > +        }
> > +        else if ( per_cpu(npc, cpu).vcpu == v )
> > +            tickled = vcpu_deassign(prv, v);
> > +    }
> 
> * Handle the unexpected(?) case of a vcpu being put to sleep as(?)
> it's
> offlined
> 
Well, that printk really shouldn't be there! :-(

> If it's not unexpected, then why the printk?
> 
Because it was a debugging aid while I was working on the series, and I
apparently failed to kill it before sending.

Sorry. :-(

> And if it is unexpected, what is the expected path for a cpu going
> offline to be de-assigned from a vcpu (which is what the title seems
> to
> imply this patch is about)?
> 
This is the vCPU going down: when do_vcpu_op(VCPU_down) is invoked on
it, it calls vcpu_sleep_nosync() with _VPF_down set in pause_flags
(which means is_vcpu_online() would be false).

So it is indeed the _expected_ path, and sorry again if the stray
debugging printk made you think otherwise.

> > @@ -615,12 +646,12 @@ static void null_vcpu_migrate(const struct
> > scheduler *ops, struct vcpu *v,
> >       *
> >       * In the latter, there is just nothing to do.
> >       */
> > -    if ( likely(list_empty(&nvc->waitq_elem)) )
> > +    if ( likely(per_cpu(npc, v->processor).vcpu == v) )
> >      {
> >          vcpu_deassign(prv, v);
> >          SCHED_STAT_CRANK(migrate_running);
> >      }
> > -    else
> > +    else if ( !list_empty(&nvc->waitq_elem) )
> >          SCHED_STAT_CRANK(migrate_on_runq);
> 
> * Teach null_vcpu_migrate() that !on_waitqueue != on_vcpu.
> 
> It looks like the comment just above this hunk is now out of date:
> 
> "v is either assigned to a pCPU, or in the waitqueue."
> 
Yes it does.
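For v2, something along these lines could replace it (just a sketch of
possible wording, not final):

```c
/*
 * v is either assigned to a pCPU, in the waitqueue or, if it has gone
 * offline, in neither (null_vcpu_sleep() has already removed it from
 * both). In the last case, there is just nothing to do.
 */
```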

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
