[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted interrupts




> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: Thursday, September 10, 2015 4:28 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Dario Faggioli; George Dunlap; Tian, Kevin;
> xen-devel@xxxxxxxxxxxxx; Keir Fraser
> Subject: RE: [PATCH v6 16/18] vmx: Add some scheduler hooks for VT-d posted
> interrupts
> 
> >>> On 10.09.15 at 04:07, <feng.wu@xxxxxxxxx> wrote:
> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> Sent: Wednesday, September 09, 2015 6:27 PM
> >> >>> On 09.09.15 at 10:56, <feng.wu@xxxxxxxxx> wrote:
> >> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> >> Sent: Monday, September 07, 2015 8:55 PM
> >> >> >>> On 25.08.15 at 03:57, <feng.wu@xxxxxxxxx> wrote:
> >> >> > +
> >> >> > +        /*
> >> >> > +         * Delete the vCPU from the related block list
> >> >> > +         * if we are resuming from blocked state.
> >> >> > +         */
> >> >> > +        if ( v->arch.hvm_vmx.pi_block_cpu != -1 )
> >> >> > +        {
> >> >> > +            spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> >> >> > +                              v->arch.hvm_vmx.pi_block_cpu),
> >> >> flags);
> >> >> > +            list_del_init(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> >> >>
> >> >> Shouldn't you set .pi_block_cpu back to -1 along with removing
> >> >> the vCPU from the list? Which then ought to allow using just
> >> >> list_del() here?
> >> >
> >> > Here is the story about using list_del_init() instead of list_del(), (the
> >> > same reason for using list_del_init() in pi_wakeup_interrupt() ).
> >> > If we use list_del(), that means we need to set .pi_block_cpu to
> >> > -1 after removing the vCPU from the block list, there are two
> >> > places where the vCPU is removed from the list:
> >> > - arch_vcpu_wake_prepare()
> >> > - pi_wakeup_interrupt()
> >> >
> >> > That means we also need to set .pi_block_cpu to -1 in
> >> > pi_wakeup_interrupt(), which seems problematic to me, since
> >> > .pi_block_cpu is used to get the per-CPU spin lock, it is not safe
> >> > to change it in pi_wakeup_interrupt(), maybe someone is using
> >> > it at the same time.
> >> >
> >> > And the check "if ( v->arch.hvm_vmx.pi_block_cpu != -1 )" here
> >> > is only used when .pi_block_cpu is set to -1 at the initial time.
> >> >
> >> > If you have any good suggestions about this, I will be all ears.
> >>
> >> Latching pi_block_cpu into a local variable would take care of that
> >> part of the problem. Of course you then may also need to check
> >> that while waiting to acquire the lock pi_block_cpu didn't change.
> >
> > Thanks for the suggestion! But I think it is not easy to "check
> > "that while waiting to acquire the lock pi_block_cpu didn't change".
> > Let's take the following pseudo code as an example:
> >
> > int local_pi_block_cpu = v->arch.hvm_vmx.pi_block_cpu;
> >
> > ......
> >
> > spin_lock_irqsave(&per_cpu(pi_blocked_vcpu_lock,
> >                local_pi_block_cpu), flags);
> > list_del(&v->arch.hvm_vmx.pi_blocked_vcpu_list);
> > spin_unlock_irqrestore(&per_cpu(pi_blocked_vcpu_lock,
> >                    local_pi_block_cpu), flags);
> >
> > If .pi_block_cpu is changed to -1 by others during calling list_del()
> > above, that means the vCPU is removed by others, then calling
> > list_del() here again would be problematic. I think there might be
> > other corner cases for this. So I suggest adding some comments
> > for list_del_init() as you mentioned below.
> 
> That's why I said "check that while waiting to acquire the lock
> pi_block_cpu didn't change." The code you present does not do
> this.

I didn't see we need implement it as the above code, I just list it
here the show it is hard to do that. 
First, how to check it while waiting to acquire the lock .pi_block_cpu
didn't change?
Secondly, even if we can check it, what should we do if .pi_block_cpu
is changed after acquiring the lock as I mentioned above?

> 
> >> >> > +        do {
> >> >> > +            old.control = new.control = pi_desc->control;
> >> >> > +
> >> >> > +            /* Should not block the vCPU if an interrupt was posted
> for
> >> it.
> >> >> */
> >> >> > +            if ( pi_test_on(&old) )
> >> >> > +            {
> >> >> > +
> spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock,
> >> >> gflags);
> >> >> > +                vcpu_unblock(v);
> >> >>
> >> >> Calling vcpu_unblock() in the middle of context_switch()? Why? And
> >> >> is this safe?
> >> >
> >> > I cannot see anything unsafe so far, can some scheduler maintainer
> >> > help to confirm it? Dario? George?
> >>
> >> But first and foremost you ought to answer the "why".
> >
> > I think if you think this solution is unsafe, you need point out where it
> > is, not
> > just ask "is this safe ?", I don't think this is an effective comments.
> 
> It is my general understanding that people wanting code to be
> included have to prove their code is okay, not reviewers to prove
> the code is broken.
> 
> > Anyway, I go through the main path of the code as below, I really don't find
> > anything unsafe here.
> >
> > context_switch() -->
> >     pi_ctxt_switch_from() -->
> >         vmx_pre_ctx_switch_pi() -->
> >             vcpu_unblock() -->
> >                 vcpu_wake() -->
> >                     SCHED_OP(VCPU2OP(v), wake, v) -->
> >                         csched_vcpu_wake() -->
> >                             __runq_insert()
> >                             __runq_tickle()
> >
> > If you continue to think it is unsafe, pointing out the place will be
> > welcome!
> 
> Once again - I didn't say it's unsafe (and hence can't point at a
> particular place). What bothers me is that vcpu_unblock() affects
> scheduler state, and altering scheduler state from inside the
> context switch path looks problematic at the first glance. I'd be
> happy if I was told there is no such problem, but be aware that
> this would have to include latent ones: Even if right now things
> are safe, having such feedback have the potential of becoming
> an actual problem with a later scheduler change is already an
> issue, as finding the problem is then likely going to be rather hard
> (and I suspect it's not going to be you to debug this).

What I can say is that after investigation, I don't find anything bad
for this vcpu_unblock(), I tested it in my experiment. So that is why
I'd like to ask some ideas from scheduler expects.

> 
> >> >> > +static void vmx_post_ctx_switch_pi(struct vcpu *v)
> >> >> > +{
> >> >> > +    struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> >> >> > +
> >> >> > +    if ( !has_arch_pdevs(v->domain) )
> >> >> > +        return;
> >> >> > +
> >> >> > +    if ( x2apic_enabled )
> >> >> > +        write_atomic(&pi_desc->ndst,
> cpu_physical_id(v->processor));
> >> >> > +    else
> >> >> > +        write_atomic(&pi_desc->ndst,
> >> >> > +                     MASK_INSR(cpu_physical_id(v->processor),
> >> >> > +                     PI_xAPIC_NDST_MASK));
> >> >> > +
> >> >> > +    pi_clear_sn(pi_desc);
> >> >> > +}
> >> >>
> >> >> So you alter where notifications go, but not via which vector? How
> >> >> is the vCPU going to get removed from the blocked list then?
> >> >
> >> > vmx_post_ctx_switch_pi() is called when the vCPU is about to run,
> >> > in that case, the vector has been set properly and it has been removed
> >> > from the block list if it was blocked before.
> >>
> >> So why do you two updates then (first [elsewhere] to set the vector
> >> and then here to set the destination)?
> >
> > When the vCPU is unblocked and then removed from the blocking list, we
> > need to change the vector to the normal notification vector, since the
> > wakeup vector is only used when the vCPU is blocked and hence in the
> > blocked list. And here is the place we can decide which pCPU the vCPU
> > will be scheduled on, so we change the destination field here.
> 
> Right, that's what I understood. But isn't the state things are in
> between these two updates inconsistent? I.e. - where does the
> notification go if one arrives in this window?

Before arriving here, the SN is set, no need to send notification event and
hence no notification at all.

 Particularly - if it went
> to the right place, why would this second update be needed at all?

So we need to read the ndst from the PI descriptor and compare the
cpu_physical_id(v->processor), if they are different, update it? I don't
think this check is needed, since we need at least a read operation,
an if check, then maybe a write operation.

Thanks,
Feng

> 
> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.