
Re: [Xen-devel] [PATCH v3 4/6] Pause/Unpause the domain before/after assigning PI hooks




> -----Original Message-----
> From: Wu, Feng
> Sent: Monday, September 5, 2016 11:12 AM
> To: Jan Beulich <JBeulich@xxxxxxxx>
> Cc: andrew.cooper3@xxxxxxxxxx; dario.faggioli@xxxxxxxxxx;
> george.dunlap@xxxxxxxxxxxxx; Tian, Kevin <kevin.tian@xxxxxxxxx>; xen-
> devel@xxxxxxxxxxxxx; Wu, Feng <feng.wu@xxxxxxxxx>
> Subject: RE: [PATCH v3 4/6] Pause/Unpause the domain before/after assigning
> PI hooks
> 
> 
> 
> > -----Original Message-----
> > From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> > Sent: Friday, September 2, 2016 9:55 PM
> > To: Wu, Feng <feng.wu@xxxxxxxxx>
> > Cc: andrew.cooper3@xxxxxxxxxx; dario.faggioli@xxxxxxxxxx;
> > george.dunlap@xxxxxxxxxxxxx; Tian, Kevin <kevin.tian@xxxxxxxxx>; xen-
> > devel@xxxxxxxxxxxxx
> > Subject: RE: [PATCH v3 4/6] Pause/Unpause the domain before/after assigning
> > PI hooks
> >
> > >>> On 02.09.16 at 15:15, <feng.wu@xxxxxxxxx> wrote:
> >
> > >
> > >> -----Original Message-----
> > >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> > >> Sent: Friday, September 2, 2016 6:46 PM
> > >> To: Wu, Feng <feng.wu@xxxxxxxxx>
> > >> Cc: andrew.cooper3@xxxxxxxxxx; dario.faggioli@xxxxxxxxxx;
> > >> george.dunlap@xxxxxxxxxxxxx; Tian, Kevin <kevin.tian@xxxxxxxxx>; xen-
> > >> devel@xxxxxxxxxxxxx
> > >> Subject: RE: [PATCH v3 4/6] Pause/Unpause the domain before/after
> > assigning
> > >> PI hooks
> > >>
> > >> >>> On 02.09.16 at 12:30, <feng.wu@xxxxxxxxx> wrote:
> > >>
> > >> >
> > >> >> -----Original Message-----
> > >> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> > >> >> Sent: Friday, September 2, 2016 5:26 PM
> > >> >> To: Wu, Feng <feng.wu@xxxxxxxxx>
> > >> >> Cc: andrew.cooper3@xxxxxxxxxx; dario.faggioli@xxxxxxxxxx;
> > >> >> george.dunlap@xxxxxxxxxxxxx; Tian, Kevin <kevin.tian@xxxxxxxxx>; xen-
> > >> >> devel@xxxxxxxxxxxxx
> > >> >> Subject: RE: [PATCH v3 4/6] Pause/Unpause the domain before/after
> > >> assigning
> > >> >> PI hooks
> > >> >>
> > >> >> >>> On 02.09.16 at 10:40, <feng.wu@xxxxxxxxx> wrote:
> > >> >>
> > >> >> >
> > >> >> >> -----Original Message-----
> > >> >> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> > >> >> >> Sent: Friday, September 2, 2016 4:16 PM
> > >> >> >> To: Wu, Feng <feng.wu@xxxxxxxxx>
> > >> >> >> Cc: andrew.cooper3@xxxxxxxxxx; dario.faggioli@xxxxxxxxxx;
> > >> >> >> george.dunlap@xxxxxxxxxxxxx; Tian, Kevin <kevin.tian@xxxxxxxxx>;
> > xen-
> > >> >> >> devel@xxxxxxxxxxxxx
> > >> >> >> Subject: RE: [PATCH v3 4/6] Pause/Unpause the domain before/after
> > >> >> assigning
> > >> >> >> PI hooks
> > >> >> >>
> > >> >> >> >>> On 02.09.16 at 09:31, <feng.wu@xxxxxxxxx> wrote:
> > >> >> >>
> > >> >> >> >
> > >> >> >> >> -----Original Message-----
> > >> >> >> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> > >> >> >> >> Sent: Friday, September 2, 2016 3:04 PM
> > >> >> >> >> To: Wu, Feng <feng.wu@xxxxxxxxx>
> > >> >> >> >> Cc: andrew.cooper3@xxxxxxxxxx; dario.faggioli@xxxxxxxxxx;
> > >> >> >> >> george.dunlap@xxxxxxxxxxxxx; Tian, Kevin <kevin.tian@xxxxxxxxx>;
> > >> xen-
> > >> >> >> >> devel@xxxxxxxxxxxxx
> > >> >> >> >> Subject: RE: [PATCH v3 4/6] Pause/Unpause the domain
> > before/after
> > >> >> >> assigning
> > >> >> >> >> PI hooks
> > >> >> >> >>
> > >> >> >> >> >>> On 02.09.16 at 03:46, <feng.wu@xxxxxxxxx> wrote:
> > >> >> >> >>
> > >> >> >> >> >
> > >> >> >> >> >> -----Original Message-----
> > >> >> >> >> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> > >> >> >> >> >> Sent: Thursday, September 1, 2016 4:30 PM
> > >> >> >> >> >> To: Wu, Feng <feng.wu@xxxxxxxxx>
> > >> >> >> >> >> Cc: andrew.cooper3@xxxxxxxxxx; dario.faggioli@xxxxxxxxxx;
> > >> >> >> >> >> george.dunlap@xxxxxxxxxxxxx; Tian, Kevin
> > <kevin.tian@xxxxxxxxx>;
> > >> >> xen-
> > >> >> >> >> >> devel@xxxxxxxxxxxxx
> > >> >> >> >> >> Subject: Re: [PATCH v3 4/6] Pause/Unpause the domain
> > >> before/after
> > >> >> >> >> assigning
> > >> >> >> >> >> PI hooks
> > >> >> >> >> >>
> > >> >> >> >> >> >>> On 31.08.16 at 05:56, <feng.wu@xxxxxxxxx> wrote:
> > >> >> >> >> >> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > >> >> >> >> >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > >> >> >> >> >> > @@ -219,8 +219,19 @@ void vmx_pi_hooks_assign(struct
> > domain
> > >> *d)
> > >> >> >> >> >> >
> > >> >> >> >> >> >      ASSERT(!d->arch.hvm_domain.vmx.vcpu_block);
> > >> >> >> >> >> >
> > >> >> >> >> >> > +    /*
> > >> >> >> >> >> > +     * Pausing the domain can make sure the vCPU is not
> > >> >> >> >> >> > +     * running and hence calling the hooks simultaneously
> > >> >> >> >> >> > +     * when deassigning the PI hooks. This makes sure that
> > >> >> >> >> >> > +     * all the appropriate state of PI descriptor is 
> > >> >> >> >> >> > actually
> > >> >> >> >> >> > +     * set up for all vCPus before leaving this function.
> > >> >> >> >> >> > +     */
> > >> >> >> >> >> > +    domain_pause(d);
> > >> >> >> >> >> > +
> > >> >> >> >> >> >      d->arch.hvm_domain.vmx.vcpu_block = vmx_vcpu_block;
> > >> >> >> >> >> >      d->arch.hvm_domain.vmx.pi_do_resume =
> > vmx_pi_do_resume;
> > >> >> >> >> >> > +
> > >> >> >> >> >> > +    domain_unpause(d);
> > >> >> >> >> >> >  }
> > >> >> >> >> >>
> > >> >> >> >> >> First of all I'm missing a word on whether the race 
> > >> >> >> >> >> mentioned in
> > >> >> >> >> >> the description and comment can actually happen. Device
> > >> >> >> >> >> (de)assignment should already be pretty much serialized (via
> > >> >> >> >> >> the domctl lock, and maybe also via the pcidevs one).
> > >> >> >> >> >
> > >> >> >> >> > The purpose of this patch is to address the race condition 
> > >> >> >> >> > that
> > >> >> >> >> > the _vCPU_ is running while we are installing these hooks. Do
> you
> > >> >> >> >> > think this cannot happen?  This patch is trying to fix the 
> > >> >> >> >> > issue
> > >> >> >> >> > described at:
> > >> >> >> >> > http://www.gossamer-threads.com/lists/xen/devel/433229
> > >> >> >> >> > Consider that the other two hooks were installed when the VM
> > >> >> >> > is created, there seems to be no such race condition. However, according
> > >> >> >> > to the discussion about patch 1 and patch 2 of this series, we need
> > >> >> >> >> > to install the other two hooks here as well,
> > >> >> >> >>
> > >> >> >> >> I don't think we've agreed that the creation time installation 
> > >> >> >> >> of
> > >> >> >> >> those hooks is actually necessary. In fact your most recent
> > >> >> >> >> response to patch 1 makes me think you now agree we don't
> > >> >> >> >> need to do so. And hence with that precondition not holding
> > >> >> >> >> anymore, I don't think the conclusion does.
> > >> >> >> >
> > >> >> >> > I think there might be some confusion. Let me explain what I
> > >> >> >> > am thinking of to make sure we are on the same page:
> > >> >> >> > 1. We need to install all four hooks when the first device is
> > >> >> >> > assigned.
> > >> >> >> > 2. If _1_ is true, the issue described in
> > >> >> >> > http://www.gossamer-threads.com/lists/xen/devel/433229
> > >> >> >> > exists.
> > >> >> >>
> > >> >> >> If you mean this
> > >> >> >>
> > >> >> >> * vcpu 0 starts running on a pcpu
> > >> >> >> * a device is assigned, causing the hooks to be set
> > >> >> >> * an interrupt from the device is routed to vcpu 0, but it is not
> > >> >> >> actually delivered properly, since ndst is not pointing to the 
> > >> >> >> right
> > >> >> >> processor.
> > >> >> >>
> > >> >> >> raised by George, then I'm not convinced it can happen (after all, 
> > >> >> >> the
> > >> >> >> hooks get set _before_ the device gets assigned, and hence before
> > >> >> >> the device can raise an interrupt destined at the guest). And if 
> > >> >> >> it can
> > >> >> >> happen, then rather than pausing the guest I don't see why, along
> > >> >> >> with setting the hooks, any possibly affected NDST field can't be
> > >> >> >> programmed correctly. ISTR having recommended something like
> > >> >> >> this already during review of the series originally introducing PI.
> > >> >> >
> > >> >> > Actually here is the scenario I am concerned about:
> > >> >> > 1. ' vmx_vcpu_block ' is installed while vCPU is running 
> > >> >> > vcpu_block()
> > >> >> > and then vmx_vcpu_block().
> > >> >> > 2. There is a ASSERT() about 'NDST' field in vmx_vcpu_block(), I 
> > >> >> > think
> > >> >> > we may hit the ASSERT() since 'NDST' may not have been set to the
> > >> >> > current processor yet.
> > >> >> >
> > >> >> > My previous solution in v2 is to delete that ASSERT(), but seems you
> > >> >> > guys don't like it. So here I use this new method in v3 to make sure
> > >> >> > the vCPU is not running while we are installing the hooks.
> > >> >>
> > >> >> Indeed, deleting the assertion doesn't seem right. But then why
> > >> >> can't vmx_vcpu_block() bail early when the domain has no devices
> > >> >> assigned? That would allow for
> > >> >>
> > >> >> 1) set blocking hook
> > >> >> 2) set up PI state
> > >> >> 3) actually assign device
> > >> >
> > >> > Do you mean we check has_arch_pdev() in vmx_vcpu_block(), and if it is
> > >> > false, we return early? But has_arch_pdev() needs to hold
> > >> > _pcidevs_lock, right?
> > >>
> > >> I don't think you strictly need to: list_empty() will reliably return
> > >> true in the case of interest here. And possible races when the last
> > >> device gets removed are - afaics - benign (i.e. it doesn't matter
> > >> what it returns at that point in time).
> > >
> > > I am reminded of another case:
> > > 1. The four hooks are installed.
> > > 2. vmx_vcpu_block() gets called before the other three hooks get called,
> > > even if a device has already been assigned to the domain. We may still
> > > hit the ASSERT() in vmx_vcpu_block() since 'NDST' field is changed in
> > > the other hooks.
> >
> > I don't understand: Step 2) in what I've outlined above would make
> > sure NDST is set correctly. Perhaps one should even reverse 2) and
> > 1).
> 
> I still have trouble fully understanding this; please see the following
> scenario:
> 
> 1) Set 'NDST' to pCPU0 (which the vCPU is currently running on, but
> that may change since vCPU scheduling happens behind us, so how do
> we decide which pCPU to use for 'NDST'?)
> 2) Install all four hooks; the vCPU is re-scheduled before
> 'vmx_pi_switch_to()' gets installed, so the pCPU of the vCPU might
> change to pCPU1 without 'NDST' being updated.
> 3) vmx_vcpu_block() gets called and we hit the ASSERT().
> 
> Maybe I missed something; I would appreciate it if you could
> correct me if my understanding is wrong.

My email system had some problems, so some emails didn't reach my
inbox, but I found your reply on the Internet, quoted below:

" Well, the problem is that you keep trying to find issues with the
(simplified) sequence of things I'm proposing to investigate as
an alternative. The above scenario _again_ can be dealt with
without having to pause the guest: Install the context switch
hook(s) first, then set NDST, then set the block hook. So what
I'd really like to ask you to do is first try to find a model without
pausing, and only if you figure that's impossible, then explain
why and we'll go with the pausing approach.

Jan"

I then tried to implement the function as follows:

/* This function is called when pcidevs_lock is held */
void vmx_pi_hooks_assign(struct domain *d)
{
    struct vcpu *v;

    if ( !iommu_intpost || !has_hvm_container_domain(d) )
        return;

    ASSERT(!d->arch.hvm_domain.vmx.vcpu_block);

    /*
     * We carefully handle the ordering here:
     * - Install the context switch hooks first
     * - Then set the NDST field
     * - Install the block and resume hooks last
     *
     * This makes sure the PI descriptor (especially the NDST field)
     * is in a proper state when vmx_vcpu_block() gets called.
     */
    d->arch.hvm_domain.vmx.pi_switch_from = vmx_pi_switch_from;
    d->arch.hvm_domain.vmx.pi_switch_to = vmx_pi_switch_to;

    for_each_vcpu ( d, v )
    {
        unsigned int dest = cpu_physical_id(v->processor);
        struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;

        /* spot 1: the vCPU may be rescheduled to another pCPU right here */

        write_atomic(&pi_desc->ndst,
                     x2apic_enabled ? dest
                                    : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
    }

    d->arch.hvm_domain.vmx.vcpu_block = vmx_vcpu_block;
    d->arch.hvm_domain.vmx.pi_do_resume = vmx_pi_do_resume;
}

However, I still think it is problematic. Consider _spot 1_ above: we
read the pCPU the vCPU is currently running on in order to update NDST,
but the vCPU can happily be rescheduled to another pCPU before NDST
actually gets updated. The problem is that it is not safe to update
NDST here, since the vCPU can be rescheduled behind our backs at any
time, and it is hard to do this safely without guaranteeing that the
vCPU is in a known state.
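
Otherwise, I would fall back to something like the pausing variant
below. This is only a rough sketch along the lines of the hunk quoted
at the top of this thread; that pausing the domain is enough to keep
v->processor (and hence the NDST update) stable here is my assumption:

/* This function is called when pcidevs_lock is held */
void vmx_pi_hooks_assign(struct domain *d)
{
    struct vcpu *v;

    if ( !iommu_intpost || !has_hvm_container_domain(d) )
        return;

    ASSERT(!d->arch.hvm_domain.vmx.vcpu_block);

    /*
     * Pause the domain so that no vCPU can be inside vmx_vcpu_block()
     * or be context switched while the PI state and the hooks are
     * being set up.
     */
    domain_pause(d);

    d->arch.hvm_domain.vmx.pi_switch_from = vmx_pi_switch_from;
    d->arch.hvm_domain.vmx.pi_switch_to = vmx_pi_switch_to;

    for_each_vcpu ( d, v )
    {
        unsigned int dest = cpu_physical_id(v->processor);
        struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;

        /*
         * No vCPU of this domain is running here, so NDST cannot be
         * changed underneath us by vmx_pi_switch_to().
         */
        write_atomic(&pi_desc->ndst,
                     x2apic_enabled ? dest
                                    : MASK_INSR(dest, PI_xAPIC_NDST_MASK));
    }

    d->arch.hvm_domain.vmx.vcpu_block = vmx_vcpu_block;
    d->arch.hvm_domain.vmx.pi_do_resume = vmx_pi_do_resume;

    domain_unpause(d);
}

Since this only happens once, when the first device gets assigned to
the domain, the pausing overhead should be negligible. Any further
advice on this? Thanks a lot!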

Thanks,
Feng

> 
> Thanks,
> Feng
> 
> >
> > > And that is another reason I use pause/unpause here: it can address
> > > all the races. And this is a one-time action (it only occurs when the
> > > first device gets assigned); do you think it is that unacceptable?
> >
> > No, I've never said it's unacceptable. I just want such to not be
> > added without good reason.
> >
> > Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

