
Re: [Xen-devel] [PATCH] x86/hvm: don't rely on shared ioreq state for completion handling



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: 11 August 2015 17:03
> To: Paul Durrant
> Cc: Andrew Cooper; xen-devel@xxxxxxxxxxxxxxxxxxxx; Keir (Xen.org)
> Subject: RE: [PATCH] x86/hvm: don't rely on shared ioreq state for
> completion handling
> 
> >>> On 11.08.15 at 17:49, <Paul.Durrant@xxxxxxxxxx> wrote:
> >>  -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> Sent: 11 August 2015 16:46
> >> To: Paul Durrant
> >> Cc: Andrew Cooper; xen-devel@xxxxxxxxxxxxxxxxxxxx; Keir (Xen.org)
> >> Subject: RE: [PATCH] x86/hvm: don't rely on shared ioreq state for
> >> completion handling
> >>
> >> >>> On 11.08.15 at 17:32, <Paul.Durrant@xxxxxxxxxx> wrote:
> >> >>  -----Original Message-----
> >> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> >> Sent: 11 August 2015 16:20
> >> >> To: Paul Durrant
> >> >> Cc: Andrew Cooper; xen-devel@xxxxxxxxxxxxxxxxxxxx; Keir (Xen.org)
> >> >> Subject: Re: [PATCH] x86/hvm: don't rely on shared ioreq state for
> >> >> completion handling
> >> >>
> >> >> >>> On 31.07.15 at 17:34, <paul.durrant@xxxxxxxxxx> wrote:
> >> >> > Both hvm_io_pending() and hvm_wait_for_io() use the shared (with
> >> >> > emulator) ioreq structure to determine whether there is a pending
> >> >> > I/O. The latter will misbehave if the shared state is driven to
> >> >> > STATE_IOREQ_NONE by the emulator, or when the shared ioreq page is
> >> >> > cleared for re-insertion into the guest P2M when the ioreq server
> >> >> > is disabled (STATE_IOREQ_NONE == 0), because it will terminate its
> >> >> > wait without calling hvm_io_assist() to adjust Xen's internal I/O
> >> >> > emulation state. This may then lead to an I/O completion handler
> >> >> > finding incorrect internal emulation state and calling
> >> >> > domain_crash().
> >> >> >
> >> >> > This patch fixes the problem by adding a pending flag to the ioreq
> >> >> > server's per-vcpu structure which cannot be directly manipulated by
> >> >> > the emulator and thus can be used to determine whether an I/O is
> >> >> > actually pending for that vcpu on that ioreq server. If an I/O is
> >> >> > pending and the shared state is seen to go to STATE_IOREQ_NONE then
> >> >> > it can be treated as an abnormal completion of emulation (hence the
> >> >> > data placed in the shared structure is not used) and the internal
> >> >> > state is adjusted as for a normal completion. Thus, when a
> >> >> > completion handler subsequently runs, the internal state is as
> >> >> > expected and domain_crash() will not be called.
> >> >> >
> >> >> > Signed-off-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
> >> >> > Reported-by: Sander Eikelenboom <linux@xxxxxxxxxxxxxx>
> >> >> > Tested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> >> >>
> >> >> I realize this went in already, but ...
> >> >>
> >> >> > --- a/xen/arch/x86/hvm/hvm.c
> >> >> > +++ b/xen/arch/x86/hvm/hvm.c
> >> >> > @@ -412,44 +412,57 @@ bool_t hvm_io_pending(struct vcpu *v)
> >> >> >                            &d->arch.hvm_domain.ioreq_server.list,
> >> >> >                            list_entry )
> >> >> >      {
> >> >> > -        ioreq_t *p = get_ioreq(s, v);
> >> >> > +        struct hvm_ioreq_vcpu *sv;
> >> >> >
> >> >> > -        if ( p->state != STATE_IOREQ_NONE )
> >> >> > -            return 1;
> >> >> > +        list_for_each_entry ( sv,
> >> >> > +                              &s->ioreq_vcpu_list,
> >> >> > +                              list_entry )
> >> >> > +        {
> >> >> > +            if ( sv->vcpu == v && sv->pending )
> >> >> > +                return 1;
> >> >> > +        }
> >> >>
> >> >> ... while from the review of the original series I recall that doing the
> >> >> outer loop without any lock is fine (due to using domain_pause()
> >> >> when registering servers), I'm not convinced this extends to the
> >> >> inner loop. Can you clarify please? (There are a couple more such
> >> >> loops that I can't immediately see being protected by a lock.)
> >> >
> >> > Yes, I think you are right. If a cpu were to disappear then the list walk
> >> > would be compromised. It should either be locked or rcu in all places.
> >>
> >> I don't think we need to be concerned about vCPU-s disappearing,
> >> since that doesn't happen during the lifetime of a VM. And the
> >> hvm_do_resume() path is used only for domains still alive. Of
> >> course, if any of the lockless loops sit on paths reachable after
> >> a domain got marked dying, that would need fixing.
> >>
> >> What I'm more concerned about are list manipulations behind
> >> the back of a list traversing CPU. Or do those happen only upon
> >> vCPU creation/destruction?
> >
> > Theoretically we could do vcpu hot remove, couldn't we? That's the case I
> > was thinking of.
> 
> But that only removes the vCPU from what the guest sees. The
> hypervisor never de-allocates a vCPU prior to domain death.
> 

Oh, in that case I think the loops are safe.
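
For the benefit of the archive, here's a rough sketch of the scheme the
patch implements (simplified, not the literal hunks; field and helper
names follow the patch as far as I recall, and the hvm_io_assist()
signature shown is illustrative). The key point is that completion is
keyed off the per-vcpu 'pending' flag, which the emulator cannot write,
so seeing STATE_IOREQ_NONE in the shared page while the flag is set is
treated as an abnormal completion rather than as "nothing outstanding":

struct hvm_ioreq_vcpu {
    struct list_head list_entry;
    struct vcpu      *vcpu;
    evtchn_port_t    ioreq_evtchn;
    bool_t           pending;       /* set when an I/O is issued; cleared
                                       only by hvm_io_assist() */
};

static bool_t hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
{
    /* Only Xen clears sv->pending, so the wait cannot be terminated
     * early by anything the emulator writes to the shared page. */
    while ( sv->pending )
    {
        switch ( p->state )
        {
        case STATE_IORESP_READY:
            /* Normal completion: consume the emulator's response. */
            hvm_io_assist(sv, p->data);
            break;

        case STATE_IOREQ_NONE:
            /*
             * Abnormal completion: the emulator drove the state to NONE,
             * or the page was cleared for re-insertion into the guest
             * P2M. The shared data is not used, but the internal
             * emulation state is still adjusted so that a subsequent
             * completion handler does not call domain_crash().
             */
            hvm_io_assist(sv, ~0ul);
            break;

        default:
            /* IOREQ_READY / IOREQ_INPROCESS: wait for the emulator
             * (validation of bogus states elided here). */
            wait_on_xen_event_channel(sv->ioreq_evtchn,
                                      p->state != STATE_IOREQ_READY &&
                                      p->state != STATE_IOREQ_INPROCESS);
            break;
        }
    }

    return 1;
}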

  Paul

> Jan