
Re: [PATCH v3 for-4.19 2/3] pirq_cleanup_check() leaks



On Mon, 2024-07-01 at 15:21 +0200, Jan Beulich wrote:
> On 01.07.2024 13:13, Roger Pau Monné wrote:
> > On Mon, Jul 01, 2024 at 11:47:34AM +0200, Jan Beulich wrote:
> > > On 01.07.2024 10:55, Roger Pau Monné wrote:
> > > > On Thu, Jul 27, 2023 at 09:38:29AM +0200, Jan Beulich wrote:
> > > > > Its original introduction had two issues: For one the "common"
> > > > > part of the checks (carried out in the macro) was inverted.
> > > > 
> > > > Is the current logic in evtchn_close() really malfunctioning?
> > > 
> > > First: I'm getting the impression that this entire comment doesn't
> > > relate to the part of the description above, but to the 2nd
> > > paragraph further down. Otherwise I'm afraid I may not properly
> > > understand your question, and hence my response below may not make
> > > any sense at all.
> > > 
> > > > pirq->evtchn = 0;
> > > > pirq_cleanup_check(pirq, d1); <- cleanup for PV domains
> > > > if ( is_hvm_domain(d1) && domain_pirq_to_irq(d1, pirq->pirq) > 0 )
> > > >     unmap_domain_pirq_emuirq(d1, pirq->pirq); <- cleanup for HVM domains
> > > > 
> > > > It would seem to me the pirq_cleanup_check() call just after
> > > > setting evtchn = 0 was done to account for PV domains, while the
> > > > second (hidden) pirq_cleanup_check() call in
> > > > unmap_domain_pirq_emuirq() would do the cleanup for HVM domains.
> > > > 
> > > > Maybe there's something that I'm missing, I have to admit the
> > > > PIRQ logic is awfully complicated, even more so when we mix in
> > > > the HVM PIRQ stuff.
> > > 
> > > If you look at pirq_cleanup_check() you'll notice that it takes
> > > care of one HVM case as well (the not-emuirq one, i.e. particularly
> > > PVH, but note also how physdev_hvm_map_pirq() calls
> > > map_domain_emuirq_pirq() only conditionally). Plus the crucial
> > > aspect of the 2nd paragraph of the description is that past calling
> > > pirq_cleanup_check() it is not really valid anymore to (blindly)
> > > de-reference the struct pirq pointer we hold in our hands. The
> > > is_hvm_domain() qualification wasn't enough, since - as said - it's
> > > only one of the possibilities that would allow the pirq to remain
> > > legal to use past the call, when having taken the function's
> > > 
> > >         if ( pirq->arch.hvm.emuirq != IRQ_UNBOUND )
> > >             return;
> > > 
> > > path. A 2nd would be taking the
> > > 
> > >         if ( !pt_pirq_cleanup_check(&pirq->arch.hvm.dpci) )
> > >             return;
> > > 
> > > path (i.e. a still in use pass-through IRQ), but the 3rd would
> > > still end in the struct pirq being purged even for HVM.
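
The cleanup logic under discussion can be sketched roughly as below. This
is a simplified reconstruction from the fragments quoted above, with the
wrapper macro shown with the fixed (non-inverted) condition and the
freeing this patch adds; the pirq->arch.irq check and the exact shape of
the code are assumptions and may differ in detail from the actual tree.

    /*
     * Common check, done in the wrapper macro: only attempt cleanup once
     * no event channel is bound any more.  Before this patch the
     * condition was inverted, so cleanup never ran on the path where
     * pirq->evtchn had just been cleared.
     */
    #define pirq_cleanup_check(pirq, d) \
        (!(pirq)->evtchn ? pirq_cleanup_check(pirq, d) : (void)0)

    void pirq_cleanup_check(struct pirq *pirq, struct domain *d)
    {
        if ( pirq->arch.irq )
            return;

        if ( is_hvm_domain(d) )
        {
            /* 1st path: an emuirq is still mapped - pirq stays valid. */
            if ( pirq->arch.hvm.emuirq != IRQ_UNBOUND )
                return;

            /* 2nd path: a pass-through IRQ is still in use - pirq stays
             * valid as well. */
            if ( !pt_pirq_cleanup_check(&pirq->arch.hvm.dpci) )
                return;
        }

        /* 3rd path: the entry is purged (and, with this patch, freed),
         * so callers must not dereference the pointer afterwards. */
        if ( radix_tree_delete(&d->pirq_tree, pirq->pirq) != pirq )
            BUG();
        free_pirq_struct(pirq);
    }
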
> > 
> > Right, I was missing that if pirq is properly freed then further
> > usages of it after the pirq_cleanup_check() would be use after free.
> > 
> > > > > And then after removal from the radix tree the structure wasn't
> > > > > scheduled for freeing. (All structures still left in the radix
> > > > > tree would be freed upon domain destruction, though.)
> > > > 
> > > > So if my understanding is correct, we didn't have a leak due to
> > > > the missing free_pirq_struct() because the inverted check in the
> > > > pirq_cleanup_check() macro prevented the removal from the radix
> > > > tree, and so stale entries would be left there and freed at
> > > > domain destruction?
> > > 
> > > That's the understanding I had come to, yes. What I wasn't entirely
> > > sure about (see the 2nd post-commit-message remark) is why the
> > > entry being left in the radix tree never caused any problems.
> > > Presumably that's a result of pirq_get_info() first checking
> > > whether an entry is already there, allocating a new one only for
> > > previously empty slots.
> > 
> > Yes, I came to the same conclusion, that not freeing wasn't an issue
> > as Xen would re-use the old entry.  Hopefully it's clean enough to
> > not cause issues when re-using.
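
The reuse behaviour referred to here can be sketched roughly as follows,
a simplified rendition of pirq_get_info() based on the description above;
the helper names and error handling are assumptions and may differ in
detail from the actual tree.

    struct pirq *pirq_get_info(struct domain *d, int pirq)
    {
        /* Look up an existing entry first; allocate only for empty
         * slots, which is why a stale-but-present entry was silently
         * reused rather than duplicated. */
        struct pirq *info = pirq_info(d, pirq);

        if ( !info && (info = alloc_pirq_struct(d)) != NULL )
        {
            info->pirq = pirq;
            if ( radix_tree_insert(&d->pirq_tree, pirq, info) )
            {
                free_pirq_struct(info);
                info = NULL;
            }
        }

        return info;
    }
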
> > 
> > > > > --- a/xen/common/event_channel.c
> > > > > +++ b/xen/common/event_channel.c
> > > > > @@ -711,9 +711,10 @@ int evtchn_close(struct domain *d1, int
> > > > >              if ( !is_hvm_domain(d1) )
> > > > >                  pirq_guest_unbind(d1, pirq);
> > > > >              pirq->evtchn = 0;
> > > > > -            pirq_cleanup_check(pirq, d1);
> > > > > -            if ( is_hvm_domain(d1) && domain_pirq_to_irq(d1, pirq->pirq) > 0 )
> > > > > -                unmap_domain_pirq_emuirq(d1, pirq->pirq);
> > > > > +            if ( !is_hvm_domain(d1) ||
> > > > > +                 domain_pirq_to_irq(d1, pirq->pirq) <= 0 ||
> > > > > +                 unmap_domain_pirq_emuirq(d1, pirq->pirq) < 0 )
> > > > 
> > > > unmap_domain_pirq_emuirq() already calls pirq_cleanup_check()
> > > > itself.  Could you please add a comment to note that
> > > > unmap_domain_pirq_emuirq() succeeding implies the call to
> > > > pirq_cleanup_check() has already been done?
> > > > 
> > > > Otherwise the logic here looks unbalanced by skipping the
> > > > pirq_cleanup_check() when unmap_domain_pirq_emuirq() succeeds.
> > > 
> > > Sure, added:
> > > 
> > >                 /*
> > >                  * The successful path of unmap_domain_pirq_emuirq()
> > >                  * will have called pirq_cleanup_check() already.
> > >                  */
> > 
> > With that added:
> > 
> > Reviewed-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> 
> Thanks Roger.
> 
> Oleksii - would you please consider giving this long-standing bug fix a
> release ack?
Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>

~ Oleksii
