[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4.1 4/6] xen/x86: Allow stubdom access to irq created for msi.



On Thu, Mar 07, 2019 at 03:48:01PM +0100, Roger Pau Monné wrote:
> On Thu, Mar 07, 2019 at 01:50:04AM +0100, Marek Marczykowski-Górecki wrote:
> > On Fri, Feb 22, 2019 at 11:42:22AM +0100, Roger Pau Monné wrote:
> > > On Thu, Feb 21, 2019 at 06:40:40PM +0100, Marek Marczykowski-Górecki 
> > > wrote:
> > > > On Thu, Feb 21, 2019 at 05:47:51PM +0100, Roger Pau Monné wrote:
> > > > > On Fri, Feb 08, 2019 at 11:17:05AM +0100, Marek Marczykowski-Górecki 
> > > > > wrote:
> > > > > >  {
> > > > > >      int irq, ret;
> > > > > >      struct irq_desc *desc;
> > > > > > @@ -190,19 +190,19 @@ int create_irq(nodeid_t node)
> > > > > >          desc->arch.used = IRQ_UNUSED;
> > > > > >          irq = ret;
> > > > > >      }
> > > > > > -    else if ( hardware_domain )
> > > > > > +    else if ( dm_domain )
> > > > > 
> > > > > Can you guarantee that dm_domain is always current->domain?
> > > > 
> > > > No, in some cases it will be hardware_domain.
> > > 
> > > Right, but in that case current->domain == hardware_domain I guess?
> > > 
> > > > 
> > > > > I think you need to assert for this, or else take a reference to
> > > > > dm_domain (get_domain) before accessing it's fields, or else you risk
> > > > > the domain being destroyed while modifying it's fields.
> > > > 
> > > > Can hardware_domain be destroyed without panicking Xen? If so,
> > > > get_domain would be indeed needed.
> > > 
> > > What about other callers that are not the hardware_domain? You need to
> > > make sure those domains are not destroyed while {create/destroy}_irq
> > > is changing the permissions.
> > > 
> > > If you can guarantee that dm_domain == current->domain then you are
> > > safe, if not you need to get a reference before modifying any fields
> > > of the domain struct.
> > > 
> > > So IMO you either need to add an assert or a get_domain depending on
> > > the answer to the question above.
> > 
> > I've added an assert, and I think I have the answer to the above question:
> > 
> >     (XEN) Assertion 'd == current->domain' failed at irq.c:232
> >     (XEN) ----[ Xen-4.12.0-rc  x86_64  debug=y   Not tainted ]----
> >     (XEN) CPU:    2
> >     (XEN) RIP:    e008:[<ffff82d08028f545>] destroy_irq+0x126/0x143
> >     (XEN) RFLAGS: 0000000000010206   CONTEXT: hypervisor
> > (...)
> >     (XEN) Xen call trace:
> >     (XEN)    [<ffff82d08028f545>] destroy_irq+0x126/0x143
> >     (XEN)    [<ffff82d08028ce1e>] msi_free_irq+0x51/0x1b8
> >     (XEN)    [<ffff82d0802923e1>] unmap_domain_pirq+0x487/0x4d4
> >     (XEN)    [<ffff82d08029249f>] free_domain_pirqs+0x71/0x8f
> >     (XEN)    [<ffff82d0802819e0>] arch_domain_destroy+0x41/0xa1
> >     (XEN)    [<ffff82d080207d22>] 
> > domain.c#complete_domain_destroy+0x86/0x159
> >     (XEN)    [<ffff82d08022a658>] 
> > rcupdate.c#rcu_process_callbacks+0xa5/0x1cc
> >     (XEN)    [<ffff82d08023c4fa>] softirq.c#__do_softirq+0x78/0x9a
> >     (XEN)    [<ffff82d08023c566>] do_softirq+0x13/0x15
> >     (XEN)    [<ffff82d080280532>] domain.c#idle_loop+0x63/0xb9
> > 
> > In this case, current->domain obviously isn't the stubdomain, because at
> > this point it is already destroyed (it keeps reference to the target
> > domain, so target domain couldn't be destroyed before its stubdomain).
> > 
> > In fact, in this case get_dm_domain() returns wrong value, since it
> > isn't called by device model. At the point where stubdomain is already
> > destroyed, I think it should return NULL, but it returns
> > hardware_domain. But it isn't that easy, because target domain don't
> > have any reference to its stubdomain. Especially already destroyed one.
> > 
> > I's defined as:
> > 
> >     static struct domain *get_dm_domain(struct domain *d)
> >     {
> >         return current->domain->target == d ? current->domain :
> >                                               hardware_domain;
> >     }
> 
> So get_dm_domain works fine when used by create_irq, because in that
> case current->domain == d AFAICT.
> 
> As you pointed out however using the same mechanism for destroy is not
> suitable, since the stubdomain might be already destroyed, and
> unmap_domain_pirq called from the idle vCPU.
> 
> > Since hardware domain couldn't be destroyed(*) while the system is
> > running, in practice this wrong return value it isn't a problem.
> > Hardware didn't have permission over this IRQ (if stubdomain have
> > created it), so irq_deny_access is a no-op.
> > 
> > So, I would adjust assert in destroy_irq to allow also hardware_domain,
> > and add a comment that get_dm_domain may return hardware_domain during
> > domain destruction. Is that ok?
> 
> Hm, albeit I agree with your analysis, I feel like this proposed
> solution is kind of a workaround. Given the context, I think the best
> way to deal with this issue in destroy_irq is to iterate over the list
> of domains and make sure no domain has permissions over the destroyed
> irq. Note that with this proposed solution you will have to drop the
> domain parameter from destroy_irq.

I'd really like to avoid iterating the whole domain list. Jan, what do
you think? Code-wise this seems to be the easiest solution.

> Another option would be to store which domains are given permissions
> over which irqs in a way that's optimized to get the list of domains
> given an irq (ie: without having to iterate over the list of domains
> like my proposed solution above).

This may make sense, but if that would be instead of the current
structure, we'd have another problem: when destroying stubdomain, you'd
need to get list IRQs it has access to, to explicitly revoke
stubdomain's access. In theory it could be done by looking at the target
domain and iterating over its IRQs, but this is getting more and more
complex.

I think the tricky part is unmap_domain_pirq->msi_free_irq, which can be
called:
1. from PHYSDEVOP_unmap_pirq, by stubdomain
2. from PHYSDEVOP_unmap_pirq, by dom0 when device model runs there
3. from PHYSDEVOP_unmap_pirq, by dom0 even with device model in
stubdomain - normally it shouldn't happen for MSI allocated by
stubdomain, but dom0 is allowed to do so, and it shouldn't cause any
harm
4. from free_domain_pirqs, during domain destruction
5. various error paths

If unmap_domain_pirq would know where device model is running, even if
not called by it, that would help. What about adding back reference in
struct domain to a stubdomain? That would help a lot here. The only
problem is a circular dependency stubdom->target->stubdom. This cycle
would need to be broken during stubdom teardown. domain_kill(stubdom)
looks like a good place to break it. Is it guaranteed to be called, or
is there some other path to destroying a stubdomain?

Can one HVM domain have multiple stubdomains? If so, it should a be
list, not a single field.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

Attachment: signature.asc
Description: PGP signature

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.