[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] abstract model of IOMMU unmaping/mapping failures



On April 08, 2016 6:44am, <JBeulich@xxxxxxxx> wrote:
> >>> On 06.04.16 at 09:38, <quan.xu@xxxxxxxxx> wrote:
> > On April 01, 2016 7:57pm, <JBeulich@xxxxxxxx> wrote:
> >> >>> On 31.03.16 at 11:06, <quan.xu@xxxxxxxxx> wrote:
> >> > 4. __gnttab_unmap_common():rollback (no change)
> >> >
> >> > (Existing code)
> >> >        >>...
> >> >              if ( !kind )
> >> >             err = iommu_unmap_page(ld, op->frame);
> >> >         else if ( !(kind & MAPKIND_WRITE) )
> >> >             err = iommu_map_page(ld, op->frame, op->frame,
> >> > IOMMUF_readable);
> >> >
> >> >         double_gt_unlock(lgt, rgt);
> >> >
> >> >         if ( err )
> >> >             rc = GNTST_general_error;
> >> >        <<...
> >> >
> >> > Similarly no change required, as error has been correctly handled.
> >>
> >> I wouldn't call this "correctly handled", but for the hardware domain
> >> it should be good enough, and by crashing DomU-s simply reporting the
> >> error up the call tree is sufficient. One question though is whether
> >> the loops higher up the call tree in grant_table.c shouldn't be
> >> exited when noticing the domain has crashed, both to avoid unnecessary
> work and to reduce the risk of secondary problems.
> >>
> >
> > For this point, I suppose that the domain structure is not destructed
> > (we are safe to call domain's element, e.g. ld->grant_table) in
> > do_grant_table_op hypercall cycle, even the domain is crashed down. I
> > am not quite sure, whether it is correct or not, could you explain
> > more?
> 
> Explain what more? 
> Sure, struct domain stays around until the domain gets
> actually cleaned up, 

Explain why it is sure. Now i have got it.

> so accesses to its grant table (and alike) remain valid while
> execution didn't leave the context of that guest (vCPU) yet.
> 

Thanks for your reminding.

> >> > 8. p2m_remove_page():rollback one level(change required).
> >> >
> >> > (Existing code)
> >> >   >>...
> >> >     if ( !paging_mode_translate(p2m->domain) )
> >> >     {
> >> >         if ( need_iommu(p2m->domain) )
> >> >             for ( i = 0; i < (1 << page_order); i++ )
> >> >                 iommu_unmap_page(p2m->domain, mfn + i);
> >> >         return 0;
> >> >     }
> >> >   <<...
> >> >
> >> > There is no error checking of iommu_unmap_page. We need to add.
> >> > However, there are several callers of this function which don't
> >> > handle error at all. I'm not sure whether hardware domain will use this
> function.
> >> > Since it is in core p2m logic (which I don't have much knowledge),
> >> > I hope we can print a warning and simply return error here (given
> >> > the fact that non-hardware domains are all crashed immediately
> >> > within unmap call)
> >>
> >> Yes, at least error propagation needs to be added here. I don't think
> >> more is required. (Be careful, btw, with adding warnings - you must
> >> not spam the log with such.)
> >>
> >
> >
> >  agreed. A quick question about error propagation:
> > ...
> > for ( i = 0; i < (1UL << page_order); i++ )
> >     iommu_unmap_page(p2m->domain, gfn + i); ...
> >
> > As you mentioned, as a special case, unmapping should perhaps continue
> > despite an error, in an attempt to do best effort cleanup.
> > Then i could modify the code as:
> >
> > ...
> > for ( i = 0; i < (1UL << page_order); i++ ) {
> >     rc = iommu_unmap_page(p2m->domain, gfn + i);
> >     if ( rc )
> >       ret = rc;
> > }
> > ..
> >    return ret;
> > ...
> >
> > It looks cumbersome to me, any suggestion?
> 
> What's cumbersome here?
> 

Sorry, I don't like to introduce 'rc'/'ret' in the same function. Ignore me, if 
it is working.

> >> > 11. __get_page_type(): open (propose no-rollback).
> >> > The following is in detail:
> >> >     >>...
> >> >       if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
> >> >         {
> >> >             if ( (x & PGT_type_mask) == PGT_writable_page )
> >> >                 iommu_unmap_page(d, mfn_to_gmfn(d,
> >> page_to_mfn(page)));
> >> >             else if ( type == PGT_writable_page )
> >> >                 iommu_map_page(d, mfn_to_gmfn(d,
> >> page_to_mfn(page)),
> >> >                                page_to_mfn(page),
> >> >
> >> IOMMUF_readable|IOMMUF_writable);
> >> >         }
> >> >     <<...
> >> >
> >> > It's almost in the end of the func.
> >> > As the limited set of cases where __get_page_type() calls
> >> > iommu_{,un}map_page(), and a get_page_type() invocation that
> >> > previously was known to never fail (or at least so we hope, based
> >> > on the existing code) [I got it from Jan's reply.],
> >>
> >> Please don't mis-quote me: I said this for a specific caller of the
> >> function, not about the function itself.
> >>
> >
> > Sorry for that.
> >
> >> > we can classify it as normal PV domain and be treated as a fatal error.
> >> > So for I am inclined to leave it as today.
> >>
> >
> > I think one level propagation, only propagated in __get_page_type(),
> > is enough.
> 
> Not sure what you mean with "one level propagation" here.
> 

one level propagation is propagating error only in __get_page_type(), and 
leaving the callers (one or more levels up) as they are.

e.g. the call tree, 
__get_page_type() -- get_page_type() -- do_mmu_update()--..
propagate error only in __get_page_type(), and leaving the get_page_type() / 
do_mmu_update() as they are.


Jan, thanks for your review. I'd try my best on the release window.
Quan



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.