[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] abstract model of IOMMU unmaping/mapping failures



On April 01, 2016 7:57pm, <JBeulich@xxxxxxxx> wrote:
> >>> On 31.03.16 at 11:06, <quan.xu@xxxxxxxxx> wrote:
> > 4. __gnttab_unmap_common():rollback (no change)
> >
> > (Existing code)
> >        >>...
> >              if ( !kind )
> >             err = iommu_unmap_page(ld, op->frame);
> >         else if ( !(kind & MAPKIND_WRITE) )
> >             err = iommu_map_page(ld, op->frame, op->frame,
> > IOMMUF_readable);
> >
> >         double_gt_unlock(lgt, rgt);
> >
> >         if ( err )
> >             rc = GNTST_general_error;
> >        <<...
> >
> > Similarly no change required, as error has been correctly handled.
> 
> I wouldn't call this "correctly handled", but for the hardware domain it 
> should
> be good enough, and by crashing DomU-s simply reporting the error up the call
> tree is sufficient. One question though is whether the loops higher up the 
> call
> tree in grant_table.c shouldn't be exited when noticing the domain has 
> crashed,
> both to avoid unnecessary work and to reduce the risk of secondary problems.
> 

For this point, I suppose that the domain structure is not destructed (we are 
safe to call domain's element, e.g. ld->grant_table) in do_grant_table_op 
hypercall cycle,
even the domain is crashed down. I am not quite sure, whether it is correct or 
not, could you explain more?
Then, it is unnecessary to flush tlb. The code is:
    ...
fault:
    gnttab_flush_tlb(current->domain);
    ...

If the domain is crashed down, the current->domain->is_shut_down is ture. So we 
can modify it as:
...
fault:
    if ( current->domain->is_shut_down )
       gnttab_flush_tlb(current->domain);
...

The other code of the call trees is good to me.


> > 7. set_identity_p2m_entry():rollback (no change).
> >
> > (Existing code)
> >   >>...
> >     if ( !paging_mode_translate(p2m->domain) )
> >     {
> >         if ( !need_iommu(d) )
> >             return 0;
> >         return iommu_map_page(d, gfn, gfn,
> IOMMUF_readable|IOMMUF_writable);
> >     }
> >   <<...
> >      if ( is_hardware_domain(d) && !iommu_use_hap_pt(d) )
> >         ret = iommu_map_page(d, gfn, gfn,
> IOMMUF_readable|IOMMUF_writable);
> >    >> ...
> >
> > error handling and rollback already in-place.
> 
> For the first portion of the quoted code you mean. I don't see any rollback in
> the paging_mode_translate() case.
> 

Does it refer to as the below call trees:
 set_identity_p2m_entry()--rmrr_identity_mapping()--intel_iommu_add_device() 
--...
 set_identity_p2m_entry()--rmrr_identity_mapping()-- 
intel_iommu_remove_device()--... 

my description is not accurate, but the current code is right, and I am no need 
to modify it. right?


> > 8. p2m_remove_page():rollback one level(change required).
> >
> > (Existing code)
> >   >>...
> >     if ( !paging_mode_translate(p2m->domain) )
> >     {
> >         if ( need_iommu(p2m->domain) )
> >             for ( i = 0; i < (1 << page_order); i++ )
> >                 iommu_unmap_page(p2m->domain, mfn + i);
> >         return 0;
> >     }
> >   <<...
> >
> > There is no error checking of iommu_unmap_page. We need to add.
> > However, there are several callers of this function which don't handle
> > error at all. I'm not sure whether hardware domain will use this function.
> > Since it is in core p2m logic (which I don't have much knowledge), I
> > hope we can print a warning and simply return error here (given the
> > fact that non-hardware domains are all crashed immediately within
> > unmap call)
> 
> Yes, at least error propagation needs to be added here. I don't think more is
> required. (Be careful, btw, with adding warnings - you must not spam the log
> with such.)
> 


 agreed. A quick question about error propagation:
...
for ( i = 0; i < (1UL << page_order); i++ )
    iommu_unmap_page(p2m->domain, gfn + i);
...

As you mentioned, as a special case, unmapping should perhaps continue despite 
an error, in an attempt to do best effort cleanup.
Then i could modify the code as:

...
for ( i = 0; i < (1UL << page_order); i++ )
{
    rc = iommu_unmap_page(p2m->domain, gfn + i);
    if ( rc )
      ret = rc;
}
..
   return ret;
...

It looks cumbersome to me, any suggestion?

> > ----
> > 9. p2m_pt_set_entry(): open (propose no-rollback).
> >
> > (Existing code)
> >   >>...
> >            for ( i = 0; i < (1UL << page_order); i++ )
> >                 iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
> >                                iommu_pte_flags);
> >         else
> >             for ( i = 0; i < (1UL << page_order); i++ )
> >                 iommu_unmap_page(p2m->domain, gfn + i);
> >   <<...
> >
> > Above is in the end of the func.
> >
> > I'm not sure whether it will be called for hardware domain ( PVH mode
> > is one potential concern. as we know, PVH has been introduced on Xen
> > 4.4 as a DomU, and on Xen 4.5 as a Dom0.).
> 
> Indeed you can get here for Dom0 only when it's non-PV.
> 

A quick question, is non-PV the same as PVH for hardware domain?


> > If not, we can leave it as today. Otherwise, I want to hear your
> > suggestion whether we can use no-rollback (just erring) for hardware
> > domain here since this code path is very complex.
> 
> Of course (and there's really little point in repeating the same over and over
> again) - as long as no-one involved considers the general model (set forth at 
> the
> top) problematic, it can be done the same way everywhere. Which still means
> (as said before) that errors need to be propagated.
> 
Agreed.

> > 11. __get_page_type(): open (propose no-rollback).
> > The following is in detail:
> >     >>...
> >       if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
> >         {
> >             if ( (x & PGT_type_mask) == PGT_writable_page )
> >                 iommu_unmap_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)));
> >             else if ( type == PGT_writable_page )
> >                 iommu_map_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)),
> >                                page_to_mfn(page),
> >
> IOMMUF_readable|IOMMUF_writable);
> >         }
> >     <<...
> >
> > It's almost in the end of the func.
> > As the limited set of cases where __get_page_type() calls
> > iommu_{,un}map_page(), and a get_page_type() invocation that
> > previously was known to never fail (or at least so we hope, based on
> > the existing code) [I got it from Jan's reply.],
> 
> Please don't mis-quote me: I said this for a specific caller of the function, 
> not
> about the function itself.
> 

Sorry for that.

> > we can classify it as normal PV domain and be treated as a fatal error.
> > So for I am inclined to leave it as today.
> 

I think one level propagation, only propagated in __get_page_type(), is enough.

> And again - at least errors need to be propagated.
Sure.

Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.