Re: [Xen-devel] [PATCH v3 03/10] IOMMU/MMU: enhance the call trees of IOMMU unmapping and mapping
On May 04, 2016 9:49 PM, George Dunlap <George.Dunlap@xxxxxxxxxxxxx> wrote:
> On Fri, Apr 29, 2016 at 10:25 AM, Quan Xu <quan.xu@xxxxxxxxx> wrote:
> > When IOMMU mapping fails, we issue a best-effort rollback:
> > stopping IOMMU mapping, unmapping the previous IOMMU maps, and
> > then reporting the error up the call trees. When rollback is not
> > feasible for the hardware domain (in the early initialization
> > phase, or as a trade-off against complexity), we proceed on a
> > best-effort basis, only logging an error message.
> >
> > IOMMU unmapping should perhaps continue despite an error, in an
> > attempt to do best-effort cleanup.
> >
> > Signed-off-by: Quan Xu <quan.xu@xxxxxxxxx>
> >
> > CC: Keir Fraser <keir@xxxxxxx>
> > CC: Jan Beulich <jbeulich@xxxxxxxx>
> > CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> > CC: Jun Nakajima <jun.nakajima@xxxxxxxxx>
> > CC: Kevin Tian <kevin.tian@xxxxxxxxx>
> > CC: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
> > ---
> > xen/arch/x86/mm.c | 13 ++++++++-----
> > xen/arch/x86/mm/p2m-ept.c | 27 +++++++++++++++++++++++++--
> > xen/arch/x86/mm/p2m-pt.c | 24 ++++++++++++++++++++----
> > xen/arch/x86/mm/p2m.c | 11 +++++++++--
> > xen/drivers/passthrough/iommu.c | 14 +++++++++++++-
> > 5 files changed, 75 insertions(+), 14 deletions(-)
> >
> > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index
> > a42097f..427097d 100644
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info *page, unsigned long type,
> > int preemptible) {
> > unsigned long nx, x, y = page->u.inuse.type_info;
> > - int rc = 0;
> > + int rc = 0, ret = 0;
> >
> > ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
> >
> > @@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info *page, unsigned long type,
> > if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
> > {
> > if ( (x & PGT_type_mask) == PGT_writable_page )
> > - iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
> > +                ret = iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
> > else if ( type == PGT_writable_page )
> > - iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
> > - page_to_mfn(page),
> > - IOMMUF_readable|IOMMUF_writable);
> > + ret = iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
> > + page_to_mfn(page),
> > + IOMMUF_readable|IOMMUF_writable);
> > }
> > }
> >
> > @@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info *page, unsigned long type,
> > if ( (x & PGT_partial) && !(nx & PGT_partial) )
> > put_page(page);
> >
> > + if ( !rc )
> > + rc = ret;
> > +
> > return rc;
> > }
> >
> > diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> > index 1ed5b47..df87944 100644
> > --- a/xen/arch/x86/mm/p2m-ept.c
> > +++ b/xen/arch/x86/mm/p2m-ept.c
> > @@ -821,6 +821,8 @@ out:
> > if ( needs_sync )
> > ept_sync_domain(p2m);
> >
> > + ret = 0;
> > +
> > /* For host p2m, may need to change VT-d page table.*/
> > if ( rc == 0 && p2m_is_hostp2m(p2m) && need_iommu(d) &&
> > need_modify_vtd_table )
> > @@ -831,11 +833,29 @@ out:
> > {
> > if ( iommu_flags )
> > for ( i = 0; i < (1 << order); i++ )
> > -                        iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
> > +                    {
> > +                        rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
> > +
> > + if ( !ret && unlikely(rc) )
> > + {
> > + while ( i-- )
> > + iommu_unmap_page(d, gfn + i);
> > +
> > + ret = rc;
> > + break;
> > + }
> > + }
> > else
> > for ( i = 0; i < (1 << order); i++ )
> > - iommu_unmap_page(d, gfn + i);
> > + {
> > + rc = iommu_unmap_page(d, gfn + i);
> > +
> > + if ( !ret && unlikely(rc) )
> > + ret = rc;
> > + }
> > }
> > +
> > + rc = 0;
> > }
>
> So the reason for doing this thing with setting ret=0, then using rc,
> then setting rc to zero, is to make sure that the change is propagated
> to the altp2m if necessary?
>
> I'm not a fan of this sort of "implied signalling". The check at the
> altp2m conditional is meant to say, "everything went fine, go ahead
> and propagate the change". But with your changes, that's not what we
> want anymore -- we want, "If the change actually made it into the
> hostp2m, propagate it to the altp2m."
>
> As such, I think it would be a lot clearer to just make a new boolean
> variable, maybe "entry_written", which we initialize to false and then
> set to true when the entry is actually written; and then change the
> condition re the altp2m to "if ( entry_written && p2m_is_hostp2m(..)
> )".
>
> Then I think you could make the ret / rc use mirror what you do
> elsewhere, where ret is used to store the return value of the function
> call, and it's propagated to rc only if rc is not already set.
>
George,
Thanks for your detailed explanation. This seems to be another matter of
preference.
Of course, I'll follow your suggestion for the p2m code, though I'd like to
hear the other maintainers' opinions as well
(especially Kevin's and Jan's, as they have spent a lot of time on this with
me).
One small difference, IMO: an int 'entry_written' is better than a boolean,
as we also need to propagate the error up to the callers;
atomic_write_ept_entry() can return '-EINVAL', '-ESRCH' or '-EBUSY'.
The following is my modification (feel free to correct me):
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -666,7 +666,7 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
ept_entry_t *table, *ept_entry = NULL;
unsigned long gfn_remainder = gfn;
unsigned int i, target = order / EPT_TABLE_ORDER;
- int ret, rc = 0;
+ int ret, rc = 0, entry_written = 0;
bool_t direct_mmio = (p2mt == p2m_mmio_direct);
uint8_t ipat = 0;
bool_t need_modify_vtd_table = 1;
@@ -763,8 +763,8 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
/* now install the newly split ept sub-tree */
/* NB: please make sure the domain is paused and there is no in-flight VT-d DMA. */
- rc = atomic_write_ept_entry(ept_entry, split_ept_entry, i);
- ASSERT(rc == 0);
+ entry_written = atomic_write_ept_entry(ept_entry, split_ept_entry, i);
+ ASSERT(entry_written == 0);
/* then move to the level we want to make real changes */
for ( ; i > target; i-- )
@@ -809,9 +809,12 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
new_entry.suppress_ve = is_epte_valid(&old_entry) ?
old_entry.suppress_ve : 1;
- rc = atomic_write_ept_entry(ept_entry, new_entry, target);
- if ( unlikely(rc) )
+ entry_written = atomic_write_ept_entry(ept_entry, new_entry, target);
+ if ( unlikely(entry_written) )
+ {
old_entry.epte = 0;
+ rc = entry_written;
+ }
else if ( p2mt != p2m_invalid &&
(gfn + (1UL << order) - 1 > p2m->max_mapped_pfn) )
/* Track the highest gfn for which we have ever had a valid mapping */
@@ -822,7 +825,7 @@ out:
ept_sync_domain(p2m);
/* For host p2m, may need to change VT-d page table.*/
- if ( rc == 0 && p2m_is_hostp2m(p2m) && need_iommu(d) &&
+ if ( entry_written == 0 && p2m_is_hostp2m(p2m) && need_iommu(d) &&
need_modify_vtd_table )
{
if ( iommu_hap_pt_share )
@@ -831,10 +834,28 @@ out:
{
if ( iommu_flags )
for ( i = 0; i < (1 << order); i++ )
- iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
+ {
+                        ret = iommu_map_page(d, gfn + i, mfn_x(mfn) + i,
+                                             iommu_flags);
+
+ if ( unlikely(ret) )
+ {
+ while ( i-- )
+ iommu_unmap_page(p2m->domain, gfn + i);
+
+ if ( !rc )
+ rc = ret;
+
+ break;
+ }
+ }
else
for ( i = 0; i < (1 << order); i++ )
- iommu_unmap_page(d, gfn + i);
+ {
+ ret = iommu_unmap_page(d, gfn + i);
+
+ if ( !rc && unlikely(ret) )
+ rc = ret;
+ }
}
}
> > @@ -680,11 +680,27 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
> > }
> > else if ( iommu_pte_flags )
> > for ( i = 0; i < (1UL << page_order); i++ )
> > - iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
> > - iommu_pte_flags);
> > + {
> > + ret = iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
> > + iommu_pte_flags);
> > +
> > + if ( !rc && unlikely(ret) )
> > + {
> > + while ( i-- )
> > + iommu_unmap_page(p2m->domain, gfn + i);
> > +
> > + rc = ret;
> > + break;
> > + }
>
> Hmm, is this conditional here right? What the code actually says to
> do is to always map pages, but to only unmap them on an error if there
> have been no other errors in the function so far.
>
> As it happens, at the moment rc cannot be non-zero here since any time
> it's non-zero the code jumps down to the 'out' label, skipping this.
> But if that ever changed, this would fail to unmap when it probably
> should.
>
> It seems like this be:
>
> if ( unlikely(ret) )
> {
> while ( i-- )
> iommu_unmap_page(p2m->domain, gfn + i);
> if ( !rc )
> rc = ret;
> break;
> }
>
Looks good. Thanks.
> Or if you want to assume that rc will never be zero, then put an
> ASSERT() just before the for loop here.
>
> Everything else looks good to me. Thanks again for your work on this.
>
Quan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel