[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests
>>> On 12.12.18 at 15:54, <roger.pau@xxxxxxxxxx> wrote: > Fix this by releasing the target paging lock before attempting to > perform the copy of the dirty bitmap, and then forcing a restart of > the whole process in case there have been changes to the dirty bitmap > tables. I'm afraid it's not that simple: The writer side (paging_mark_pfn_dirty()) uses the same lock, and I think the assumption is that while the copying takes place no updates to the bitmap can occur. Then again the subject domain gets paused anyway, so updates probably can't happen (the "probably" of course needs to be eliminated). But at the very least I'd expect a more thorough discussion here as to why dropping the lock intermediately is fine. The preemption mechanism can't be used as sole reference, because that one doesn't drop the lock in the middle of an iteration. > --- a/xen/arch/x86/mm/paging.c > +++ b/xen/arch/x86/mm/paging.c > @@ -408,6 +408,7 @@ static int paging_log_dirty_op(struct domain *d, > unsigned long *l1 = NULL; > int i4, i3, i2; > > + again: > if ( !resuming ) > { Considering that you set "resuming" to 1 before jumping here, why don't you put the label after this if()? I'd even consider pulling the label even further down (perhaps to immediately before the last if() ahead of the main loop), and have the path branching there re-acquire the lock (and drop some of the other state setting that then becomes unnecessary). > @@ -488,6 +490,16 @@ static int paging_log_dirty_op(struct domain *d, > bytes = (unsigned int)((sc->pages - pages + 7) >> 3); > if ( likely(peek) ) > { > + if ( paging_mode_enabled(current->domain) ) > + /* > + * Drop the target p2m lock, or else Xen will panic > + * when trying to acquire the p2m lock of the caller > + * due to invalid lock order. Note that there are no > + * lock ordering issues here, and the panic is due to > + * the fact that the lock level tracking doesn't > record > + * the domain the lock belongs to. > + */ > + paging_unlock(d); This makes it sound as if tracking the domain would help. It doesn't, at least not as long as there is not also some sort of ordering or hierarchy between domains. IOW I'd prefer if the "doesn't" became "can't". Now before we go this route I think we need to consider whether this really is the only place where the two locks get into one another's way. That's because I don't think we want to introduce more of such, well, hackery, and hence we'd otherwise need a better solution. For example the locking model could be adjusted to allow such nesting in the general case: Dom0 and any domain whose ->target matches the subject domain here could be given a slightly different "weight" in the lock order violation check logic. I'm also uncertain about the overall effect this change has, in that now the lock gets dropped for _every_ page to be copied. > @@ -505,6 +517,23 @@ static int paging_log_dirty_op(struct domain *d, > clear_page(l1); > unmap_domain_page(l1); > } > + if ( paging_mode_enabled(current->domain) && peek ) > + { > + d->arch.paging.preempt.log_dirty.i4 = i4; > + d->arch.paging.preempt.log_dirty.i3 = i3; > + d->arch.paging.preempt.log_dirty.i2 = i2 + 1; If i2 + 1 == LOGDIRTY_NODE_ENTRIES I think it would be better > + d->arch.paging.preempt.log_dirty.done = pages; > + d->arch.paging.preempt.dom = current->domain; > + d->arch.paging.preempt.op = sc->op; > + resuming = 1; true? > + if ( l2 ) > + unmap_domain_page(l2); > + if ( l3 ) > + unmap_domain_page(l3); > + if ( l4 ) > + unmap_domain_page(l4); > + goto again; At the very least for safety reasons you should set l2, l3, and l4 back to NULL here. > @@ -513,6 +542,7 @@ static int paging_log_dirty_op(struct domain *d, > { > d->arch.paging.preempt.log_dirty.i4 = i4; > d->arch.paging.preempt.log_dirty.i3 = i3 + 1; > + d->arch.paging.preempt.log_dirty.i2 = 0; > rv = -ERESTART; > break; > } > @@ -525,6 +555,7 @@ static int paging_log_dirty_op(struct domain *d, > { > d->arch.paging.preempt.log_dirty.i4 = i4 + 1; > d->arch.paging.preempt.log_dirty.i3 = 0; > + d->arch.paging.preempt.log_dirty.i2 = 0; > rv = -ERESTART; > } With these I don't see why you need storage in the domain structure: All you should need is that the local variable retain its value across the "goto again". You never exit the function with i2 non-zero afaict. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |