
Re: [Xen-devel] [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference



>>> On 26.04.17 at 16:25, <tim@xxxxxxx> wrote:
> At 08:07 -0600 on 26 Apr (1493194043), Jan Beulich wrote:
>> >>> On 25.04.17 at 12:59, <tim@xxxxxxx> wrote:
>> > Hi,
>> > 
>> > At 02:59 -0600 on 25 Apr (1493089158), Jan Beulich wrote:
>> >> Jann's explanation of the problem:
>> >> 
>> >> "start situation:
>> >>  - domain A and domain B are PV domains
>> >>  - domain A and B both have currently scheduled vCPUs, and the vCPUs
>> >>    are not scheduled away
>> >>  - domain A has XSM_TARGET access to domain B
>> >>  - page X is owned by domain B and has no mappings
>> >>  - page X is zeroed
>> >> 
>> >>  steps:
>> >>  - domain A uses do_mmu_update() to map page X in domain A as writable
>> >>  - domain A accesses page X through the new PTE, creating a TLB entry
>> >>  - domain A removes its mapping of page X
>> >>    - type count of page X goes to 0
>> >>    - tlbflush_timestamp of page X is bumped
>> >>  - domain B maps page X as L1 pagetable
>> >>    - type of page X changes to PGT_l1_page_table
>> >>    - TLB flush is forced using domain_dirty_cpumask of domain B
>> >>    - page X is mapped as L1 pagetable in domain B
>> >> 
>> >>  At this point, domain B's vCPUs are guaranteed to have no
>> >>  incorrectly-typed stale TLB entries for page X, but AFAICS domain A's
>> >>  vCPUs can still have stale TLB entries that map page X as writable,
>> >>  permitting domain A to control a live pagetable of domain B."
>> > 
>> > AIUI this patch solves the problem by immediately flushing domain A's
>> > TLB entries at the point where domain A removes its mapping of page X.
>> > 
>> > Could we, instead, bitwise OR domain A's domain_dirty_cpumask into
>> > domain B's domain_dirty_cpumask at the same point?
>> > 
>> > Then when domain B flushes TLBs in the last step (in __get_page_type())
>> > it will catch any stale TLB entries from domain A as well.  But in the
>> > (hopefully common) case where there's a delay between domain A's
>> > __put_page_type() and domain B's __get_page_type(), the usual TLB
>> > timestamp filtering will suppress some of the IPIs/flushes.
>> 
>> So I've given this a try, and failed miserably (including losing an
>> XFS volume on the test machine). The problem is the BUG_ON() at
>> the top of domain_relinquish_resources() - there will, very likely, be
>> bits remaining set if the code added to put_page_from_l1e() set
>> some pretty recently (even though we avoid setting any once
>> ->is_dying has been set).
> 
> Yeah. :(  Would it be correct to just remove that BUG_ON(), or replace it
> with an explicit check that there are no running vcpus?
> 
> Or is using domain_dirty_cpumask like this too much of a stretch?
> E.g. PV TLB flushes use it, and would maybe be more expensive until
> the dom0 CPUs fall out of the mask (which isn't guaranteed to happen).

Right, since effectively some of the bits may never clear (if
pg_owner never runs on some pCPU l1e_owner does run on), I
now think this model could even have introduced more overhead
than the immediate flushing.
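
In code terms that variant would be roughly just, at the same point
(untested sketch only):

    cpumask_or(pg_owner->domain_dirty_cpumask,
               pg_owner->domain_dirty_cpumask,
               l1e_owner->domain_dirty_cpumask);

i.e. merge the unmapping domain's dirty CPUs into the page owner's
mask, so that the flush already done in __get_page_type() also covers
them - with the downside just mentioned that merged bits may never
clear again, and that every other flush keyed off pg_owner's mask then
gets more expensive.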

> We could add a new mask just for this case, and clear CPUs from it as
> they're flushed.  But that sounds like a lot of work...

Wouldn't it suffice to set bits in this mask in put_page_from_l1e()
and consume/clear them in __get_page_type()? Right now I can't
see it being necessary for correctness to fiddle with any of the
other flushes using the domain dirty mask.
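
Roughly like this (untested; "foreign_flush_mask" would be a new,
made-up struct domain field, and synchronization with concurrent
setters isn't shown):

/* In put_page_from_l1e(), instead of flushing right away: */
static void defer_foreign_flush(struct domain *pg_owner,
                                struct domain *l1e_owner)
{
    cpumask_or(pg_owner->foreign_flush_mask,
               pg_owner->foreign_flush_mask,
               l1e_owner->domain_dirty_cpumask);
}

/* In __get_page_type(), alongside the existing type-change flush: */
static void consume_foreign_flush(struct domain *d)
{
    if ( !cpumask_empty(d->foreign_flush_mask) )
    {
        flush_tlb_mask(d->foreign_flush_mask);
        cpumask_clear(d->foreign_flush_mask);
    }
}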

But then again this may not be much of a win, unless the put
operations come through in meaningful batches, not interleaved
with any type changes (the latter ought to be guaranteed during
domain construction and teardown at least, as the guest itself
can't do anything at that time to effect type changes). Hence I
wonder whether ...

> Maybe worth measuring the impact of the current patch before going too
> far with this?

... it wouldn't be better the other way around: we use the patch
in its current (or even v1) form, and try to do something about
performance only if we really find a case where it matters. To be
honest, I'm not even sure how I could meaningfully measure the
impact here: simply counting how many extra flushes there would
end up being wouldn't seem all that useful, and I would highly
doubt that there's any measurable difference in the overall
execution time of e.g. domain creation (but if that's what you're
after, I could certainly collect a few numbers).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

