[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v2] x86/mm: Suppresses vm_events caused by page-walks
On 19/09/18 09:53, Jan Beulich wrote: >>>> On 18.09.18 at 20:20, <andrew.cooper3@xxxxxxxxxx> wrote: >> On 18/09/18 11:17, Jan Beulich wrote: >>>>>> On 18.09.18 at 11:47, <aisaila@xxxxxxxxxxxxxxx> wrote: >>>> On Thu, 2018-09-13 at 08:17 -0600, Jan Beulich wrote: >>>>>>>> On 12.09.18 at 11:47, <aisaila@xxxxxxxxxxxxxxx> wrote: >>>>>> The original version of the patch emulated the current instruction >>>>>> (which, as a side-effect, emulated the page-walk as well), however >>>>>> we >>>>>> need finer-grained control. We want to emulate the page-walk, but >>>>>> still >>>>>> get an EPT violation event if the current instruction would trigger >>>>>> one. >>>>>> This patch performs just the page-walk emulation. >>>>> Rather than making this basically a revision log, could you please >>>>> focus >>>>> on what you actually want to achieve? >>>>> >>>>> As to the title: "Suppress ..." please. >>>>> >>>>>> @@ -149,6 +151,10 @@ guest_walk_tables(struct vcpu *v, struct >>>>>> p2m_domain *p2m, >>>>>> ar_and &= gflags; >>>>>> ar_or |= gflags; >>>>>> >>>>>> + if ( set_ad && set_ad_bits(&l4p[guest_l4_table_offset(va)].l4, >>>>>> + &gw->l4e.l4, false) ) >>>>>> + accessed = true; >>>>> It is in particular this seemingly odd (and redundant with what's >>>>> done >>>>> later in the function) which needs thorough explanation. >>>> On this patch I've followed Andrew Cooper's suggestion on how to set >>>> A/D Bits: >>>> >>>> "While walking down the levels, set any missing A bits and remember if we >>>> set any. If we set A bits, consider ourselves complete and exit back to >>>> the guest. If no A bits were set, and the access was a write (which we >>>> know from the EPT violation information), then set the leaf D bit." >>>> >>>> If I misunderstood the comment please clarify. >>> It doesn't look to me as if you misunderstood anything, but only Andrew >>> can say for sure. However, none of this was in the description of your >>> patch (neither as part of the description, nor as code comment), and I >>> think you'd even have to greatly extend on this in order to explain to >>> everyone why the resulting behavior is still architecturally correct. In no >>> case should you assume anyone reading your patch (now or in the >>> future) has participated in the earlier discussion. >> The problem we have is that, while we know the EPT Violation was for a >> write of an A or D bit to a write-protected guest pagetable, we don't >> know if it was the A or the D bit which was attempting to be set. >> >> Furthermore (without emulating the instruction, which is what we are >> trying to avoid), we can't reconstruct the access. >> >> Access bits are only written if they were missing before, but may be set >> speculatively. Dirty bits are only set when a write is retired. From a >> practical point of view, the pipeline sets A and D bits as separate actions. >> >> Following this logic (and assuming for now a single vcpu), if we get a >> GPT EPT Violation, and there are missing access bits on the walk, then >> the fault is definitely from setting an access bit. > Definitely? Yes > Is there anything guaranteeing architecturally that an access > bit related EPT violation would be delivered earlier than any other one > on that same or a lower page table level? No, but why does that matter? Architecturally defined or not, we know that the action the processor was trying to perform was to set an A/D bit, because we got a vmexit telling us so. > It doesn't matter much for > the implementation (because of it being permissible to set the A bits > speculatively, as you also say further down, and any other violation > then re-occurring after exiting back to the guest once the A bits are > all set), but since we're discussing here what exactly the patch > description should contain, I think I'd prefer this to be fully correct there. > > Or wait - I think I can agree with "definitely", provided you further > restrict the context: "..., if we get a GPT EPT Write Violation ...". But > from what I can tell the patch'es change to p2m_mem_access_check() > doesn't apply (or pass on) any of these qualifications at all. I've not looked at the patch in detail yet. I'm tempted to suggest rearranging guest_walk_tables() to just set the access bits on the decent, rather than at the end. This matches how some hardware behaves when pulling entries into the paging structure cache. > >> Set all access bits >> and call it done. If we get a GPT EPT Violation and all access bits >> were set, then it was definitely from setting the Dirty bit. >> >> For multi-vcpu scenarios, things get racy. Setting all the Access bits >> is safe because its a speculative action, but a speculatively load on >> one vcpu can race with a write (to a read-only mapping) on the other >> vcpu, and would trick this algorithm into setting the dirty bit when the >> write would have faulted (and not set the dirty bit). >> >> Do we have numbers on how many the GPT EPT Violations are for (only) >> access sets, and how many are for dirty tsets? Would the first half of >> the algorithm (which is definitely not racy) still be a net perf win? > Does Windows make use of A bits at all? I'd expect most OSes to > simply set them right away, and actively use of the D bits. What gives you the expectation that OSes wouldn't use A bits? For paging out, the best options are non-accessed non-dirty page because their contents can be discarded immediately and reread from disk at a later point. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |