Re: [Xen-devel] [PATCH v1] x86/mm: Suppresses vm_events caused by page-walks
>>> On 27.08.18 at 15:02, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 27/08/18 13:53, Razvan Cojocaru wrote:
>> On 8/27/18 3:37 PM, Andrew Cooper wrote:
>>> On 27/08/18 13:12, Jan Beulich wrote:
>>>>>> For NPT, isn't there an error code bit telling you whether the
>>>>>> request was a user or "system" one? If not, some cheating
>>>>>> would be needed (derive from CPL, accepting that e.g.
>>>>>> descriptor table accesses would get mis-attributed), but
>>>>>> that's still not going to involve looking at the PTE flags.
>>>>> The alternative would be to simply walk (without enforcing any flags,
>>>>> and so making the pfec walk parameter unnecessary) to the respective
>>>>> address, and query for _PAGE_ACCESSED and _PAGE_DIRTY only.
>>>>>
>>>>> If _PAGE_ACCESSED is not set, set it and exit.
>>>>> If _PAGE_ACCESSED is set, set _PAGE_DIRTY also and exit.
>>>> Since it's ambiguous in the NPT case - are you talking about
>>>> setting the flags in the guest or host page tables? The
>>>> former, I'm afraid, might not be acceptable (as not always
>>>> being architecturally correct). In anyway feels as if we'd
>>>> been here before, in that this reminds me of you meaning
>>>> to imply from a second walk (with A already set) that it must
>>>> be a write access. I thought we had settled on such an
>>>> implication not being generally correct.
>>> The problem that is trying to be solved is that when operating in
>>> non-root mode, the cpu pagewalk, when trying to set a guest A/D bit in a
>>> write-protected EPT page, takes an EPT violation for a write to a
>>> read-only page.
>>>
>>> Manually setting the A/D bits (as appropriate) and restarting the
>>> instruction is sufficient for it to complete correctly.
>>>
>>> At the moment, every time this happens, a request is sent to the
>>> introspection agent, and the agent calculates that it was due to
>>> pagetable protection, and instructs Xen to emulate the instruction.
>>> This accounts for 97% (?)
>>> of the VMExits, and is unrelated to any of the
>>> real protections which introspection is trying to achieve.
>>>
>>> What Razvan is looking to do is to have Xen skip the "send to the
>>> introspection agent" part as an optimisation, because hardware tells Xen
>>> (as part of the VMExit) when this condition has occurred, and the
>>> vm_event logic has already asked Xen to try and fix up this condition
>>> automatically.
>>>
>>> What can actually be done depends on how A/D bits behave in real hardware.
>>>
>>> Setting access bits for non-leaf entries is definitely fine, and
>>> speculatively setting the access bit is also explicitly permitted by the
>>> spec. However, I can't find any comment on speculative dirty bits (from
>>> either Intel or AMD), and I've not encountered such a behaviour with the
>>> pagetable work I've been doing.

Yeah, a description of the problem to solve definitely helps.

>> I've forgotten a piece of information that I really should have written
>> here: we would only set the D bit if A is already set and either the
>> page is writable (has _PAGE_RW set) or CR0.WP is 0 (the latter case is
>> admittedly more tricky).
>
> How about a new function which works similarly to guest-walk-tables, but
> only ever sets A/D bits.
>
> Given information from hardware, we know the linear address, and that it
> was a problem with the guest pagetables, from which we explicitly know
> that it was from writing an A/D bit to a guest PTE.
>
> While walking down the levels, set any missing A bits and remember if we
> set any. If we set A bits, consider ourselves complete and exit back to
> the guest. If no A bits were set, and the access was a write (which we
> know from the EPT violation information), then set the leaf D bit.

Plus taking into consideration CR0.WP and the entry's W bit, as Razvan
has said.
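
For illustration only, the walk proposed above, together with the CR0.WP / W-bit condition, might be sketched roughly as follows. This is not Xen code: `fixup_ad_bits`, `enum ad_fixup` and the flat `walk[]` array are hypothetical names invented for the sketch, and the writability check is simplified to the leaf entry exactly as worded in the thread (it ignores upper-level W bits and the user/supervisor distinction that CR0.WP really depends on). It also uses plain, non-atomic stores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural x86 page-table flag bits. */
#define PTE_PRESENT   (UINT64_C(1) << 0)
#define PTE_RW        (UINT64_C(1) << 1)
#define PTE_ACCESSED  (UINT64_C(1) << 5)
#define PTE_DIRTY     (UINT64_C(1) << 6)

/* Hypothetical result codes for this sketch (not Xen's). */
enum ad_fixup {
    AD_FIXUP_NONE,   /* nothing changed; fall back to the normal path */
    AD_FIXUP_SET_A,  /* one or more A bits set; re-enter the guest */
    AD_FIXUP_SET_D   /* leaf D bit set; re-enter the guest */
};

/*
 * walk[0] is the top-level entry, walk[levels - 1] the leaf.
 * is_write is assumed to come from the EPT violation information,
 * cr0_wp from the guest's CR0.
 */
enum ad_fixup fixup_ad_bits(uint64_t walk[], size_t levels,
                            bool is_write, bool cr0_wp)
{
    bool set_a = false;

    assert(levels > 0);

    /* Pass 1: set any missing A bits while walking down the levels. */
    for (size_t i = 0; i < levels; i++) {
        if (!(walk[i] & PTE_PRESENT))
            return set_a ? AD_FIXUP_SET_A : AD_FIXUP_NONE;
        if (!(walk[i] & PTE_ACCESSED)) {
            walk[i] |= PTE_ACCESSED;
            set_a = true;
        }
    }

    /* If any A bit was set, consider the fix-up complete. */
    if (set_a)
        return AD_FIXUP_SET_A;

    /* Pass 2: all A bits were already set; only a write may set D. */
    uint64_t *leaf = &walk[levels - 1];
    bool writable = (*leaf & PTE_RW) || !cr0_wp;

    if (is_write && writable && !(*leaf & PTE_DIRTY)) {
        *leaf |= PTE_DIRTY;
        return AD_FIXUP_SET_D;
    }

    return AD_FIXUP_NONE;
}
```

With this shape, a write through entries that have neither A nor D set returns `AD_FIXUP_SET_A` on the first call and `AD_FIXUP_SET_D` on the second, matching the "two EPT violations" behaviour described in the quoted proposal.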

> This should be architecturally correct as it is exclusively derived from
> information provided by the VMExit, and won't cause dirty bits to be
> written in cases where the hardware wouldn't have written them
> (speculative or otherwise). It does mean that an instruction which
> would need to set A and D bits in the walk will take two EPT violations
> to achieve the end result, but it probably is still quicker than sending
> the vm_event out.

I'm afraid this is going to be only mostly correct: Atomicity of the
page table write is going to be lost. This could become an actual
problem if the guest used racing PTE accesses. Such racing accesses
might not be a bug - simply consider the OS scanning for set A
and/or D bits (and clearing them when they're set). Or an entity
temporarily clearing (parts of) PTEs, with recovery logic in place
to restore them when needed for a synchronous access. At the
very least there's then the risk of a livelock within the guest.

Jan
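
To make the atomicity concern concrete: a software fix-up that reads a PTE, decides, and then stores it back can overwrite a concurrent guest update. A minimal sketch of one way to narrow that window, assuming C11 atomics and a hypothetical helper `pte_set_flags_atomic` (not an existing Xen function), is to set the flags only if the entry still holds the value seen during the walk:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page-table flag bits. */
#define PTE_PRESENT   (UINT64_C(1) << 0)
#define PTE_ACCESSED  (UINT64_C(1) << 5)
#define PTE_DIRTY     (UINT64_C(1) << 6)

/*
 * Hypothetical helper: OR 'flags' into *pte only if the entry still
 * equals 'seen', the value observed during the walk.  Returns false,
 * leaving the entry untouched, if the guest changed it in the meantime
 * (e.g. cleared A/D or temporarily zapped the PTE).
 */
bool pte_set_flags_atomic(_Atomic uint64_t *pte, uint64_t seen,
                          uint64_t flags)
{
    uint64_t expected = seen;

    return atomic_compare_exchange_strong(pte, &expected, seen | flags);
}
```

On failure the caller would have to re-walk or simply re-enter the guest and let the access fault again. This only keeps the single write from clobbering a racing update; whether it answers the livelock risk mentioned above is not something the sketch establishes.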