Re: [Xen-devel] [PATCH v1] x86/mm: Suppresses vm_events caused by page-walks
On 8/27/18 4:02 PM, Andrew Cooper wrote:
> On 27/08/18 13:53, Razvan Cojocaru wrote:
>> On 8/27/18 3:37 PM, Andrew Cooper wrote:
>>> On 27/08/18 13:12, Jan Beulich wrote:
>>>>>> For NPT, isn't there an error code bit telling you whether the
>>>>>> request was a user or "system" one? If not, some cheating
>>>>>> would be needed (derive from CPL, accepting that e.g.
>>>>>> descriptor table accesses would get mis-attributed), but
>>>>>> that's still not going to involve looking at the PTE flags.
>>>>> The alternative would be to simply walk (without enforcing any flags,
>>>>> and so making the pfec walk parameter unnecessary) to the respective
>>>>> address, and query for _PAGE_ACCESSED and _PAGE_DIRTY only.
>>>>>
>>>>> If _PAGE_ACCESSED is not set, set it and exit.
>>>>> If _PAGE_ACCESSED is set, set _PAGE_DIRTY also and exit.
>>>> Since it's ambiguous in the NPT case - are you talking about
>>>> setting the flags in the guest or host page tables? The
>>>> former, I'm afraid, might not be acceptable (as not always
>>>> being architecturally correct). In anyway feels as if we'd
>>>> been here before, in that this reminds me of you meaning
>>>> to imply from a second walk (with A already set) that it must
>>>> be a write access. I thought we had settled on such an
>>>> implication not being generally correct.
>>> The problem that is trying to be solved is that when operating in
>>> non-root mode, the cpu pagewalk, when trying to set a guest A/D bit in a
>>> write-protected EPT page, takes an EPT violation for a write to a
>>> read-only page.
>>>
>>> Manually setting the A/D bits (as appropriate) and restarting the
>>> instruction is sufficient for it to complete correctly.
>>>
>>> At the moment, every time this happens, a request is sent to the
>>> introspection agent, and the agent calculates that it was due to
>>> pagetable protection, and instructs Xen to emulate the instruction.
>>> This accounts for 97% (?) of the VMExits, and is unrelated to any of the
>>> real protections which introspection is trying to achieve.
>>>
>>> What Razvan is looking to do is to have Xen skip the "send to the
>>> introspection agent" part as an optimisation, because hardware tells Xen
>>> (as part of the VMExit) when this condition has occurred, and the
>>> vm_event logic has already asked Xen to try and fix up this condition
>>> automatically.
>>>
>>> What can actually be done depends on how A/D bits behave in real hardware.
>>>
>>> Setting access bits for non-leaf entries is definitely fine, and
>>> speculatively setting the access bit is also explicitly permitted by the
>>> spec. However, I can't find any comment on speculative dirty bits (from
>>> either Intel or AMD), and I've not encountered such a behaviour with the
>>> pagetable work I've been doing.
>> Thanks for the reply!
>>
>> I've forgotten a piece of information that I really should have written
>> here: we would only set the D bit if A is already set and either the
>> page is writable (has _PAGE_RW set) or CR0.WP is 0 (the latter case is
>> admittedly more tricky).
>
> How about a new function which works similarly to guest-walk-tables, but
> only ever sets A/D bits.
>
> Given information from hardware, we know the linear address, and that it
> was a problem with the guest pagetables, from which we explicitly know
> that it was from writing an A/D bit to a guest PTE.
>
> While walking down the levels, set any missing A bits and remember if we
> set any. If we set A bits, consider ourselves complete and exit back to
> the guest. If no A bits were set, and the access was a write (which we
> know from the EPT violation information), then set the leaf D bit.
>
> This should be architecturally correct as it is exclusively derived from
> information provided by the VMExit, and won't cause dirty bits to be
> written in cases where the hardware wouldn't have written them
> (speculative or otherwise). It does mean that an instruction which
> would need to set A and D bits in the walk will take two EPT violations
> to achieve the end result, but it probably is still quicker than sending
> the vm_event out.

Right, that's pretty much what we were proposing, a basic algorithm of:

    if ((pte & A) && (pte & RW))
        pte |= D;
    pte |= A;

where the if probably becomes:

    if ((pte & A) && ((pte & RW) || CR0.WP == 0))
        pte |= D;
    pte |= A;

for the CR0.WP case.

As discussed privately, there's also the case where two VCPUs may try to
set A concurrently, which is what I assume is the case Jan has hinted at.

Another small issue is that we do need to ignore the EPT violation
information as it pertains to reads or writes: that will always be the
page-walk access type, rw.

Thanks,
Razvan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
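
For readers following the thread, the walk described above (set any missing A bits and return; otherwise, on a write, set the leaf D bit subject to the _PAGE_RW / CR0.WP condition) can be sketched in standalone C. This is only an illustration under stated assumptions, not Xen code: `fixup_ad_bits`, the `PTE_*` macros, and the `is_write` and `cr0_wp` parameters are hypothetical names, the walk is modelled as an array of pointers to already-located entries, and how the caller decides `is_write` is left open, since the thread notes that the EPT violation's own read/write information reflects the page-walk access.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* x86 page-table entry flag positions; the macro names are illustrative. */
#define PTE_RW       (UINT64_C(1) << 1)
#define PTE_ACCESSED (UINT64_C(1) << 5)
#define PTE_DIRTY    (UINT64_C(1) << 6)

/*
 * Hypothetical helper: entries[0] is the top-level entry and
 * entries[n - 1] is the leaf. Returns true if any entry was changed.
 *
 * Assumption: plain read-modify-write is used here. With two vCPUs
 * racing on the same entry, a real implementation would likely need
 * an atomic update instead.
 */
bool fixup_ad_bits(uint64_t *entries[], size_t n, bool is_write, bool cr0_wp)
{
    bool set_a = false;

    /* Pass 1: set any missing Accessed bits on the way down. */
    for (size_t i = 0; i < n; i++) {
        if (!(*entries[i] & PTE_ACCESSED)) {
            *entries[i] |= PTE_ACCESSED;
            set_a = true;
        }
    }

    /* If an A bit was set, stop here and let the guest retry. */
    if (set_a)
        return true;

    if (n == 0 || !is_write)
        return false;

    uint64_t *leaf = entries[n - 1];

    /* Only set D if the page is writable or CR0.WP is clear. */
    if ((*leaf & PTE_DIRTY) || (!(*leaf & PTE_RW) && cr0_wp))
        return false;

    *leaf |= PTE_DIRTY;
    return true;
}
```

With this shape, an access that needs both A and D set takes two passes through the helper, which matches the "two EPT violations" cost mentioned in the quoted text.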