Xen project Mailing List

Re: [Xen-devel] [PATCH v2] x86/mm: Suppresses vm_events caused by page-walks

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Wed, 19 Sep 2018 14:41:57 +0100


Cc: Tamas K Lengyel <tamas@xxxxxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Tim Deegan <tim@xxxxxxx>, xen-devel@xxxxxxxxxxxxx, aisaila@xxxxxxxxxxxxxxx

Delivery-date: Wed, 19 Sep 2018 13:42:12 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>


On 19/09/18 09:53, Jan Beulich wrote:
>>>> On 18.09.18 at 20:20, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 18/09/18 11:17, Jan Beulich wrote:
>>>>>> On 18.09.18 at 11:47, <aisaila@xxxxxxxxxxxxxxx> wrote:
>>>> On Thu, 2018-09-13 at 08:17 -0600, Jan Beulich wrote:
>>>>>>>> On 12.09.18 at 11:47, <aisaila@xxxxxxxxxxxxxxx> wrote:
>>>>>> The original version of the patch emulated the current instruction
>>>>>> (which, as a side-effect, emulated the page-walk as well), however
>>>>>> we need finer-grained control. We want to emulate the page-walk, but
>>>>>> still get an EPT violation event if the current instruction would
>>>>>> trigger one. This patch performs just the page-walk emulation.
>>>>> Rather than making this basically a revision log, could you please
>>>>> focus on what you actually want to achieve?
>>>>>
>>>>> As to the title: "Suppress ..." please.
>>>>>
>>>>>> @@ -149,6 +151,10 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
>>>>>>      ar_and &= gflags;
>>>>>>      ar_or |= gflags;
>>>>>>
>>>>>> +    if ( set_ad && set_ad_bits(&l4p[guest_l4_table_offset(va)].l4,
>>>>>> +                               &gw->l4e.l4, false) )
>>>>>> +        accessed = true;
>>>>> It is in particular this seemingly odd (and redundant with what's
>>>>> done later in the function) which needs thorough explanation.
>>>> On this patch I've followed Andrew Cooper's suggestion on how to set
>>>> A/D Bits:
>>>>
>>>> "While walking down the levels, set any missing A bits and remember if
>>>> we set any. If we set A bits, consider ourselves complete and exit back
>>>> to the guest. If no A bits were set, and the access was a write (which
>>>> we know from the EPT violation information), then set the leaf D bit."
>>>>
>>>> If I misunderstood the comment please clarify.
>>> It doesn't look to me as if you misunderstood anything, but only Andrew
>>> can say for sure.
>>> However, none of this was in the description of your patch (neither
>>> as part of the description, nor as code comment), and I think you'd
>>> even have to greatly extend on this in order to explain to everyone
>>> why the resulting behavior is still architecturally correct. In no
>>> case should you assume anyone reading your patch (now or in the
>>> future) has participated in the earlier discussion.
>> The problem we have is that, while we know the EPT Violation was for a
>> write of an A or D bit to a write-protected guest pagetable, we don't
>> know if it was the A or the D bit which was attempting to be set.
>>
>> Furthermore (without emulating the instruction, which is what we are
>> trying to avoid), we can't reconstruct the access.
>>
>> Access bits are only written if they were missing before, but may be set
>> speculatively. Dirty bits are only set when a write is retired. From a
>> practical point of view, the pipeline sets A and D bits as separate actions.
>>
>> Following this logic (and assuming for now a single vcpu), if we get a
>> GPT EPT Violation, and there are missing access bits on the walk, then
>> the fault is definitely from setting an access bit.
> Definitely?

Yes.

> Is there anything guaranteeing architecturally that an access
> bit related EPT violation would be delivered earlier than any other one
> on that same or a lower page table level?

No, but why does that matter? Architecturally defined or not, we know
that the action the processor was trying to perform was to set an A/D
bit, because we got a vmexit telling us so.

> It doesn't matter much for
> the implementation (because of it being permissible to set the A bits
> speculatively, as you also say further down, and any other violation
> then re-occurring after exiting back to the guest once the A bits are
> all set), but since we're discussing here what exactly the patch
> description should contain, I think I'd prefer this to be fully correct there.
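To make the single-vCPU algorithm quoted above concrete, here is a minimal C sketch of the A/D fixup it describes: set every missing A bit on the way down the walk and, if any were set, treat the exit as handled; otherwise, for a write, set the leaf D bit. This is not the patch's code. It works on a plain array of pointers to 64-bit entries instead of Xen's walk structures, and `fixup_ad_bits()`, `enum ad_fixup` and the `PTE_*` macros are hypothetical names (bits 5 and 6 are the architectural x86 A and D positions). It assumes a single vCPU, so concurrent updates to the entries are not considered, and it treats the last entry as the leaf, ignoring superpages.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* x86 page-table entry flags (architectural bit positions). */
#define PTE_ACCESSED (UINT64_C(1) << 5)
#define PTE_DIRTY    (UINT64_C(1) << 6)

/* Hypothetical result codes for this sketch; not Xen identifiers. */
enum ad_fixup {
    AD_NONE,         /* no A/D bit was missing for this access */
    AD_SET_ACCESSED, /* at least one A bit was missing and is now set */
    AD_SET_DIRTY,    /* all A bits were present; leaf D bit set for a write */
};

/*
 * walk[0] is the top-level entry and walk[levels - 1] is the leaf.
 * Assumes a single vCPU: nothing else modifies the entries concurrently.
 */
static enum ad_fixup fixup_ad_bits(uint64_t *walk[], size_t levels,
                                   bool is_write)
{
    bool accessed = false;

    /* Descend the walk, setting any missing Accessed bits. */
    for ( size_t i = 0; i < levels; i++ )
    {
        if ( !(*walk[i] & PTE_ACCESSED) )
        {
            *walk[i] |= PTE_ACCESSED;
            accessed = true;
        }
    }

    /* Some A bit was missing, so the exit came from setting it: done. */
    if ( accessed )
        return AD_SET_ACCESSED;

    /* All A bits present: for a write, the exit came from the leaf D bit. */
    if ( is_write && !(*walk[levels - 1] & PTE_DIRTY) )
    {
        *walk[levels - 1] |= PTE_DIRTY;
        return AD_SET_DIRTY;
    }

    return AD_NONE;
}
```

For a write whose walk is missing an A bit, the first call returns `AD_SET_ACCESSED` and leaves D alone; only a second call (standing in for the next exit after the guest retries the access) returns `AD_SET_DIRTY`. That mirrors the "set A bits, return to the guest, let the access retry" structure of the proposal.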
>
> Or wait - I think I can agree with "definitely", provided you further
> restrict the context: "..., if we get a GPT EPT Write Violation ...". But
> from what I can tell the patch's change to p2m_mem_access_check()
> doesn't apply (or pass on) any of these qualifications at all.

I've not looked at the patch in detail yet.

I'm tempted to suggest rearranging guest_walk_tables() to just set the
access bits on the descent, rather than at the end. This matches how
some hardware behaves when pulling entries into the paging structure cache.

>
>> Set all access bits
>> and call it done. If we get a GPT EPT Violation and all access bits
>> were set, then it was definitely from setting the Dirty bit.
>>
>> For multi-vcpu scenarios, things get racy. Setting all the Access bits
>> is safe because it's a speculative action, but a speculative load on
>> one vcpu can race with a write (to a read-only mapping) on the other
>> vcpu, and would trick this algorithm into setting the dirty bit when the
>> write would have faulted (and not set the dirty bit).
>>
>> Do we have numbers on how many of the GPT EPT Violations are for (only)
>> access sets, and how many are for dirty sets? Would the first half of
>> the algorithm (which is definitely not racy) still be a net perf win?
> Does Windows make use of A bits at all? I'd expect most OSes to
> simply set them right away, and to actively use the D bits.

What gives you the expectation that OSes wouldn't use A bits? For paging
out, the best candidates are non-accessed, non-dirty pages, because their
contents can be discarded immediately and reread from disk at a later point.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
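Andrew's point about paging out corresponds to the textbook Not Recently Used (NRU) replacement policy, sketched below in C purely as an illustration; no claim is made that Windows or any other particular OS uses exactly this policy. Pages are ranked by their (A, D) bits, and a non-accessed, clean page ranks best because, assuming an up-to-date copy exists on disk, it can be dropped without a write-back. `nru_class()`, `pick_victim()` and the `PTE_*` macros are hypothetical names for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* x86 page-table entry flags (architectural bit positions). */
#define PTE_ACCESSED (UINT64_C(1) << 5)
#define PTE_DIRTY    (UINT64_C(1) << 6)

/*
 * NRU class, lower is a better eviction candidate:
 *   0 = not accessed, clean  (discard now, reread from disk later)
 *   1 = not accessed, dirty  (must be written back first)
 *   2 = accessed, clean
 *   3 = accessed, dirty
 */
static unsigned int nru_class(uint64_t pte)
{
    return ((pte & PTE_ACCESSED) ? 2u : 0u) | ((pte & PTE_DIRTY) ? 1u : 0u);
}

/* Index of the first entry with the lowest NRU class; requires n >= 1. */
static size_t pick_victim(const uint64_t ptes[], size_t n)
{
    size_t best = 0;

    for ( size_t i = 1; i < n; i++ )
        if ( nru_class(ptes[i]) < nru_class(ptes[best]) )
            best = i;

    return best;
}
```

Under such a policy the OS periodically clears A bits so that they reflect recent use, which means the processor keeps re-setting them over the guest's lifetime rather than only once.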
