[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 07/16] x86/shadow: call sh_update_cr3() directly from sh_page_fault()


  • To: Tim Deegan <tim@xxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 28 Mar 2023 12:37:12 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=jbCZuIqNlPg+SwxFcp0FIazRUetdLwA6oZtyUR36JnM=; b=Gb1YTzLVcry66CR16WPyL+RaaFbjOZXJGSpawf7vSSwhAy2wzn2cWmKDEa3RABupOZFayuRWMfPSaYCYW1SqP5rd9TT9IzEKakuvpzWSxxSRuRr+FS7OXEa+1Xnt/x8iSGW3LGC01r/NNysrBmvzLJDPuX1XEAFVMvSK4tUKWCwLXSKeoMHjxjQOPGpve1TPX+fAo9Gy5TMZr+NaivA7yB1Q4nXXe3xInTnh6ofdgFffwUaaPaAlpm1jmur5Xl9+A8QWX+24y8LKYweUJ63gf6KGUVJIkWpTvyzp6peBrhbuEZoSwZmpElqvcWClvVToKr59jf874Mq3XkcPpF/Giw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=meESlKgkTSyNeuCJGZ335sg2W8Vsza3CIpAZjDge56F/opSBwWMiJ5ztDhZ5ksWtv4tsYfmmcrY6hfW5EMTlAM/ShXiEVw/d6qXK/j49jUuvWov2gfH2zpcYWBltLVncBh22R+DPOZ+MoSDSytL1r0RhBOC7I30VLQ0903D1DmgZdfYyLZf8varuSDINUC2wMXVI8cFR6c2Snt/qoGZwp9nySOq/rTo1Q+tINsGKKdfMf6+weFVVoZzYz28gg+ykfyeArpVN+OXBG7iYs01poZA8V8M2NX2VgoRWmYwpa4ctIIx4xdkjnTxouxioJtH7q9JUZaQw2UmqQ3pl7BvSGQ==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>
  • Delivery-date: Tue, 28 Mar 2023 10:37:23 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 27.03.2023 17:39, Tim Deegan wrote:
> At 10:33 +0100 on 22 Mar (1679481226), Jan Beulich wrote:
>> There's no need for an indirect call here, as the mode is invariant
>> throughout the entire paging-locked region. All it takes to avoid it is
>> to have a forward declaration of sh_update_cr3() in place.
>>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> ---
>> I find this and the respective Win7 related comment suspicious: If we
>> really need to "fix up" L3 entries "on demand", wouldn't we better retry
>> the shadow_get_and_create_l1e() rather than exit? The spurious page
>> fault that the guest observes can, after all, not be known to be non-
>> fatal inside the guest. That's purely an OS policy.
> 
> I think it has to be non-fatal because it can happen on real hardware,
> even if the hardware *does* fill the TLB here (which it is not
> required to).

But if hardware filled the TLB, it won't raise #PF. If it didn't fill
the TLB (e.g. because of not even doing a walk when a PDPTE is non-
present), a #PF would be legitimate, and the handler would then need
to reload CR3. But such a #PF is what, according to the comment, Win7
chokes on. So it can only be the former case, yet what we do here is
fill the (virtual) TLB _and_ raise #PF. Win7 apparently ignores this
as spurious, but that's not required behavior for an OS afaik.

> Filling just one sl3e sounds plausible, though we don't want to go
> back to the idea of having L3 shadows on PAE!

Of course.

>> Furthermore the sh_update_cr3() will also invalidate L3 entries which
>> were loaded successfully before, but invalidated by the guest
>> afterwards. I strongly suspect that the described hardware behavior is
>> _only_ to load previously not-present entries from the PDPT, but not
>> purge ones already marked present.
> 
> Very likely, but we *are* allowed to forget old entries whenever we
> want to, so this is at worst a performance problem.

That depends on the model, I think: In the original Intel model PDPTEs
cannot be "forgotten". In some AMD variants, where L3 is walked normally,
they of course can be.

>> IOW I think sh_update_cr3() would
>> need calling in an "incremental" mode here.
> 
> This would be the best way of updating just the one entry - but as far
> as I can tell the existing code is correct so I wouldn't add any more
> complexity unless we know that this path is causing perf problems.

If it was/is just performance - sure.

>> In any event emitting a TRC_SHADOW_DOMF_DYING trace record in this case
>> looks wrong.
> 
> Yep.

Will add another patch to the series then.

>> Beyond the "on demand" L3 entry creation I also can't see what guest
>> actions could lead to the ASSERT() being inapplicable in the PAE case.
>> The 3-level code in shadow_get_and_create_l2e() doesn't consult guest
>> PDPTEs, and all other logic is similar to that for other modes.
> 
> The assert's not true here because the guest can push us down this
> path by doing exactly what Win 7 does here - loading CR3 with a
> missing L3E, then filling the L3E and causing a page fault that uses
> the now-filled L3E.  (Or maybe that's not true any more; my mental
> model of the pagetable walker code might be out of date)

Well, I specifically started the paragraph with 'Beyond the "on demand"
L3 entry creation'. IOW the assertion would look applicable to me if
we, as suggested higher up, retried shadow_get_and_create_l1e() and it
failed again. As the comment there says, "we know the guest entries are
OK", so the missing L3 entry must have appeared.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.