[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 07/16] x86/shadow: call sh_update_cr3() directly from sh_page_fault()


  • To: Jan Beulich <jbeulich@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Thu, 23 Mar 2023 13:34:07 +0000
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=Ohgbkqq5ErLSp2I4RdJRJG8DMxExlUBTzggKv4tuSfg=; b=a9IRXKF/YWLPLa0pg98cmxehKCl5u+N3ivEL6EEYmzqSO/Hku7zaXnbkYC7otui2ZGOhMiKuIl+VQKC3VceE1u7aFxTF52tdf/4m3VqGdO2fz9mwYLKwOS3FeqhVRHcHB0Hud16nz4HSXwlKkPKS75yCP6zIJ4vvJhP5UCBsvdX4t5mCHkfBoh8+ytt1Ze/frk/WKUw8JFo2AzvHdtJQWDqW3lfMA1430h//usdkDFdbTPDGhkMWuY0uW2NeG/RwG9fv7bZEEyWtf7G9LahFq5/vo8MO/aoYeHF8flrlAOnMvCKNEwEGQL0ucFSWZX+S9POEdEVOSR91zWhObCGdpw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=MHL64trQMUwm04gKKWb7zBlac2KXMT+HS3AfbwQNlcbyCOd4q22n1yw2epbIcuLQVS0xTDb0SE2vKaXi+O6dywMhA/psBPnRghUSuAdC1Wl0v/oInyurgTFBtejKpP5JdHNqqkxfd4xd1NdU5Dq+qPzFd83IvFWUOCzdpyyOoAzov+r4TbumRND5usvEdQYzQbEb6Rx1a+5s+m6VK+0Bp9+cJQJq4Y2EnzPhBraowLn/mWYfcr7fNSQWuY3a5lE4uQzKGBylIqQ8xNZVqPO8t5YOyDUWVRD004eU0nrFvgdgUzz2fsfIOAkwnfHSxj9yuWLz/BTR72dkSyCuBZFzrA==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Wei Liu <wl@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Tim Deegan <tim@xxxxxxx>
  • Delivery-date: Thu, 23 Mar 2023 13:34:41 +0000
  • Ironport-data: A9a23:wf8DKKOqaoCYRQzvrR2VlsFynXyQoLVcMsEvi/4bfWQNrUoh12AHy jcYUDqPPq3eZGXwf99zOoWzpkJS78TSmN9kHgto+SlhQUwRpJueD7x1DKtS0wC6dZSfER09v 63yTvGacajYm1eF/k/F3oDJ9CU6jufQAOKnUoYoAwgpLSd8UiAtlBl/rOAwh49skLCRDhiE/ Nj/uKUzAnf8s9JPGj9SuvLrRC9H5qyo42tD5AdmP5ingXeF/5UrJMNHTU2OByOQrrl8RoaSW +vFxbelyWLVlz9F5gSNy+uTnuUiG9Y+DCDW4pZkc/HKbitq/0Te5p0TJvsEAXq7vh3S9zxHJ HehgrTrIeshFvWkdO3wyHC0GQkmVUFN0OevzXRSLaV/ZqAJGpfh66wGMa04AWEX0tlqGVFw5 8A1Exk2ZxWqitKmwOKiFMA506zPLOGzVG8ekldJ6GiASNoDH9XESaiM4sJE1jAtgMwIBezZe 8cSdTtoalLHfgFLPVAUTpk5mY9EhFGmK2Ee9A3T+PVxujaCpOBy+OGF3N79U9qGX8hK2G2fo XrL5T/RCRAGLt2PjzGC9xpAg8eWxX+nBNpLSuLQGvhCsVOd3lc0CwEsRV6j/KbnoUq+a9FhN BlBksYphe1onKCxdfH/VRClpH+PvjYHRsFdVeY97Wml2qfSpgqUGGUAZjpAc8A98t87QyQw0 V2ElM+vAiZg2JWKTVqN+7HSqim9UQAJLGoqdSICCwwf7LHeTJobixvOSpNvFfCzh9isQDXom WnU/W45mqkZitMN2+Oj51fbjjmwp5/PCAko+gHQWWHj5QR8DGK4W7GVBZHgxa4oBO6kopOp5 RDoR+D2ADgyMKyw
  • Ironport-hdrordr: A9a23:pYEm2a+xU5tL8TdY1lBuk+AcI+orL9Y04lQ7vn2ZKSY5TiX4rb HKoB1/73XJYVkqN03I9ervBEDiewK/yXcW2+ks1N6ZNWGLhILBFupfBODZsl7d8kPFl9K01c 1bAtJD4N+bNykGsS4tijPIb+rJw7O8gd+Vbf+19QYIcenzAZsQlzuQDGygYypLbTgDP7UVPr yG6PFKojKxEE5nFfhSVhE+Lo7+T8SgruOeXSI7
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 22/03/2023 9:33 am, Jan Beulich wrote:
> There's no need for an indirect call here, as the mode is invariant
> throughout the entire paging-locked region. All it takes to avoid it is
> to have a forward declaration of sh_update_cr3() in place.
>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> I find this and the respective Win7 related comment suspicious: If we
> really need to "fix up" L3 entries "on demand", wouldn't we better retry
> the shadow_get_and_create_l1e() rather than exit? The spurious page
> fault that the guest observes can, after all, not be known to be non-
> fatal inside the guest. That's purely an OS policy.
>
> Furthermore the sh_update_cr3() will also invalidate L3 entries which
> were loaded successfully before, but invalidated by the guest
> afterwards. I strongly suspect that the described hardware behavior is
> _only_ to load previously not-present entries from the PDPT, but not
> purge ones already marked present. IOW I think sh_update_cr3() would
> need calling in an "incremental" mode here. (The alternative of doing
> this in shadow_get_and_create_l3e() instead would likely be more
> cumbersome.)
>
> In any event emitting a TRC_SHADOW_DOMF_DYING trace record in this case
> looks wrong.
>
> Beyond the "on demand" L3 entry creation I also can't see what guest
> actions could lead to the ASSERT() being inapplicable in the PAE case.
> The 3-level code in shadow_get_and_create_l2e() doesn't consult guest
> PDPTEs, and all other logic is similar to that for other modes.
>
> (See 89329d832aed ["x86 shadow: Update cr3 in PAE mode when guest walk
> succeed but shadow walk fails"].)

I recall that there was a complicated bug, ultimately discovered because
Win7 changed behaviour vs older versions, and the shadow logic had been
written to AMD's PAE behaviour, not Intel's.

Remember that Intel and AMD differer in how PAE paging works between
root and non-root mode, and it is to do with whether all PDPTRs get
cached at once, or on demand.

Off the top of my head:
* 32bit PV guests get on-demand semantics (as they're really 4-level)
* VT-x strictly use architectural "PDPTRs get cached on mov to CR3"
semantics
* SVM with NPT have on-demand semantics
* SVM with shadow is model-specific as to which semantics is uses, IIRC
Fam15h and later are on-demand

These differences still manifest bugs in the common HVM code, the PTE
caching code, and the pagewalk code.

Looking at the comment specifically, I'm pretty sure it's wrong.  I
think that suggests we've got even more PDPTR bugs than I'd previously
identified.  In some copious free time, I do need to extend the
pagetable-emul XTF test to include PDPTR updates because it's the one
area of pagewalking that doesn't have any suitable testing right now.

~Andrew



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.