Xen project Mailing List

Re: Issue with shared information page on Xen/ARM 4.17

To: Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Wed, 4 Oct 2023 14:59:02 +0200

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=CVNOvd32f5UBcMAkpgUyu385fYKCUqijRhKV4I65DGw=; b=TiPtJUo6qWMnnKtq/FTM1TkLHc0AZ08TBytsvOGJ8ziJLaFZqbe0/ngRESVx8RD2wPS7Tl1PtVvzJaVk7+CAYspCsg4HX+JrMRW2AGVFVXAMG6Cl6+muhA0aMK5QQXXsGoRggDYadQ2f9XEfDbOz1VBAZN+OSYneGlMqcyqnhrIpwIIOAXKm+ybR3xdh3J/CVvVOp/VbFMt/n2vR66POCtkfvwbwNay1vO2VxxuacVqnY784vmy4Z41PWN56hJHHwBDxtcU6eR0gSlGOkra/zr9ZbAxC8Y2MB5kjVkQ+1XE1n5tJBeMrXcyzrMCpLdVN66hIzLEHBJQLJ1iasbWIBA==

Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=WODj7XDwth9fV61l4NFeCfcg8mcRHau5x7/QqWR5wcDCPFFn/qyn/+7y8Ngh8/wqI145juK4cnHdnSfOWcj7xQ1rF8q0Igx/hqGvH2D7/qbLpEmQBG8PLATabmIyQlwgUQnAU4oRKeylHx8hJEUu5WYN7UkLtHmqB83K5cz3E57/xRtZK5eEgp37p7ZFIQ1Qq50SvIs4ZzisyiZ4RdYOmpfc04sGcgHGTqMd5qZVCdHEO/CS8+cNbijxXU4UaKxfP09m28z+JqNdMIGyMMYJ+I+okYXWjzO1bD+3xGuyXlaKOwMy3R2PzpDJQrwZmhTiVWZosjLk6hzGWyGLk2V1Ow==

Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;

Cc: Julien Grall <julien@xxxxxxx>, Elliott Mitchell <ehem+xen@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>

Delivery-date: Wed, 04 Oct 2023 12:59:25 +0000

Ironport-data: A9a23:gDnLnqMLGsANzSvvrR0ylsFynXyQoLVcMsEvi/4bfWQNrUoh0D0Hm GsYCj2BOvjbajD9L9skbd6+/UlV78CBzddrHgto+SlhQUwRpJueD7x1DKtS0wC6dZSfER09v 63yTvGacajYm1eF/k/F3oDJ9CQ6jefQAOOkVIYoAwgpLSd8UiAtlBl/rOAwh49skLCRDhiE/ Nj/uKUzAnf8s9JPGjxSs/nrRC9H5qyo42pA5gBmPpingXeF/5UrJMNHTU2OByOQrrl8RoaSW +vFxbelyWLVlz9F5gSNy+uTnuUiG9Y+DCDW4pZkc/HKbitq/0Te5p0TJvsEAXq7vh3S9zxHJ HehgrTrIeshFvWkdO3wyHC0GQkmVUFN0OevzXRSLaV/ZqAJGpfh66wGMa04AWEX0uVSDEFkz vgzFBQibhGFh/Kcz6+BEeY506zPLOGzVG8ekldJ6GmDSNoDGtXESaiM4sJE1jAtgMwIBezZe 8cSdTtoalLHfgFLPVAUTpk5mY9EhFGmK2Ee9A3T+PRxvza7IA9ZidABNPLPfdOHX4NNl1uwr WPa5WXpRBodMbRzzBLcqC332LeTx3yTtIQ6Mp+p2eNB22ap2kNMJEwLVWW0sOajsxvrMz5YA wlOksY0loAV8EGoVdD7UwePnG+fvhUcVtxTFMU38AiIjKHT5m6xBHUATzNHQMwrsokxXzNC/ lKJktaqFTFpmLjIUTSW8bL8hSO/P20ZIHEPYQcATBAZ+J/zrYcrlBXNQ91/VqmvgbXdAirsy jqHqCw/gbQ7jsMR0ai/u1fdjFqErJXTUhQ84AmRW2u/9x54f6asfYnu4l/ehd5+K4KeQkiEr WIzscGU5+ATDrmAjCWIBu4KGdmUC+2tNTTdhRtjGsIn/jH1oXq7J9gMvnd5OVtjNdsCdXnxe kjPtAhN5ZhVeny3catwZIH3AMMvpUT9KenYujnvRoImSvBMmMWvpUmCuWb4M7jRrXUR

Ironport-hdrordr: A9a23:3mgilKMR2LR26sBcTjujsMiBIKoaSvp037BK7S1MoNJuEvBw9v re+MjzsCWftN9/Yh4dcLy7VpVoIkmskKKdg7NhXotKNTOO0AeVxelZhrcKqAeQeREWmNQ96U 9hGZIOdeEZDzJB/LrHCN/TKade/DGFmprY+9s31x1WPGZXgzkL1XYDNu6ceHcGIjVuNN4CO7 e3wNFInDakcWR/VLXAOpFUN9Kz3uEijfjdEGY7OyI=

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Oct 04, 2023 at 11:42:32AM +0000, Oleksandr Tyshchenko wrote: > > > On 04.10.23 13:55, Julien Grall wrote: > > Hello all. > > > Hi Roger, > > > > On 04/10/2023 09:13, Roger Pau Monné wrote: > >> On Tue, Oct 03, 2023 at 12:18:35PM -0700, Elliott Mitchell wrote: > >>> On Tue, Oct 03, 2023 at 10:26:28AM +0200, Roger Pau Monné wrote: > >>>> On Thu, Sep 28, 2023 at 07:49:18PM -0700, Elliott Mitchell wrote: > >>>>> I'm trying to get FreeBSD/ARM operational on Xen/ARM. Current > >>>>> issue is > >>>>> the changes with the handling of the shared information page appear to > >>>>> have broken things for me. > >>>>> > >>>>> With a pre-4.17 build of Xen/ARM things worked fine. Yet with a build > >>>>> of the 4.17 release, mapping the shared information page doesn't work. > >>>> > >>>> This is due to 71320946d5edf AFAICT. > >>> > >>> Yes. While the -EBUSY line may be the one triggering, I'm unsure why. > >>> This seems a fairly reasonable change, so I had no intention of asking > >>> for a revert (which likely would have been rejected). There is also a > >>> real possibility the -EBUSY comes from elsewhere. Could also be > >>> 71320946d5edf caused a bug elsewhere to be exposed. > >> > >> A good way to know would be to attempt to revert 71320946d5edf and see > >> if that fixes your issue. > >> > >> Alternatively you can try (or similar): > >> > >> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c > >> index 6ccffeaea57d..105ef3faecfd 100644 > >> --- a/xen/arch/arm/mm.c > >> +++ b/xen/arch/arm/mm.c > >> @@ -1424,6 +1424,8 @@ int xenmem_add_to_physmap_one( > >> page_set_xenheap_gfn(mfn_to_page(mfn), gfn); > >> } > >> else > >> + { > >> + printk("%u already mapped\n", space); > >> /* > >> * Mandate the caller to first unmap the page before > >> mapping it > >> * again. This is to prevent Xen creating an unwanted > >> hole in > >> @@ -1432,6 +1434,7 @@ int xenmem_add_to_physmap_one( > >> * to unmap it afterwards. > >> */ > >> rc = -EBUSY; > >> + } > >> p2m_write_unlock(p2m); > >> } > >> > >>>>> I'm using Tianocore as the first stage loader. This continues to work > >>>>> fine. The build is using tag "edk2-stable202211", commit fff6d81270. > >>>>> While Tianocore does map the shared information page, my reading of > >>>>> their > >>>>> source is that it properly unmaps the page and therefore shouldn't > >>>>> cause > >>>>> trouble. > >>>>> > >>>>> Notes on the actual call is gpfn was 0x0000000000040072. This is > >>>>> outside > >>>>> the recommended address range, but my understanding is this is > >>>>> supposed > >>>>> to be okay. > >>>>> > >>>>> The return code is -16, which is EBUSY. > >>>>> > >>>>> Ideas? > >>>> > >>>> I think the issue is that you are mapping the shared info page over a > >>>> guest RAM page, and in order to do that you would fist need to create > >>>> a hole and then map the shared info page. IOW: the issue is not with > >>>> edk2 not having unmapped the page, but with FreeBSD trying to map the > >>>> shared_info over a RAM page instead of a hole in the p2m. x86 > >>>> behavior is different here, and does allow mapping the shared_info > >>>> page over a RAM gfn (by first removing the backing RAM page on the > >>>> gfn). > >>> > >>> An interesting thought. I thought I'd tried this, but since I didn't > >>> see > >>> such in my experiments list. What I had tried was removing all the > >>> pages > >>> in the suggested mapping range. Yet this failed. > >> > >> Yeah, I went too fast and didn't read the code correctly, it is not > >> checking that the provided gfn is already populated, but whether the > >> mfn intended to be mapped is already mapped at a different location. > >> > >>> Since this seemed reasonable, I've now tried and found it fails. The > >>> XENMEM_remove_from_physmap call returns 0. > >> > >> XENMEM_remove_from_physmap returning 0 is fine, but it seems to me > >> like edk2 hasn't unmapped the shared_info page. The OS has no idea > >> at which position the shared_info page is currently mapped, and hence > >> can't do anything to attempt to unmap it in order to cover up for > >> buggy firmware. > >> > >> edk2 should be the entity to issue the XENMEM_remove_from_physmap > >> against the gfn where it has the shared_info page mapped. Likely > >> needs to be done as part of ExitBootServices() method. > >> > >> FWIW, 71320946d5edf is an ABI change, and as desirable as such > >> behavior might be, a new hypercall should have introduced that had the > >> behavior that the change intended to retrofit into > >> XENMEM_add_to_physmap. > > I can see how you think this is an ABI change but the previous behavior > > was incorrect. Before this patch, on Arm, we would allow the shared page > > to be mapped twice. As we don't know where the firmware had mapped it > > this could result to random corruption. > > > > Now, we could surely decide to remove the page as x86 did. But this > > could leave a hole in the RAM. As the OS would not know where the hole > > is, this could lead to page fault randomly during runtime. > > > +1. > > In addition to what Julien has already said, I would like to say the > same issue was faced due to U-Boot (running as a part of Xen guest > before OS) didn't perform a cleanup before jumping to OS. This is > already fixed to follow the current behavior. I didn't find > corresponding U-Boot mail thread, but can point to already upstreamed > commit in the main repo. What about other bootloaders? There might be other loaders that didn't unmap shared_info either, but that removed the page where the shared_info is mapped from the reported RAM ranges to the guest. In such case not doing the unmapping was benign, as the guest would never use that gfn as RAM anyway. Thanks, Roger.

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.