[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: Issue with shared information page on Xen/ARM 4.17
Hi Roger, On 04/10/2023 13:53, Roger Pau Monné wrote: On Wed, Oct 04, 2023 at 11:55:05AM +0100, Julien Grall wrote:Hi Roger, On 04/10/2023 09:13, Roger Pau Monné wrote:On Tue, Oct 03, 2023 at 12:18:35PM -0700, Elliott Mitchell wrote:On Tue, Oct 03, 2023 at 10:26:28AM +0200, Roger Pau Monné wrote:On Thu, Sep 28, 2023 at 07:49:18PM -0700, Elliott Mitchell wrote:I'm trying to get FreeBSD/ARM operational on Xen/ARM. Current issue is the changes with the handling of the shared information page appear to have broken things for me. With a pre-4.17 build of Xen/ARM things worked fine. Yet with a build of the 4.17 release, mapping the shared information page doesn't work.This is due to 71320946d5edf AFAICT.Yes. While the -EBUSY line may be the one triggering, I'm unsure why. This seems a fairly reasonable change, so I had no intention of asking for a revert (which likely would have been rejected). There is also a real possibility the -EBUSY comes from elsewhere. Could also be 71320946d5edf caused a bug elsewhere to be exposed.A good way to know would be to attempt to revert 71320946d5edf and see if that fixes your issue. Alternatively you can try (or similar): diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index 6ccffeaea57d..105ef3faecfd 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -1424,6 +1424,8 @@ int xenmem_add_to_physmap_one( page_set_xenheap_gfn(mfn_to_page(mfn), gfn); } else + { + printk("%u already mapped\n", space); /* * Mandate the caller to first unmap the page before mapping it * again. This is to prevent Xen creating an unwanted hole in @@ -1432,6 +1434,7 @@ int xenmem_add_to_physmap_one( * to unmap it afterwards. */ rc = -EBUSY; + } p2m_write_unlock(p2m); }I'm using Tianocore as the first stage loader. This continues to work fine. The build is using tag "edk2-stable202211", commit fff6d81270. While Tianocore does map the shared information page, my reading of their source is that it properly unmaps the page and therefore shouldn't cause trouble. Notes on the actual call is gpfn was 0x0000000000040072. This is outside the recommended address range, but my understanding is this is supposed to be okay. The return code is -16, which is EBUSY. Ideas?I think the issue is that you are mapping the shared info page over a guest RAM page, and in order to do that you would fist need to create a hole and then map the shared info page. IOW: the issue is not with edk2 not having unmapped the page, but with FreeBSD trying to map the shared_info over a RAM page instead of a hole in the p2m. x86 behavior is different here, and does allow mapping the shared_info page over a RAM gfn (by first removing the backing RAM page on the gfn).An interesting thought. I thought I'd tried this, but since I didn't see such in my experiments list. What I had tried was removing all the pages in the suggested mapping range. Yet this failed.Yeah, I went too fast and didn't read the code correctly, it is not checking that the provided gfn is already populated, but whether the mfn intended to be mapped is already mapped at a different location.Since this seemed reasonable, I've now tried and found it fails. The XENMEM_remove_from_physmap call returns 0.XENMEM_remove_from_physmap returning 0 is fine, but it seems to me like edk2 hasn't unmapped the shared_info page. The OS has no idea at which position the shared_info page is currently mapped, and hence can't do anything to attempt to unmap it in order to cover up for buggy firmware. edk2 should be the entity to issue the XENMEM_remove_from_physmap against the gfn where it has the shared_info page mapped. Likely needs to be done as part of ExitBootServices() method. FWIW, 71320946d5edf is an ABI change, and as desirable as such behavior might be, a new hypercall should have introduced that had the behavior that the change intended to retrofit into XENMEM_add_to_physmap.I can see how you think this is an ABI change but the previous behavior was incorrect. Before this patch, on Arm, we would allow the shared page to be mapped twice. As we don't know where the firmware had mapped it this could result to random corruption. Now, we could surely decide to remove the page as x86 did. But this could leave a hole in the RAM. As the OS would not know where the hole is, this could lead to page fault randomly during runtime.I would say it's the job of the firmware to notify the OS where the hole is, by modifying the memory map handled to the OS. Or else mapping the shared_info page in an unpopulated p2m hole. I agree but I am not convinced that they are all doing it. At least U-boot didn't do it before we fixed it. When using UEFI there's RAM that will always be in-use by the firmware, as runtime services cannot be shut down, and hence the firmware must already have a way to remove/reserve such region(s) on the memory map. Can either you or Elliott confirm if EDK2 reserve the region? Neither of the two behaviors help the users. In fact, I think they only make the experience worse because you don't know when the issue will happen. AFAICT, there is no way for an HVM guestto know which GFN was inuse. So in all the cases, I can't think of a way for the OS to workaround properly buggy firmware. Therefore, returning -EBUSY is the safest we can do for our users and I don't view it as a ABI change (someone rely on the previous behavior is bound to failure).I fully agree the current behavior might not be the best one, but I do consider this part of the ABI, specially as booting guests using edk2 has now stopped working after this change. Right. If we remove the check, you may be able to boot a guest. But are we sure that such guest would run safely? Also, it is not really argument, but this is not the only broken part in EDK2 for Xen Arm guests. The other one I know is EDS makes assumption how some Device-Tree nodes and this will break on newer Xen. So overall, it feels to me that EDK2 is not entirely ready to be used in production for Xen on Arm guests. Cheers, -- Julien Grall
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |