[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: Issue with shared information page on Xen/ARM 4.17
On 04.10.23 15:59, Roger Pau Monné wrote: Hello Roger > On Wed, Oct 04, 2023 at 11:42:32AM +0000, Oleksandr Tyshchenko wrote: >> >> >> On 04.10.23 13:55, Julien Grall wrote: >> >> Hello all. >> >>> Hi Roger, >>> >>> On 04/10/2023 09:13, Roger Pau Monné wrote: >>>> On Tue, Oct 03, 2023 at 12:18:35PM -0700, Elliott Mitchell wrote: >>>>> On Tue, Oct 03, 2023 at 10:26:28AM +0200, Roger Pau Monné wrote: >>>>>> On Thu, Sep 28, 2023 at 07:49:18PM -0700, Elliott Mitchell wrote: >>>>>>> I'm trying to get FreeBSD/ARM operational on Xen/ARM. Current >>>>>>> issue is >>>>>>> the changes with the handling of the shared information page appear to >>>>>>> have broken things for me. >>>>>>> >>>>>>> With a pre-4.17 build of Xen/ARM things worked fine. Yet with a build >>>>>>> of the 4.17 release, mapping the shared information page doesn't work. >>>>>> >>>>>> This is due to 71320946d5edf AFAICT. >>>>> >>>>> Yes. While the -EBUSY line may be the one triggering, I'm unsure why. >>>>> This seems a fairly reasonable change, so I had no intention of asking >>>>> for a revert (which likely would have been rejected). There is also a >>>>> real possibility the -EBUSY comes from elsewhere. Could also be >>>>> 71320946d5edf caused a bug elsewhere to be exposed. >>>> >>>> A good way to know would be to attempt to revert 71320946d5edf and see >>>> if that fixes your issue. >>>> >>>> Alternatively you can try (or similar): >>>> >>>> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c >>>> index 6ccffeaea57d..105ef3faecfd 100644 >>>> --- a/xen/arch/arm/mm.c >>>> +++ b/xen/arch/arm/mm.c >>>> @@ -1424,6 +1424,8 @@ int xenmem_add_to_physmap_one( >>>> page_set_xenheap_gfn(mfn_to_page(mfn), gfn); >>>> } >>>> else >>>> + { >>>> + printk("%u already mapped\n", space); >>>> /* >>>> * Mandate the caller to first unmap the page before >>>> mapping it >>>> * again. This is to prevent Xen creating an unwanted >>>> hole in >>>> @@ -1432,6 +1434,7 @@ int xenmem_add_to_physmap_one( >>>> * to unmap it afterwards. >>>> */ >>>> rc = -EBUSY; >>>> + } >>>> p2m_write_unlock(p2m); >>>> } >>>> >>>>>>> I'm using Tianocore as the first stage loader. This continues to work >>>>>>> fine. The build is using tag "edk2-stable202211", commit fff6d81270. >>>>>>> While Tianocore does map the shared information page, my reading of >>>>>>> their >>>>>>> source is that it properly unmaps the page and therefore shouldn't >>>>>>> cause >>>>>>> trouble. >>>>>>> >>>>>>> Notes on the actual call is gpfn was 0x0000000000040072. This is >>>>>>> outside >>>>>>> the recommended address range, but my understanding is this is >>>>>>> supposed >>>>>>> to be okay. >>>>>>> >>>>>>> The return code is -16, which is EBUSY. >>>>>>> >>>>>>> Ideas? >>>>>> >>>>>> I think the issue is that you are mapping the shared info page over a >>>>>> guest RAM page, and in order to do that you would fist need to create >>>>>> a hole and then map the shared info page. IOW: the issue is not with >>>>>> edk2 not having unmapped the page, but with FreeBSD trying to map the >>>>>> shared_info over a RAM page instead of a hole in the p2m. x86 >>>>>> behavior is different here, and does allow mapping the shared_info >>>>>> page over a RAM gfn (by first removing the backing RAM page on the >>>>>> gfn). >>>>> >>>>> An interesting thought. I thought I'd tried this, but since I didn't >>>>> see >>>>> such in my experiments list. What I had tried was removing all the >>>>> pages >>>>> in the suggested mapping range. Yet this failed. >>>> >>>> Yeah, I went too fast and didn't read the code correctly, it is not >>>> checking that the provided gfn is already populated, but whether the >>>> mfn intended to be mapped is already mapped at a different location. >>>> >>>>> Since this seemed reasonable, I've now tried and found it fails. The >>>>> XENMEM_remove_from_physmap call returns 0. >>>> >>>> XENMEM_remove_from_physmap returning 0 is fine, but it seems to me >>>> like edk2 hasn't unmapped the shared_info page. The OS has no idea >>>> at which position the shared_info page is currently mapped, and hence >>>> can't do anything to attempt to unmap it in order to cover up for >>>> buggy firmware. >>>> >>>> edk2 should be the entity to issue the XENMEM_remove_from_physmap >>>> against the gfn where it has the shared_info page mapped. Likely >>>> needs to be done as part of ExitBootServices() method. >>>> >>>> FWIW, 71320946d5edf is an ABI change, and as desirable as such >>>> behavior might be, a new hypercall should have introduced that had the >>>> behavior that the change intended to retrofit into >>>> XENMEM_add_to_physmap. >>> I can see how you think this is an ABI change but the previous behavior >>> was incorrect. Before this patch, on Arm, we would allow the shared page >>> to be mapped twice. As we don't know where the firmware had mapped it >>> this could result to random corruption. >>> >>> Now, we could surely decide to remove the page as x86 did. But this >>> could leave a hole in the RAM. As the OS would not know where the hole >>> is, this could lead to page fault randomly during runtime. >> >> >> +1. >> >> In addition to what Julien has already said, I would like to say the >> same issue was faced due to U-Boot (running as a part of Xen guest >> before OS) didn't perform a cleanup before jumping to OS. This is >> already fixed to follow the current behavior. I didn't find >> corresponding U-Boot mail thread, but can point to already upstreamed >> commit in the main repo. > > What about other bootloaders? > > There might be other loaders that didn't unmap shared_info either, but > that removed the page where the shared_info is mapped from the > reported RAM ranges to the guest. In such case not doing the > unmapping was benign, as the guest would never use that gfn as RAM > anyway. I got your point, this way, I agree, not doing the unmapping wouldn't be a big problem. I am wondering whether such bootloaders that instead of unmapping indeed hiding/reserving the gfn (where shared_info is mapped to) are really present in the world (Arm)? Patch was merged more than 1 year ago, and the issue hasn't been noticed until now with EDK2. Let's wait for the addition input on how the EDK2 is really dealing with shared_info. > > Thanks, Roger.
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |