[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Issue with shared information page on Xen/ARM 4.17


  • To: Julien Grall <julien@xxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 4 Oct 2023 14:53:07 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=Gak1ELPomXjZMaS5d7LVvNtq3KPQknErhH0eO81ZY4M=; b=PPKsUI22B1f5zZq2jH6SlPDpxHO2/l7yaf3m4UMUeKWcJJU0CF4QQKBvc8QS0zi/upWHnEHvcISrP72inmIPEGyanYwUHakomPebCbyb+XbgKxGFSeKeaOWvbC1FGjl4PTHlumpOHAwsYnbPQ4QRmTsMFqkvqh/MDm6JqV8rV9tfMhE0q9bMySl2J3wCMHIftQvzecUAZW/vqtsAqZD6I1mcSYc+kxwWl2FB4bGpX/EYTFKnPRrcBWK8BFDvLSbW+UdAs8RXWKsKywIJgiLsPTsgwuWdlvt5EgSXfJXxMmK9jVZ+pHEgAvuCFlColVvyXj2Q1Fjb/t0N/695iiqaBQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=NjYWF9QaUZ53VIC5J1pV/RGA4bTwlp/7IeHAxUEoV8DQ2wzRMVN7PJUdqV3Hmvq+viFbUZ0v3gCP0PUb4DYoAllBsF9+saQaYv6awmhiWVM60iMqQ7o1zkdgLHx1jt6lDbK0G3WLJ/f3tqKc9cJAjmgd4kdGrqKC2YyUTkGKbW/uRGZRNhyJZ2H6ke78FVSwkrDQvaiOeRuY3e9MfLZtMtWcshfs3AjdeodrE8W03Ee4yr9fX9CW6oRPWGVyq4CLqPnajLcObqXYcK2QkExVpyoSsL4W0JOYVxGCynmoMdMybX06vW9F5Rb5gadjgMvqmRNlKlopWu9iMP02UM9KIg==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Elliott Mitchell <ehem+xen@xxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>
  • Delivery-date: Wed, 04 Oct 2023 12:53:46 +0000
  • Ironport-data: A9a23:+10uaqkjh+6prLE4o/3sjO7o5gwWJ0RdPkR7XQ2eYbSJt1+Wr1Gzt xJODzrXPP7bNmf2ed1/a9vk9E4DsMPXzdNnSAJr+yk8EiMWpZLJC+rCIxarNUt+DCFhoGFPt JxCN4aafKjYaleG+39B55C49SEUOZmgH+e6UKicfHkpGWeIcQ954Tp7gek1n4V0ttawBgKJq LvartbWfVSowFaYCEpNg064gE0p5K+aVA8w5ARkPqkT5AOGzBH5MbpETU2PByqgKmVrNrbSq 9brlNmR4m7f9hExPdKp+p6TnpoiG+O60aCm0xK6aoD66vRwjnVaPpUTbZLwXXx/mTSR9+2d/ f0W3XCGpaXFCYWX8AgVe0Ew/yiTpsSq8pefSZS0mZT7I0Er7xIAahihZa07FdRwxwp5PY1B3 dsWeSlUdQybvcad3bu8QLlstuYdBsa+aevzulk4pd3YJdAPZMiZBonvvppf1jp2gd1SF/HDY cZfcSBocBnLfxxIPBEQFY46m+CrwHL4dlW0qnrM/fZxvzeVkVI3jOSF3Nn9I7RmQe1PmUmVv CTe9nnRCRAGLt2PjzGC9xpAg8eWx36jAt9ISe3QGvhCjH6hzW8NORsvcnz4oKiit1exQc9UN BlBksYphe1onKCxdfH/Qhm5rXisrhMaHd1KHIUS5QGAz+nE7gCxAzUcCDVGbbQOpMIwADAny FKNt9foHiB09q2YT2qH8bWZpi/0PjIaRUcZfjMNRwYB59jloakwgwjJQ9IlF7S65vX/FCvs2 TmMoG47jq8KkM8Q/6yh+BbMhDfEjqbOSgk59wDGRFWP5wlyZJOmT4Gw4F2d5vFFRK6eSlSCp 3ECl9Kp8PEVDZqNmSqOR80ABLisof2CNVXhbUVHGpAg83Gh/iWldIUIujVmfh81boADZCPjZ 1LVtUVJ/phPMXC2bKhxJYWsF8AtyqumHtPgPhzJUudzjlFKXFfv1ElTiYS4hggBTGBEfXkDB Kqm
  • Ironport-hdrordr: A9a23:8eupC61aVdCcZhaT3mIQZwqjBEgkLtp133Aq2lEZdPU0SKGlfg 6V/MjztCWE7Ar5PUtLpTnuAsa9qB/nm6KdgrNhWItKPjOW21dARbsKheffKlXbcBEWndQtt5 uIHZIeNDXxZ2IK8PoT4mODYqodKA/sytHWuQ/cpU0dMz2Dc8tbnmBE4p7wKDwMeOFBb6BJcq a01458iBeLX28YVci/DmltZZm4mzWa/KiWGCLvHnQcmXGzsQ8=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Oct 04, 2023 at 11:55:05AM +0100, Julien Grall wrote:
> Hi Roger,
> 
> On 04/10/2023 09:13, Roger Pau Monné wrote:
> > On Tue, Oct 03, 2023 at 12:18:35PM -0700, Elliott Mitchell wrote:
> > > On Tue, Oct 03, 2023 at 10:26:28AM +0200, Roger Pau Monné wrote:
> > > > On Thu, Sep 28, 2023 at 07:49:18PM -0700, Elliott Mitchell wrote:
> > > > > I'm trying to get FreeBSD/ARM operational on Xen/ARM.  Current issue 
> > > > > is
> > > > > the changes with the handling of the shared information page appear to
> > > > > have broken things for me.
> > > > > 
> > > > > With a pre-4.17 build of Xen/ARM things worked fine.  Yet with a build
> > > > > of the 4.17 release, mapping the shared information page doesn't work.
> > > > 
> > > > This is due to 71320946d5edf AFAICT.
> > > 
> > > Yes.  While the -EBUSY line may be the one triggering, I'm unsure why.
> > > This seems a fairly reasonable change, so I had no intention of asking
> > > for a revert (which likely would have been rejected).  There is also a
> > > real possibility the -EBUSY comes from elsewhere.  Could also be
> > > 71320946d5edf caused a bug elsewhere to be exposed.
> > 
> > A good way to know would be to attempt to revert 71320946d5edf and see
> > if that fixes your issue.
> > 
> > Alternatively you can try (or similar):
> > 
> > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> > index 6ccffeaea57d..105ef3faecfd 100644
> > --- a/xen/arch/arm/mm.c
> > +++ b/xen/arch/arm/mm.c
> > @@ -1424,6 +1424,8 @@ int xenmem_add_to_physmap_one(
> >                   page_set_xenheap_gfn(mfn_to_page(mfn), gfn);
> >           }
> >           else
> > +        {
> > +            printk("%u already mapped\n", space);
> >               /*
> >                * Mandate the caller to first unmap the page before mapping 
> > it
> >                * again. This is to prevent Xen creating an unwanted hole in
> > @@ -1432,6 +1434,7 @@ int xenmem_add_to_physmap_one(
> >                * to unmap it afterwards.
> >                */
> >               rc = -EBUSY;
> > +        }
> >           p2m_write_unlock(p2m);
> >       }
> > 
> > > > > I'm using Tianocore as the first stage loader.  This continues to work
> > > > > fine.  The build is using tag "edk2-stable202211", commit fff6d81270.
> > > > > While Tianocore does map the shared information page, my reading of 
> > > > > their
> > > > > source is that it properly unmaps the page and therefore shouldn't 
> > > > > cause
> > > > > trouble.
> > > > > 
> > > > > Notes on the actual call is gpfn was 0x0000000000040072.  This is 
> > > > > outside
> > > > > the recommended address range, but my understanding is this is 
> > > > > supposed
> > > > > to be okay.
> > > > > 
> > > > > The return code is -16, which is EBUSY.
> > > > > 
> > > > > Ideas?
> > > > 
> > > > I think the issue is that you are mapping the shared info page over a
> > > > guest RAM page, and in order to do that you would fist need to create
> > > > a hole and then map the shared info page.  IOW: the issue is not with
> > > > edk2 not having unmapped the page, but with FreeBSD trying to map the
> > > > shared_info over a RAM page instead of a hole in the p2m.  x86
> > > > behavior is different here, and does allow mapping the shared_info
> > > > page over a RAM gfn (by first removing the backing RAM page on the
> > > > gfn).
> > > 
> > > An interesting thought.  I thought I'd tried this, but since I didn't see
> > > such in my experiments list.  What I had tried was removing all the pages
> > > in the suggested mapping range.  Yet this failed.
> > 
> > Yeah, I went too fast and didn't read the code correctly, it is not
> > checking that the provided gfn is already populated, but whether the
> > mfn intended to be mapped is already mapped at a different location.
> > 
> > > Since this seemed reasonable, I've now tried and found it fails.  The
> > > XENMEM_remove_from_physmap call returns 0.
> > 
> > XENMEM_remove_from_physmap returning 0 is fine, but it seems to me
> > like edk2 hasn't unmapped the shared_info page.  The OS has no idea
> > at which position the shared_info page is currently mapped, and hence
> > can't do anything to attempt to unmap it in order to cover up for
> > buggy firmware.
> > 
> > edk2 should be the entity to issue the XENMEM_remove_from_physmap
> > against the gfn where it has the shared_info page mapped.  Likely
> > needs to be done as part of ExitBootServices() method.
> > 
> > FWIW, 71320946d5edf is an ABI change, and as desirable as such
> > behavior might be, a new hypercall should have introduced that had the
> > behavior that the change intended to retrofit into
> > XENMEM_add_to_physmap.
> I can see how you think this is an ABI change but the previous behavior was
> incorrect. Before this patch, on Arm, we would allow the shared page to be
> mapped twice. As we don't know where the firmware had mapped it this could
> result to random corruption.
> 
> Now, we could surely decide to remove the page as x86 did. But this could
> leave a hole in the RAM. As the OS would not know where the hole is, this
> could lead to page fault randomly during runtime.

I would say it's the job of the firmware to notify the OS where the
hole is, by modifying the memory map handled to the OS.  Or else
mapping the shared_info page in an unpopulated p2m hole.

When using UEFI there's RAM that will always be in-use by the
firmware, as runtime services cannot be shut down, and hence the
firmware must already have a way to remove/reserve such region(s) on
the memory map.

> Neither of the two behaviors help the users. In fact, I think they only make
> the experience worse because you don't know when the issue will happen.
> 
> AFAICT, there is no way for an HVM guestto know which GFN was inuse. So in
> all the cases, I can't think of a way for the OS to workaround properly
> buggy firmware. Therefore, returning -EBUSY is the safest we can do for our
> users and I don't view it as a ABI change (someone rely on the previous
> behavior is bound to failure).

I fully agree the current behavior might not be the best one, but I do
consider this part of the ABI, specially as booting guests using edk2
has now stopped working after this change.

Introducing a different hypercall, or even using
XENMAPSPACE_shared_info with idx = 1 to signal the usage of the new
behavior should be used instead.

This would also allow unifying the behavior between x86 and Arm.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.