
Re: Linux pin_user_pages_fast fails on Xen



On Thu, 15 Sep 2022, Juergen Gross wrote:
> On 15.09.22 03:18, Stefano Stabellini wrote:
> > On Wed, 14 Sep 2022, Jan Beulich wrote:
> > > On 14.09.2022 01:31, Stefano Stabellini wrote:
> > > > The problem is that drivers/xen/privcmd.c:privcmd_mmap sets VM_IO |
> > > > VM_PFNMAP, and either flag would cause check_vma_flags to return
> > > > -EFAULT.
> > > > 
> > > > Do you know if it works if you remove VM_IO | VM_PFNMAP from
> > > > privcmd_mmap?
> > > 
> > > My Linux MM knowledge is certainly rusty, but I don't think this can
> > > work, at the very least not without further changes elsewhere.
> > 
> > The definition of VM_PFNMAP is:
> > 
> >      Page-ranges managed without "struct page", just pure PFN
> > 
> > So it made perfect sense to use VM_PFNMAP back in the day when we were
> > using address ranges without "struct page" for foreign mappings.
> > 
> > 
> > However, nowadays Linux drivers typically call
> > xen_alloc_unpopulated_pages to get local pages to be used for foreign
> > mappings. xen_alloc_unpopulated_pages should work for both PV and
> > autotranslated guests. So the local pages should have a regular "struct
> > page" backing them.
> 
> I agree that a struct page is associated with such PFNs.
> 
> I'm not sure there are no other implicit dependencies in our drivers relying
> on VM_PFNMAP and/or VM_IO being set.
> 
> This would need really intensive testing for verification.
>
> > I noticed that privcmd calls
> > alloc_empty_pages->xen_alloc_unpopulated_pages only for autotranslated
> > guests. Do you guys think it is intentional? In theory,
> > xen_alloc_unpopulated_pages should work for PV guests too.
> 
> I agree that it should work, but it isn't needed.

Sorry, I don't follow -- why isn't it needed?

Don't we need a struct page so that get_user_pages_fast & friends work
correctly for a given address? I thought that was the original intention
behind all of the xen_alloc_unpopulated_pages rework.
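
For context, the failure comes from the vma flags check in mm/gup.c. From
memory (so treat this as a paraphrase of the relevant part of
check_vma_flags rather than a verbatim copy), the check is roughly:

    /* mm/gup.c:check_vma_flags(), relevant part only, paraphrased */
    vm_flags_t vm_flags = vma->vm_flags;

    /*
     * privcmd_mmap() sets both VM_IO and VM_PFNMAP on the vma, and
     * either one on its own is enough to make GUP refuse the range:
     */
    if (vm_flags & (VM_IO | VM_PFNMAP))
        return -EFAULT;

So, at least for this particular failure, the vma flags seem to be what
trips up GUP, provided the underlying pages really do have a struct page
behind them.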


> > After that, privcmd calls xen_remap_domain_gfn_array, which calls
> > xen_xlate_remap_gfn_array or xen_remap_pfn depending on
> > PV or autotranslated.
> > 
> > But then I can see the following at the top of xlate_remap_gfn_array:
> > 
> >     /* Kept here for the purpose of making sure code doesn't break
> >        x86 PVOPS */
> >     BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
> > 
> > and a similar one in arch/x86/xen/mmu_pv.c:xen_remap_pfn:
> > 
> >     BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
> > 
> > 
> > Given that the pages passed to xen_xlate_remap_gfn_array and
> > xen_remap_pfn could have been allocated with
> > xen_alloc_unpopulated_pages, why the BUG_ON?
> > 
> > Is this just legacy? In the sense that the following could work?
> > 
> > - privcmd calls xen_alloc_unpopulated_pages for both PV & autotranslated
> 
> This would add some performance penalty for PV compared to today.
> 
> > - no setting VM_PFNMAP | VM_IO
> 
> This would need some very thorough testing to make sure no PV device breaks.
> There are even more implications: e.g. the kernel's memory management might
> interfere with pages that don't have those flags set, in extreme situations
> like memory shortage or page migration (I might be wrong here, but better
> safe than sorry).
 
For sure it would need to be very well tested.

Given all the good work that has gone into refactoring the kernel memory
allocation for mapping foreign pages, which is now a lot more solid and
"compatible" with the rest of the Linux infrastructure, it would be nice
to exploit it to the fullest and get rid of bugs such as
"get_user_pages_fast doesn't work".


> > - no BUG_ON in xlate_remap_gfn_array
> > - no BUG_ON in xen_remap_pfn
> 
> Those are rather easy, as I'm not aware of those BUG_ON()s having triggered
> in the last few years.
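
Just to tie this back to the subject of the thread, the pattern we would
like to see working from the reporting driver's side is roughly the sketch
below (pin_one_user_page and user_addr are just illustrative names, not
anything in the tree):

    /* Sketch: a driver helper pinning one page of a userspace buffer that
     * happens to be a privcmd foreign mapping. Today this fails with
     * -EFAULT because privcmd sets VM_IO | VM_PFNMAP on the vma. */
    static int pin_one_user_page(unsigned long user_addr, struct page **page)
    {
        int ret;

        ret = pin_user_pages_fast(user_addr, 1, FOLL_WRITE, page);
        if (ret != 1)
            return ret < 0 ? ret : -EFAULT;

        return 0;    /* caller releases with unpin_user_pages(page, 1) */
    }

If pinning succeeds, the rest of the driver can treat the memory like any
other pinned user memory, which I believe is what the
xen_alloc_unpopulated_pages rework was meant to enable.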



 

