
Re: [RFC PATCH v2] xen/privcmd: Convert get_user_pages*() to pin_user_pages*()



On Wed, Jun 24, 2020 at 9:07 PM Boris Ostrovsky
<boris.ostrovsky@xxxxxxxxxx> wrote:
>
> On 6/23/20 9:36 PM, Souptick Joarder wrote:
> > On Tue, Jun 23, 2020 at 11:11 PM Boris Ostrovsky
> > <boris.ostrovsky@xxxxxxxxxx> wrote:
> >> On 6/23/20 7:58 AM, Souptick Joarder wrote:
> >>> In 2019, we introduced pin_user_pages*() and now we are converting
> >>> get_user_pages*() to the new API as appropriate. See [1] & [2] for
> >>> more information. This is case 5 as per document [1].
> >>>
> >>> As discussed, pages need to be marked dirty before being unpinned.
> >>>
> >>> Previously, if lock_pages() ended up partially mapping pages, it
> >>> returned an -ERRNO, due to which unlock_pages() had to walk pages[i]
> >>> all the way up to *nr_pages* to validate them. This can be avoided
> >>> by returning the number of partially mapped pages and the -ERRNO
> >>> separately when lock_pages() fails.
> >>> With this fix unlock_pages() no longer needs to validate pages[i] up
> >>> to *nr_pages* in the error path.
> >>
> >> This should be split into two patches please. The first one will fix the
> >> return value bug (and will need to go to stable branches) and the second
> >> will use the new routine to pin pages.
> > Initially I split the patches into two commits. But at the last moment I
> > realized that this bug fix (better described as a coding error; it does
> > not look like it leads to any runtime bug) is tightly coupled to the
> > second commit for the pin_user_pages*() conversion, which means we would
> > not need the bug fix patch if we were not converting the API to
> > pin_user_pages*()/unpin_user_pages_dirty_lock(). That's the reason I
> > clubbed these two commits into a single one.
>
>
> I am not sure I understand why the two issues are connected. Failure of
> either get_user_pages_fast() or pin_user_pages() will result in the same
> kind of overall ioctl failure, won't it?
>

Sorry, I think I need to go through the bug again to clarify my
understanding.

My understanding is as follows:

First, in the success case lock_pages() returns 0, so what
privcmd_ioctl_dm_op() returns to the user depends on the return value of
HYPERVISOR_dm_op(), and at the end all the pages are unpinned. At present
I am not clear about the return value of HYPERVISOR_dm_op(), but this
code path looks correct.
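
Roughly, the flow I have in mind is the following (a simplified sketch
from memory, not the exact upstream code; allocations, copying of the
buffers from user space and most error handling are left out):

        /* Simplified sketch of privcmd_ioctl_dm_op()'s success path. */
        rc = lock_pages(kbufs, kdata.num, pages, nr_pages);
        if (rc)
                goto out;

        xen_preemptible_hcall_begin();
        rc = HYPERVISOR_dm_op(kdata.dom, kdata.num, xbufs);
        xen_preemptible_hcall_end();

out:
        unlock_pages(pages, nr_pages);
        return rc;      /* on success, the user sees HYPERVISOR_dm_op()'s result */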

Second, in the failure case lock_pages() returns either -ENOSPC or the
-ERRNO received from {get|pin}_user_pages_fast() inside the for loop (at
that point there may already be some partially mapped pages, since the
mapping happens inside the loop). Upon receiving the -ERRNO,
privcmd_ioctl_dm_op() passes that same -ERRNO to the user (not the count
of partially mapped pages). That looks like the correct behaviour from
user space's point of view.
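
To show where the -ERRNO comes from, here is roughly what the current
lock_pages() looks like (a simplified sketch from memory of
drivers/xen/privcmd.c, so details may differ slightly):

static int lock_pages(
        struct privcmd_dm_op_buf kbufs[], unsigned int num,
        struct page *pages[], unsigned int nr_pages)
{
        unsigned int i;

        for (i = 0; i < num; i++) {
                unsigned int requested;
                int pinned;

                requested = DIV_ROUND_UP(
                        offset_in_page(kbufs[i].uptr) + kbufs[i].size,
                        PAGE_SIZE);
                if (requested > nr_pages)
                        return -ENOSPC; /* earlier iterations stay mapped */

                pinned = get_user_pages_fast(
                        (unsigned long)kbufs[i].uptr,
                        requested, FOLL_WRITE, pages);
                if (pinned < 0)
                        return pinned;  /* only the errno reaches the caller */

                nr_pages -= pinned;
                pages += pinned;
        }

        return 0;
}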

The problem I was trying to address is in the second scenario, when
lock_pages() returns an -ERRNO and some pages have already been mapped.
In that case, when unlock_pages() is called it iterates up to *nr_pages*
to validate and unmap the pages, whereas it should only unmap the pages
that were actually mapped. That is what the bug fix part of the patch
addresses.
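
What I have in mind is roughly along these lines (an illustrative sketch
only; the extra *pinned* output parameter and the exact naming are mine,
not necessarily what the final patch will use):

static int lock_pages(
        struct privcmd_dm_op_buf kbufs[], unsigned int num,
        struct page *pages[], unsigned int nr_pages,
        unsigned int *pinned)
{
        unsigned int i;

        for (i = 0; i < num; i++) {
                unsigned int requested;
                int page_count;

                requested = DIV_ROUND_UP(
                        offset_in_page(kbufs[i].uptr) + kbufs[i].size,
                        PAGE_SIZE);
                if (requested > nr_pages)
                        return -ENOSPC;

                page_count = pin_user_pages_fast(
                        (unsigned long)kbufs[i].uptr,
                        requested, FOLL_WRITE, pages);
                if (page_count < 0)
                        return page_count;

                *pinned += page_count;  /* caller knows exactly how much to undo */
                nr_pages -= page_count;
                pages += page_count;
        }

        return 0;
}

static void unlock_pages(struct page *pages[], unsigned int nr_pages)
{
        /* Mark the pages dirty before unpinning them. */
        unpin_user_pages_dirty_lock(pages, nr_pages, true);
}

privcmd_ioctl_dm_op() would then pass *pinned* (instead of *nr_pages*)
to unlock_pages() in the error path, so only the pages that were
actually pinned get unpinned.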

Let me know whether this is in sync with what you are expecting.


> One concern though is that we are changing how user will see this error.

My understanding from the above is that the user will always see the
right -ERRNO and will not be impacted.

Another scenario I was thinking about: {get|pin}_user_pages_fast()
returns the number of pages pinned, which may be fewer than the number
requested. In that case lock_pages() still returns 0 to the caller, and
the caller/user space continues under the assumption that all pages were
pinned correctly. Is this the expected behaviour in the current code?
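
The kind of check I am wondering about would look something like this
(hypothetical, only to illustrate the question; it reuses the names from
the sketch above, and -ENOSPC is just a placeholder error code):

                page_count = pin_user_pages_fast(
                        (unsigned long)kbufs[i].uptr,
                        requested, FOLL_WRITE, pages);
                if (page_count < 0)
                        return page_count;

                if (page_count != requested) {
                        /*
                         * Short pin: without a check like this,
                         * lock_pages() still returns 0 and the hypercall
                         * runs against buffers that are only partially
                         * pinned.
                         */
                        *pinned += page_count;  /* let the caller unpin these */
                        return -ENOSPC;
                }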


> Currently Xen devicemodel (which AFAIK is the only caller) ignores it,
> which I think is wrong. So another option would be to fix this in Xen
> and continue returning a positive number here. I guess it's back to Paul
> again.
>
>
> -boris
>
>
