
Re: [PATCH v2 1/2] xen/gntdev: Prevent leaking grants



On Mon, Oct 03, 2022 at 09:31:25PM -0400, M. Vefa Bicakci wrote:
> On 2022-10-02 20:29, Demi Marie Obenour wrote:
> > On Sun, Oct 02, 2022 at 06:20:05PM -0400, M. Vefa Bicakci wrote:
> > > Prior to this commit, if a grant mapping operation failed partially,
> > > some of the entries in the map_ops array would be invalid, whereas all
> > > of the entries in the kmap_ops array would be valid. This in turn would
> > > cause the following logic in gntdev_map_grant_pages to become invalid:
> > > 
> > >    for (i = 0; i < map->count; i++) {
> > >      if (map->map_ops[i].status == GNTST_okay) {
> > >        map->unmap_ops[i].handle = map->map_ops[i].handle;
> > >        if (!use_ptemod)
> > >          alloced++;
> > >      }
> > >      if (use_ptemod) {
> > >        if (map->kmap_ops[i].status == GNTST_okay) {
> > >          if (map->map_ops[i].status == GNTST_okay)
> > >            alloced++;
> > >          map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
> > >        }
> > >      }
> > >    }
> > >    ...
> > >    atomic_add(alloced, &map->live_grants);
> > > 
> > > Assume that use_ptemod is true (i.e., the domain mapping the granted
> > > pages is a paravirtualized domain). In the code excerpt above, note that
> > > the "alloced" variable is only incremented when both kmap_ops[i].status
> > > and map_ops[i].status are set to GNTST_okay (i.e., both mapping
> > > operations are successful).  However, as also noted above, there are
> > > cases where a grant mapping operation fails partially, breaking the
> > > assumption of the code excerpt above.
> > > 
> > > The behaviour described above causes map->live_grants to be set incorrectly. In
> > > some cases, all of the map_ops mappings fail, but all of the kmap_ops
> > > mappings succeed, meaning that live_grants may remain zero. This in turn
> > > makes it impossible to unmap the successfully grant-mapped pages pointed
> > > to by kmap_ops, because unmap_grant_pages has the following snippet of
> > > code at its beginning:
> > > 
> > >    if (atomic_read(&map->live_grants) == 0)
> > >      return; /* Nothing to do */
> > > 
> > > In other cases, where only some of the map_ops mappings fail but all
> > > kmap_ops mappings succeed, live_grants is made positive. However, when
> > > the user requests that the grant-mapped pages be unmapped,
> > > __unmap_grant_pages_done then makes map->live_grants negative: the
> > > latter function does not check whether all of the pages that were
> > > requested to be unmapped were actually unmapped, and it unconditionally
> > > subtracts "data->count" (i.e., a value that can be greater than
> > > map->live_grants) from map->live_grants. The side effects of a negative
> > > live_grants value have not been studied.
> > > 
> > > The net effect of all of this is that grant references are leaked in
> > > one of the above conditions. In Qubes OS v4.1 (which uses Xen's grant
> > > mechanism extensively for X11 GUI isolation), this issue manifests
> > > itself as warning messages like "g.e. 0x1234 still pending" printed by
> > > the Linux kernel in the VM that had granted pages (that contain X11
> > > GUI window data) to dom0, especially after the user rapidly resizes
> > > GUI VM windows. The rapid resizing causes some grant-mapping
> > > operations to fail partially or completely, because the VM unshares
> > > some of the pages as part of the window resizing, making those pages
> > > impossible to grant-map from dom0.
> > > 
> > > The fix for this issue involves counting all successful map_ops and
> > > kmap_ops mappings separately, and then adding the sum to live_grants.
> > > During unmapping, only the number of successfully unmapped grants is
> > > subtracted from live_grants. The code is also modified to check for
> > > negative live_grants values after the subtraction and warn the user.
> > > 
> > > Link: https://github.com/QubesOS/qubes-issues/issues/7631
> > > Fixes: dbe97cff7dd9 ("xen/gntdev: Avoid blocking in unmap_grant_pages()")
> > 
> > Looks like this patch has been pretty buggy, sorry.  This is the second
> > time there has been a problem with it.  Thanks for the fix.
> 
> Hi,
> 
> No problem! :-) Debugging this issue and coming up with a fix was a
> nice challenge for me.

You’re welcome!  I’m glad you were able to do this.

> > > Cc: stable@xxxxxxxxxxxxxxx
> > > Signed-off-by: M. Vefa Bicakci <m.v.b@xxxxxxxxxx>
> > > ---
> > > 
> > > Changes since v1:
> > > - To determine which unmap operations were successful, the previous
> > >    version of this patch set the "unmap_ops[i].status" and
> > >    "kunmap_ops[i].status" fields to the value "1" prior to passing these
> > >    data structures to the hypervisor. Instead of doing that, the code now
> > >    checks whether the "handle" fields in the same data structures were
> > >    *not* set to "INVALID_GRANT_HANDLE". (Suggested by Juergen Gross.)
> > > ---
> > >   drivers/xen/gntdev.c | 22 +++++++++++++++++-----
> > >   1 file changed, 17 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> > > index 84b143eef395..eb0586b9767d 100644
> > > --- a/drivers/xen/gntdev.c
> > > +++ b/drivers/xen/gntdev.c
> > > @@ -367,8 +367,7 @@ int gntdev_map_grant_pages(struct gntdev_grant_map *map)
> > >          for (i = 0; i < map->count; i++) {
> > >                  if (map->map_ops[i].status == GNTST_okay) {
> > >                          map->unmap_ops[i].handle = map->map_ops[i].handle;
> > > -                        if (!use_ptemod)
> > > -                                alloced++;
> > > +                        alloced++;
> > >                  } else if (!err)
> > >                          err = -EINVAL;
> > > @@ -377,8 +376,7 @@ int gntdev_map_grant_pages(struct gntdev_grant_map *map)
> > >                  if (use_ptemod) {
> > >                          if (map->kmap_ops[i].status == GNTST_okay) {
> > > -                                if (map->map_ops[i].status == GNTST_okay)
> > > -                                        alloced++;
> > > +                                alloced++;
> > >                                  map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
> > >                          } else if (!err)
> > >                                  err = -EINVAL;
> > > @@ -394,8 +392,14 @@ static void __unmap_grant_pages_done(int result,
> > >          unsigned int i;
> > >          struct gntdev_grant_map *map = data->data;
> > >          unsigned int offset = data->unmap_ops - map->unmap_ops;
> > > +        int successful_unmaps = 0;
> > > +        int live_grants;
> > >          for (i = 0; i < data->count; i++) {
> > > +                if (map->unmap_ops[offset + i].status == GNTST_okay &&
> > > +                    map->unmap_ops[offset + i].handle != INVALID_GRANT_HANDLE)
> > > +                        successful_unmaps++;
> > > +
> > >                  WARN_ON(map->unmap_ops[offset + i].status != GNTST_okay &&
> > >                          map->unmap_ops[offset + i].handle != INVALID_GRANT_HANDLE);
> > >                  pr_debug("unmap handle=%d st=%d\n",
> > > @@ -403,6 +407,10 @@ static void __unmap_grant_pages_done(int result,
> > >                          map->unmap_ops[offset+i].status);
> > >                  map->unmap_ops[offset+i].handle = INVALID_GRANT_HANDLE;
> > >                  if (use_ptemod) {
> > > +                        if (map->kunmap_ops[offset + i].status == GNTST_okay &&
> > > +                            map->kunmap_ops[offset + i].handle != INVALID_GRANT_HANDLE)
> > > +                                successful_unmaps++;
> > > +
> > >                          WARN_ON(map->kunmap_ops[offset + i].status != GNTST_okay &&
> > >                                  map->kunmap_ops[offset + i].handle != INVALID_GRANT_HANDLE);
> > >                          pr_debug("kunmap handle=%u st=%d\n",
> > > @@ -411,11 +419,15 @@ static void __unmap_grant_pages_done(int result,
> > >                          map->kunmap_ops[offset+i].handle = INVALID_GRANT_HANDLE;
> > >                  }
> > >          }
> > > +
> > >          /*
> > >           * Decrease the live-grant counter.  This must happen after the loop to
> > >           * prevent premature reuse of the grants by gnttab_mmap().
> > >           */
> > > -        atomic_sub(data->count, &map->live_grants);
> > > +        live_grants = atomic_sub_return(successful_unmaps, &map->live_grants);
> > > +        if (WARN_ON(live_grants < 0))
> > > +                pr_err("%s: live_grants became negative (%d) after unmapping %d pages!\n",
> > > +                       __func__, live_grants, successful_unmaps);
> > >          /* Release reference taken by __unmap_grant_pages */
> > >          gntdev_put_map(NULL, map);
> > > -- 
> > > 2.37.3
> > 
> > Is there a possibility that live_grants could overflow, now that it is
> > set to a value up to twice as large as before?
> 
> Good point! My answer in summary: I think that the code could be improved,
> but with reasonable values for the "limit" module parameter, there should
> not be issues.
> 
> Grant mappings are set up via ioctl calls, and the structure field that
> holds the number of grant references has u32 type:
> 
> (Quoting from kernel v5.15.71 for convenience)
> include/uapi/xen/gntdev.h
> === 8< ===
> struct ioctl_gntdev_map_grant_ref {
>       /* IN parameters */
>       /* The number of grants to be mapped. */
>       __u32 count;
> === >8 ===
> 
> However, the number of grant references is further limited in the actual
> ioctl handler function gntdev_ioctl_map_grant_ref(), which calls
> gntdev_test_page_count() to ensure that the number of granted pages
> requested to be mapped does not exceed "limit". "limit" defaults to 64K,
> which should be okay to use with an atomic_t type (i.e., a 32-bit signed
> integer type) like "live_grants", assuming that the system administrator
> does not go overboard and set "limit" to a very large value:
> 
> drivers/xen/gntdev.c
> === 8< ===
> static unsigned int limit = 64*1024;
> module_param(limit, uint, 0644);
> MODULE_PARM_DESC(limit,
>       "Maximum number of grants that may be mapped by one mapping request");
> 
> /* trimmed */
> 
> bool gntdev_test_page_count(unsigned int count)
> {
>       return !count || count > limit;
> }
> 
> /* trimmed */
> 
> static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
>                                      struct ioctl_gntdev_map_grant_ref __user 
> *u)
> {
>       /* trimmed */
> 
>       pr_debug("priv %p, add %d\n", priv, op.count);
>       if (unlikely(gntdev_test_page_count(op.count)))
>               return -EINVAL;
> 
>       /* trimmed */
> }
> === >8 ===
> 
> To be fair, the "count" field of the gntdev_grant_map structure is a signed
> integer, so very large values of "count" could overflow live_grants, as
> live_grants needs to accommodate values up to and including 2*count.

Could this be replaced by an unsigned and/or 64-bit integer?
Alternatively, one could use module_param_cb and param_set_uint_minmax
to enforce that the limit is something reasonable.  That said, one needs
almost 8TiB to trigger this problem, so while it ought to be fixed it
isn’t a huge deal.  Certainly should not block getting this merged.
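
For illustration, a minimal, untested sketch of what the module_param_cb
approach could look like.  The bound (1U << 30) is an arbitrary placeholder
chosen only so that 2 * limit stays well within atomic_t range, and the
names limit_param_set/limit_param_ops are made up here, not taken from the
driver:

=== 8< ===
/* Illustrative only: clamp "limit" at parameter-set time so that
 * 2 * limit can never overflow the 32-bit signed atomic_t used for
 * live_grants.
 */
static unsigned int limit = 64*1024;

static int limit_param_set(const char *val, const struct kernel_param *kp)
{
	/* Rejects writes outside the range [1, 1U << 30] with -EINVAL. */
	return param_set_uint_minmax(val, kp, 1, 1U << 30);
}

static const struct kernel_param_ops limit_param_ops = {
	.set = limit_param_set,
	.get = param_get_uint,
};

module_param_cb(limit, &limit_param_ops, &limit, 0644);
MODULE_PARM_DESC(limit,
	"Maximum number of grants that may be mapped by one mapping request");
=== >8 ===

With something like this, writing an out-of-range value to
/sys/module/gntdev/parameters/limit would fail with -EINVAL instead of
silently arming a potential overflow.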

> drivers/xen/gntdev-common.h
> === 8< ===
> struct gntdev_grant_map {
>       atomic_t in_use;
>       struct mmu_interval_notifier notifier;
>       bool notifier_init;
>       struct list_head next;
>       int index;
>       int count;
>       /* trimmed */
> }
> === >8 ===
> 
> > If not, you can add:
> > 
> > Acked-by: Demi Marie Obenour <demi@xxxxxxxxxxxxxxxxxxxxxx>
> 
> Thank you! I hope that the explanation and rationale above are satisfactory.
> Please let me know what you think.

They are indeed.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
