
Re: “Backend has not unmapped grant” errors



On Tue, Aug 23, 2022 at 09:48:57AM +0200, Juergen Gross wrote:
> On 23.08.22 09:40, Demi Marie Obenour wrote:
> > I recently had a VM’s /dev/xvdb stop working with a “backend has not
> > unmapped grant” error.  Since /dev/xvdb was the VM’s private volume,
> > that rendered the VM effectively useless.  I had to kill it with
> > qvm-kill.
> > 
> > The backend of /dev/xvdb is dom0, so a malicious backend is clearly not
> > the cause of this.  I believe the actual cause is a race condition, such
> > as the following:
> > 
> > 1. GUI agent in VM allocates grant X.
> > 2. GUI agent tells GUI daemon in dom0 to map X.
> > 3. GUI agent frees grant X.
> > 4. blkfront allocates grant X and passes it to dom0.
> > 5. dom0’s blkback maps grant X.
> > 6. blkback unmaps grant X.
> > 7. GUI daemon maps grant X.
> > 8. blkfront tries to revoke access to grant X and fails.  Disaster
> >     ensues.
> > 
> > What could be done to prevent this race?  Right now all of the
> > approaches I can think of are horribly backwards-incompatible.  They
> > require replacing grant IDs with some sort of handle, and requiring
> > userspace to pass these handles to ioctls.  It is also possible that
> > netfront and blkfront could race against each other in a way that causes
> > this, though I suspect that race would be much harder to trigger.
> > 
> > This has happened more than once so it is not a fluke due to e.g. cosmic
> > rays or other random bit-flips.
> > 
> > Marek, do you have any suggestions?
> 
> To me that sounds like the interface of the GUI is the culprit.
> 
> The GUI agent in the guest should only free a grant, if it got a message
> from the backend that it can do so. Just assuming to be able to free it
> because it isn't in use currently is the broken assumption here.
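
The race quoted above, and the fix suggested for it, can be sketched with a
toy model. This is plain Python, not Xen code: ToyGrantTable and its methods
are hypothetical names, and the "most recently freed reference is reused
first" policy is an assumption made only to keep the reuse deterministic.

```python
class ToyGrantTable:
    """Toy stand-in for a guest grant table (hypothetical, not the Xen API)."""

    def __init__(self, size=4):
        self.free = list(range(size))  # grant references available for reuse
        self.mappings = {}             # ref -> set of parties mapping it

    def alloc(self):
        ref = self.free.pop(0)
        self.mappings[ref] = set()
        return ref

    def release_unchecked(self, ref):
        # Unsafe free: puts the ref back without checking for pending maps.
        self.free.insert(0, ref)

    def map(self, ref, who):
        self.mappings.setdefault(ref, set()).add(who)

    def unmap(self, ref, who):
        self.mappings[ref].discard(who)

    def end_access(self, ref):
        # Revocation fails while anybody still maps the ref.
        if self.mappings[ref]:
            return False
        del self.mappings[ref]
        self.free.insert(0, ref)
        return True


def racy_sequence():
    gt = ToyGrantTable()
    x = gt.alloc()                 # 1. GUI agent allocates grant X
    # 2. agent asks the GUI daemon to map X (request still in flight)
    gt.release_unchecked(x)        # 3. agent frees X without waiting
    y = gt.alloc()                 # 4. blkfront allocates and gets X again
    assert y == x
    gt.map(y, "blkback")           # 5. blkback maps X
    gt.unmap(y, "blkback")         # 6. blkback unmaps X
    gt.map(x, "gui-daemon")        # 7. the stale GUI request is served now
    return gt.end_access(y)        # 8. blkfront cannot revoke access


def acked_sequence():
    gt = ToyGrantTable()
    x = gt.alloc()
    gt.map(x, "gui-daemon")
    gt.unmap(x, "gui-daemon")      # daemon unmaps, then tells the agent so
    freed = gt.end_access(x)       # agent frees only after that message
    y = gt.alloc()                 # blkfront may now safely reuse the ref
    gt.map(y, "blkback")
    gt.unmap(y, "blkback")
    return freed and gt.end_access(y)
```

In this model racy_sequence() ends with a failed revocation, while
acked_sequence() lets both the GUI agent and blkfront revoke cleanly.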

FWIW, I have hit this issue twice already in this week's CI run, while it
never happened before. The difference compared to the previous run is Linux
5.15.57 vs 5.15.61. The latter reports persistent grants disabled. The
only related commits I see there are three commits that are indeed related
to persistent grants:

  c98e956ef489 xen-blkfront: Apply 'feature_persistent' parameter when connect
  ef26b5d530d4 xen-blkback: Apply 'feature_persistent' parameter when connect
  7304be4c985d xen-blkback: fix persistent grants negotiation

But none of the commit messages suggests intentionally disabling it
without an explicit request to do so. I did not request disabling it
in the toolstack (although I have set the backend as "trusted" - XSA-403).
I have confirmed it's the frontend version that matters. Running an older
frontend kernel with a 5.15.61 backend results in persistent grants being
enabled (and both the frontend and backend xenstore "feature-persistent"
entries are "1" in this case).
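
For anyone wanting to repeat the xenstore check, a minimal sketch follows. It
assumes the usual xenstore layout for vbd devices and the standard xvd device
numbering (major 202, 16 minors per disk); the helper names and the domid used
in the usage note are hypothetical.

```python
def vbd_devid(disk_index, partition=0):
    """Xen virtual device number for /dev/xvd<letter> (xvda is index 0)."""
    return (202 << 8) | (disk_index << 4) | partition


def feature_persistent_paths(domid, devid, backend_domid=0):
    """Return the (frontend, backend) xenstore paths of "feature-persistent".

    Assumes the conventional /local/domain/... layout for vbd devices.
    """
    frontend = f"/local/domain/{domid}/device/vbd/{devid}/feature-persistent"
    backend = (
        f"/local/domain/{backend_domid}/backend/vbd/"
        f"{domid}/{devid}/feature-persistent"
    )
    return frontend, backend


def persistent_grants_advertised(frontend_value, backend_value):
    """True only if both ends wrote "1" to their feature-persistent entry."""
    return frontend_value.strip() == "1" and backend_value.strip() == "1"
```

For /dev/xvdb of a hypothetical domain 5, feature_persistent_paths(5,
vbd_devid(1)) gives the two paths; their values can be read in dom0 with
xenstore-read and passed to persistent_grants_advertised().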

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

