Re: “Backend has not unmapped grant” errors



On Mon, Aug 29, 2022 at 02:55:55PM +0200, Juergen Gross wrote:
> On 28.08.22 07:15, Demi Marie Obenour wrote:
> > On Wed, Aug 24, 2022 at 08:11:56AM +0200, Juergen Gross wrote:
> > > On 24.08.22 02:20, Marek Marczykowski-Górecki wrote:
> > > > On Tue, Aug 23, 2022 at 09:48:57AM +0200, Juergen Gross wrote:
> > > > > On 23.08.22 09:40, Demi Marie Obenour wrote:
> > > > > > I recently had a VM’s /dev/xvdb stop working with a “backend has not
> > > > > > unmapped grant” error.  Since /dev/xvdb was the VM’s private volume,
> > > > > > that rendered the VM effectively useless.  I had to kill it with
> > > > > > qvm-kill.
> > > > > > 
> > > > > > > The backend of /dev/xvdb is dom0, so a malicious backend is clearly not
> > > > > > > the cause of this.  I believe the actual cause is a race condition, such
> > > > > > > as the following:
> > > > > > 
> > > > > > 1. GUI agent in VM allocates grant X.
> > > > > > 2. GUI agent tells GUI daemon in dom0 to map X.
> > > > > > 3. GUI agent frees grant X.
> > > > > > 4. blkfront allocates grant X and passes it to dom0.
> > > > > > 5. dom0’s blkback maps grant X.
> > > > > > 6. blkback unmaps grant X.
> > > > > > 7. GUI daemon maps grant X.
> > > > > > 8. blkfront tries to revoke access to grant X and fails.  Disaster
> > > > > >       ensues.
> > > > > > 
> > > > > > What could be done to prevent this race?  Right now all of the
> > > > > > approaches I can think of are horribly backwards-incompatible.  They
> > > > > > require replacing grant IDs with some sort of handle, and requiring
> > > > > > userspace to pass these handles to ioctls.  It is also possible that
> > > > > > > netfront and blkfront could race against each other in a way that causes
> > > > > > > this, though I suspect that race would be much harder to trigger.
> > > > > > 
> > > > > > > This has happened more than once so it is not a fluke due to e.g. cosmic
> > > > > > > rays or other random bit-flips.
> > > > > > 
> > > > > > Marek, do you have any suggestions?
> > > > > 
> > > > > To me that sounds like the interface of the GUI is the culprit.
> > > > > 
> > > > > > The GUI agent in the guest should only free a grant, if it got a message
> > > > > > from the backend that it can do so. Just assuming to be able to free it
> > > > > > because it isn't in use currently is the broken assumption here.
> > > > 
> > > > FWIW, I hit this issue twice already in this week's CI run, while it never
> > > > happened before. The difference compared to the previous run is Linux
> > > > 5.15.57 vs 5.15.61. The latter reports persistent grants disabled.
> > > 
> > > I think this additional bug is just triggering the race in the GUI
> > > interface more easily, as blkfront will allocate new grants with a
> > > much higher frequency.
> > > 
> > > So fixing the persistent grant issue will just paper over the real
> > > issue.
> > 
> > Indeed so, but making the bug happen much less frequently is still a
> > significant win for users.
> 
> Probably, yes.
> 
> > In the long term, there is one situation I do not have a good solution
> > for: recovery from GUI agent crashes.  If the GUI agent crashes, the
> > kernel it is running under has two bad choices.  Either the kernel can
> > reclaim the grants, risking them being mapped at a later time by the GUI
> > daemon, or it can leak them, which is bad for obvious reasons.  I
> > believe the current implementation makes the former choice.
> 
> It does.
> 
> I don't have enough information about the GUI architecture you are using.
> Which components are involved on the backend side, and which on the
> frontend side? Especially the responsibilities and communication regarding
> grants is important here.

See Marek’s reply.

> > To fix this problem, I recommend the following changes:
> > 
> > 1. Treat “backend has not unmapped grant” errors as non-fatal.  The most
> >     likely cause is buggy userspace software, not an attempt to exploit
> >     XSA-396.  Instead of disabling the device, just log a warning message
> >     and put the grant on the deferred queue.  Even leaking the grant
> >     would be preferable to the current behavior, as disabling a block
> >     device typically leaves the VM unusable.
> 
> Sorry, I don't agree. This is a major violation of the normal I/O
> architecture. Your reasoning with the disabled block device doesn't make
> much sense IMHO, as the mapped grant was due to a bad interface leading to
> another component using a grant it was not meant to use.
> 
> Shutting down the block device is the right thing to do here, as data
> corruption might be happening.

In this case, the grants are being mapped read-only, so (unless I have
missed something) data corruption is not possible.

> > 3. Provide a means for a domain to be notified by Xen whenever one of
> >     its grants is unmapped.  Setting an event channel and writing to a
> >     shared ring would suffice.  This would allow eliminating the kludgy
> >     deferred freeing mechanism.
> 
> Interesting idea.
> 
> I believe such an interface would need to be activated per grant, as
> otherwise performance could suffer a lot. There are still some unused bits
> in the grant flags, one could be used for that purpose.

At least in the GUI case, large numbers of grants are typically unmapped
at once, and a notification is only necessary when the entire block has
been unmapped.  This should mitigate the performance concerns.
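
A minimal sketch of the batching idea, as a toy Python model rather than Xen
or kernel code.  The class and method names (GrantBlock, unmapped) are
hypothetical and exist only for illustration; the point is that a block of
grants produces one notification, sent when the last grant in the block is
unmapped, instead of one notification per grant.

```python
from typing import Callable, Iterable, Set


class GrantBlock:
    """Toy model: a batch of grant references with one completion callback.

    Hypothetical names; this does not correspond to any real Xen interface.
    """

    def __init__(self, grant_refs: Iterable[int],
                 on_all_unmapped: Callable[[], None]) -> None:
        self._still_mapped: Set[int] = set(grant_refs)
        self._on_all_unmapped = on_all_unmapped
        self._notified = False
        self._maybe_notify()  # an empty block is trivially complete

    def unmapped(self, ref: int) -> None:
        """Record that the peer has unmapped one grant of the block."""
        self._still_mapped.discard(ref)
        self._maybe_notify()

    def _maybe_notify(self) -> None:
        # Fire the callback exactly once, when nothing is mapped any more.
        if not self._still_mapped and not self._notified:
            self._notified = True
            self._on_all_unmapped()


events = []
block = GrantBlock(range(100, 104), lambda: events.append("block free"))
for ref in (100, 101, 102):
    block.unmapped(ref)      # no notification yet, 103 is still mapped
block.unmapped(103)          # last grant unmapped: one notification
```

In this model the cost is one set removal per unmap and a single callback
per block, which is why the per-grant overhead may be less of a concern
than it first appears.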

> I'm not sure how often this would be used. In case it is only for the rare
> case of unexpectedly long mapped grant pages, a simple event might do the
> job, with the event handler just skimming through the pending unmaps to
> find the grants being available again.

In Qubes OS, this happens so often that we had to patch the Linux kernel
to handle it better.  Prior to the patch, the background deferred
reclaim could not keep up, causing a memory leak.  Furthermore, the log
messages whenever an unmap had to be deferred were flooding the logs.
While we could change the GUI protocol to provide an unmap-time
notification, that is only possible because we use an LD_PRELOAD hack to
hook Xorg’s unmapping calls.  I would prefer not to keep relying on this.
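
To illustrate how a polling-based deferred reclaim can fall behind, here is
a toy Python model.  It is a sketch under stated assumptions, not the Linux
implementation: the names (DeferredReclaimer, poll, batch) are hypothetical,
and the fixed per-pass batch limit is an assumption made only to show a
backlog building up.

```python
from collections import deque
from typing import Callable, Deque, List


class DeferredReclaimer:
    """Toy model of deferred grant freeing (hypothetical, not kernel code)."""

    def __init__(self, is_mapped: Callable[[int], bool], batch: int = 10) -> None:
        self._is_mapped = is_mapped      # asks whether the peer still maps a grant
        self._pending: Deque[int] = deque()
        self._batch = batch              # assumed limit of re-checks per pass
        self.freed: List[int] = []

    def free(self, ref: int) -> None:
        """Free a grant now if possible, otherwise defer it."""
        if self._is_mapped(ref):
            self._pending.append(ref)
        else:
            self.freed.append(ref)

    def poll(self) -> None:
        """One background pass: re-check at most `batch` deferred grants."""
        for _ in range(min(self._batch, len(self._pending))):
            ref = self._pending.popleft()
            if self._is_mapped(ref):
                self._pending.append(ref)   # still mapped, try again later
            else:
                self.freed.append(ref)

    @property
    def backlog(self) -> int:
        return len(self._pending)


mapped = set(range(50))                      # the peer maps 50 grants
reclaimer = DeferredReclaimer(lambda ref: ref in mapped, batch=10)
for ref in range(50):
    reclaimer.free(ref)                      # all 50 end up deferred
mapped.clear()                               # the peer unmaps everything
reclaimer.poll()                             # one pass only reclaims 10
```

In this model, whenever grants are deferred faster than the passes drain
them, the backlog grows without bound.  An unmap notification would let the
frontend free grants as they become available instead of polling for them.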
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab



 

