
Re: [PATCH v2 13/16] xen-blkback: Implement diskseq checks


  • To: Demi Marie Obenour <demi@xxxxxxxxxxxxxxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Mon, 12 Jun 2023 10:09:39 +0200
  • Cc: Jens Axboe <axboe@xxxxxxxxx>, Alasdair Kergon <agk@xxxxxxxxxx>, Mike Snitzer <snitzer@xxxxxxxxxx>, dm-devel@xxxxxxxxxx, Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, linux-block@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Mon, 12 Jun 2023 08:10:10 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, Jun 09, 2023 at 12:55:39PM -0400, Demi Marie Obenour wrote:
> On Fri, Jun 09, 2023 at 05:13:45PM +0200, Roger Pau Monné wrote:
> > On Thu, Jun 08, 2023 at 11:33:26AM -0400, Demi Marie Obenour wrote:
> > > On Thu, Jun 08, 2023 at 10:29:18AM +0200, Roger Pau Monné wrote:
> > > > On Wed, Jun 07, 2023 at 12:14:46PM -0400, Demi Marie Obenour wrote:
> > > > > On Wed, Jun 07, 2023 at 10:20:08AM +0200, Roger Pau Monné wrote:
> > > > > > Can you fetch a disk using a diskseq identifier?
> > > > > 
> > > > > Not yet, although I have considered adding this ability.  It would
> > > > > be one step towards a “diskseqfs” that userspace could use to open
> > > > > a device by diskseq.
> > > > > 
> > > > > > While I understand that this is an extra safety check in order
> > > > > > to assert blkback is opening the intended device, is this
> > > > > > attempting to fix some existing issue?
> > > > > 
> > > > > Yes, it is.  I have a block script (written in C) that validates
> > > > > the device it has opened before passing the information to
> > > > > blkback.  It uses the diskseq to do this, but for that protection
> > > > > to be complete, blkback must also be aware of it.
> > > > 
> > > > But if your block script opens the device and keeps it open until
> > > > blkback has also taken a reference to it, there's no way such a
> > > > device could be removed and recreated in the window you point out
> > > > above, as a reference is always held on it?
> > > 
> > > This assumes that the block script is not killed in the meantime,
> > > which is not a safe assumption due to timeouts and the OOM killer.
> > 
> > Doesn't seem very reliable to use with delete-on-close either then.
> 
> That’s actually the purpose of delete-on-close!  It ensures that if the
> block script gets killed, the device is automatically cleaned up.

The block script getting killed during attach shouldn't prevent the
toolstack from performing domain destruction, and thus from removing
the stale block device.

OTOH if your toolstack gets killed then there's not much that can be
done, and the system will need intervention in order to get back into
a sane state.

Hitting OOM in your control domain, however, is unlikely to be
handled gracefully, even with delete-on-close.

> > > > Then the block script will open the device by diskseq and pass the
> > > > major:minor numbers to blkback.
> > > 
> > > Alternatively, the toolstack could write both the diskseq and
> > > major:minor numbers and be confident that it is referring to the
> > > correct device, no matter how long ago it got that information.
> > > This could be quite useful for e.g. one VM exporting a device to
> > > another VM by calling losetup(8) and expecting a human to make a
> > > decision based on various properties of the device.  In this
> > > case there is no upper bound on the race window.
> > 
> > Instead of playing with xenstore nodes, it might be better to simply
> > have blkback export on sysfs the diskseq of the opened device, and let
> > the block script check whether that's correct or not.  That implies
> > less code on the kernel side, and doesn't pollute xenstore.
> 
> This would require that blkback delay exposing the device to the
> frontend until the block script has checked that the diskseq is correct.

This depends on your toolstack implementation.  libxl won't start the
domain until block scripts have finished execution, and hence having
the block script wait for the sysfs node to appear and check it
against the expected value would be enough.
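
Something along these lines would do in the block script (a rough
sketch only; the sysfs attribute is hypothetical, since blkback
doesn't export a diskseq node yet):

    /* Poll for the hypothetical diskseq attribute exported by
     * blkback and compare it against the value obtained from the
     * opened device (e.g. via the BLKGETDISKSEQ ioctl). */
    #include <inttypes.h>
    #include <stdio.h>
    #include <unistd.h>

    static int wait_and_check_diskseq(const char *sysfs_path,
                                      uint64_t expected)
    {
        /* Poll for up to ~10s while blkback attaches the device. */
        for (unsigned int i = 0; i < 100; i++) {
            FILE *f = fopen(sysfs_path, "r");

            if (f) {
                uint64_t seq;
                int ok = fscanf(f, "%" SCNu64, &seq) == 1;

                fclose(f);
                if (ok)
                    return seq == expected ? 0 : -1;
            }
            usleep(100000);
        }

        return -1; /* node never appeared: fail the attach */
    }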

> Much simpler for the block script to provide the diskseq in xenstore.
> If you want to avoid an extra xenstore node, I can make the diskseq part
> of the physical-device node.

I'm thinking that we might want to introduce a "physical-device-uuid"
node and use that to provide the diskseq to the backend.  Toolstacks
(or block scripts) would need to be sure the "physical-device-uuid"
node is populated before setting "physical-device", as writes to the
latter node would still trigger the blkback watch.  I think using two
distinct watches would just make the logic in blkback too
complicated.
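
As an illustration of the ordering (a sketch only; the node names
follow the proposal above, and the backend path is whatever the
toolstack uses for the vbd):

    /* Write "physical-device-uuid" and "physical-device" in a
     * single xenstore transaction, so that by the time the blkback
     * watch on "physical-device" fires the diskseq node is already
     * visible. */
    #include <inttypes.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>
    #include <xenstore.h>

    static bool write_physical_device(struct xs_handle *xsh,
                                      const char *be_path,
                                      uint64_t diskseq,
                                      unsigned int major,
                                      unsigned int minor)
    {
        char path[256], val[64];
        xs_transaction_t t = xs_transaction_start(xsh);

        if (t == XBT_NULL)
            return false;

        snprintf(path, sizeof(path), "%s/physical-device-uuid",
                 be_path);
        snprintf(val, sizeof(val), "%" PRIu64, diskseq);
        if (!xs_write(xsh, t, path, val, strlen(val)))
            goto fail;

        /* blkback parses this node as hex major:minor. */
        snprintf(path, sizeof(path), "%s/physical-device", be_path);
        snprintf(val, sizeof(val), "%x:%x", major, minor);
        if (!xs_write(xsh, t, path, val, strlen(val)))
            goto fail;

        return xs_transaction_end(xsh, t, false);

    fail:
        xs_transaction_end(xsh, t, true);
        return false;
    }

The transaction also makes both writes appear atomically, which
sidesteps the ordering question altogether.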

My preference would be for the kernel to have a function for opening a
device identified by a diskseq (as fetched from
"physical-device-uuid"), so that we don't have to open using
major:minor and then check the diskseq.
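
Something like the below, even if for now it would still be
implemented in terms of a major:minor lookup plus a check (an
untested sketch; signatures as of ~v6.4, and they have churned
recently):

    /* Open by dev_t and verify the device is the incarnation we
     * were told about, dropping the reference on mismatch.  A
     * native "open by diskseq" primitive could avoid the dev_t
     * round-trip entirely. */
    #include <linux/blkdev.h>
    #include <linux/err.h>

    static struct block_device *blkdev_get_by_diskseq(dev_t dev,
                                                      u64 diskseq,
                                                      fmode_t mode,
                                                      void *holder)
    {
        struct block_device *bdev;

        bdev = blkdev_get_by_dev(dev, mode, holder);
        if (IS_ERR(bdev))
            return bdev;

        if (bdev->bd_disk->diskseq != diskseq) {
            blkdev_put(bdev, mode);
            return ERR_PTR(-ENODEV);
        }

        return bdev;
    }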

Thanks, Roger.
