[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 13/16] xen-blkback: Implement diskseq checks


  • To: Demi Marie Obenour <demi@xxxxxxxxxxxxxxxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Thu, 8 Jun 2023 10:29:18 +0200
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=umgdw07tohzefqm1klTgcXD7BeA8TvKMRB+InE5P3jg=; b=MVE8mbcVXYeBeRSccmnnm19EWiILjFW+kp1R0ktzakkIxBKYij3VtBEKXc9Q8vOHLtsuUwf7xJU50GdF9rFkt3l0m7K08tCP0janwr/d0C8gHdA/ZvD/AtZ6AmS186foYfx0cHC5ZEWK+9Q8oOT3I8TSUNMgpn3ix4KCQKcOFBQSlEhiq6T2figV3hpZc8OQkc/W7oKZVxeBVniZgnuR4ERicIJO5ueeRgd9j8nSaUnH2tfFzVVYpwtaZhXdMn8ikBnE4kmqOi5F6yhluvIx2W9ggwwm/+I+DPvGrSTpHp4MVtmrNf7I8w7PcPUVa4xA4AcQ0XdelghxCRCM3S1Pbg==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=KkGvDoFLayPBVRfeOJn+r/SaFz3/A8QpeQzoL103hppbBfee/Oq9CH1nScaZJql6O69GRm1wFEMC6zk9rbyWHr4srUohXWwa4tVP8X+VhAdaSrc5wUlS7Z7qFDJAk5ojdOD4sOONdKyRjbF8n1EF0kKe5tkzemr+uKmyNOTgpUrUzEvjlWCRfpfmJ1O3oVeVHFMc8wS5GknBGTwwfdFn0oPq8f/NBKskjhIx+/CAmgfXQbvkonLjefjxqBFkpI+s2tneSd81hHuWzVLtsMgce+KAZKAd0U9MaeQEz5mwuVFYgnlw+8X7iboDa8Ri1oEUAOwR0Ds9njKqGFeIfeoL8A==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Jens Axboe <axboe@xxxxxxxxx>, Alasdair Kergon <agk@xxxxxxxxxx>, Mike Snitzer <snitzer@xxxxxxxxxx>, dm-devel@xxxxxxxxxx, Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, linux-block@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 08 Jun 2023 08:29:48 +0000
  • Ironport-data: A9a23:Z8zlDq5iuQ+exh1tlXUKbAxRtHvHchMFZxGqfqrLsTDasY5as4F+v mQWCmuOMvyNYWHyLtEjb9u0pBlUsMDVztNjQVQ5riphHi5G8cbLO4+Ufxz6V8+wwm8vb2o8t plDNYOQRCwQZiWBzvt4GuG59RGQ7YnRGvynTraCYnsrLeNdYH9JoQp5nOIkiZJfj9G8Agec0 fv/uMSaM1K+s9JOGjt8B5mr9lU35JwehBtC5gZlPa4T7AeH/5UoJMl3yZ+ZfiOQrrZ8RoZWd 86bpJml82XQ+QsaC9/Nut4XpWVTH9Y+lSDX4pZnc/DKbipq/0Te4Y5iXBYoUm9Fii3hojxE4 I4lWapc6+seFvakdOw1C3G0GszlVEFM0OevzXOX6aR/w6BaGpdFLjoH4EweZOUlFuhL7W5m2 vVIIw4rfx+6qcGa4/GRQ9IxucUIM5y+VG8fkikIITDxK98DGMqGZpqQoNhS0XE3m9xEGuvYa 4wBcz1zYR/cYhpJfFAKFJY5m+TujX76G9FagAvN+exrvC6NkkotjtABM/KMEjCObd9SkUuC4 HrP4kzyAw0ANczZwj2Amp6prraWx3+nCNtPSNVU8NY3q02N+m0uMCQxckOkh+mahVTicelAf hl8Fi0G6PJaGFaQZtXwWhyQoXiavwUdUd5dD+077g6WzqPepQ2eAwAsRy5Lb9EOt8IsQzEuk FOK9/vgCj9HqrCZSXuBsLyTqFuaIi4UMX0PfwcHQBED7t2lp5s85jrDS5NvHbC4ivXvFD3wy izMpy87750WhNQO3r+2/njGhSytvZnDSgMp5gTRUXmh5wk/b4mgD6Ss6F7G/bNKKIGSTXGfs 3Ue3cuT9uYDCdeKjiPlaOEMGqy5ovWIKjvRhXZxEJQ7sTeg4XiuecZX+j4WDFdkNIMIdCHkZ GfXuBhN/9lDMX2yd6h1bomtTcMwwsDd+c/NU/nVap9LfcJ3fQrepiV2PxbMgSbqjVQmlrw5N dGDa8GwAH0GCKNhij2rW+Ma1rxtzSc7rY/Oea3GI92c+eL2TBaopX0taTNisshRAHu4nTjo
  • Ironport-hdrordr: A9a23:G7onnapr57uthxW1nBDPqCAaV5rveYIsimQD101hICG9Evb0qy nOpoV/6faQslwssR4b9uxoVJPvfZq+z+8W3WByB9eftWDd0QPFEGgL1+DfKlbbak7DH4BmtJ uJc8JFeafN5VoRt7eG3OFveexQvOVu88qT9JjjJ28Gd3APV0n5hT0JcjpyFCdNNW57LKt8Lr WwzOxdqQGtfHwGB/7LfUXsD4D41rv2fIuNW29+OyIa
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Jun 07, 2023 at 12:14:46PM -0400, Demi Marie Obenour wrote:
> On Wed, Jun 07, 2023 at 10:20:08AM +0200, Roger Pau Monné wrote:
> > On Tue, Jun 06, 2023 at 01:01:20PM -0400, Demi Marie Obenour wrote:
> > > On Tue, Jun 06, 2023 at 10:25:47AM +0200, Roger Pau Monné wrote:
> > > > On Tue, May 30, 2023 at 04:31:13PM -0400, Demi Marie Obenour wrote:
> > > > > -     if (be->major | be->minor) {
> > > > > -             if (be->major != major || be->minor != minor)
> > > > > -                     pr_warn("changing physical device (from %x:%x 
> > > > > to %x:%x) not supported.\n",
> > > > > -                             be->major, be->minor, major, minor);
> > > > > +     diskseq_str = xenbus_read(XBT_NIL, dev->nodename, "diskseq", 
> > > > > &diskseq_len);
> > > > > +     if (IS_ERR(diskseq_str)) {
> > > > > +             int err = PTR_ERR(diskseq_str);
> > > > > +             diskseq_str = NULL;
> > > > > +
> > > > > +             /*
> > > > > +              * If this does not exist, it means legacy userspace 
> > > > > that does not
> > > > > +              * support diskseq.
> > > > > +              */
> > > > > +             if (unlikely(!XENBUS_EXIST_ERR(err))) {
> > > > > +                     xenbus_dev_fatal(dev, err, "reading diskseq");
> > > > > +                     return;
> > > > > +             }
> > > > > +             diskseq = 0;
> > > > > +     } else if (diskseq_len <= 0) {
> > > > > +             xenbus_dev_fatal(dev, -EFAULT, "diskseq must not be 
> > > > > empty");
> > > > > +             goto fail;
> > > > > +     } else if (diskseq_len > 16) {
> > > > > +             xenbus_dev_fatal(dev, -ERANGE, "diskseq too long: got 
> > > > > %d but limit is 16",
> > > > > +                              diskseq_len);
> > > > > +             goto fail;
> > > > > +     } else if (diskseq_str[0] == '0') {
> > > > > +             xenbus_dev_fatal(dev, -ERANGE, "diskseq must not start 
> > > > > with '0'");
> > > > > +             goto fail;
> > > > > +     } else {
> > > > > +             char *diskseq_end;
> > > > > +             diskseq = simple_strtoull(diskseq_str, &diskseq_end, 
> > > > > 16);
> > > > > +             if (diskseq_end != diskseq_str + diskseq_len) {
> > > > > +                     xenbus_dev_fatal(dev, -EINVAL, "invalid 
> > > > > diskseq");
> > > > > +                     goto fail;
> > > > > +             }
> > > > > +             kfree(diskseq_str);
> > > > > +             diskseq_str = NULL;
> > > > > +     }
> > > > 
> > > > Won't it be simpler to use xenbus_scanf() with %llx formatter?
> > > 
> > > xenbus_scanf() doesn’t check for overflow and accepts lots of junk it
> > > really should not.  Should this be fixed in xenbus_scanf()?
> > 
> > That would be my preference, so that you can use it here instead of
> > kind of open-coding it.
> 
> This winds up being a much more invasive patch as it requires changing
> sscanf().  It also has a risk (probably mostly theoretical) of breaking
> buggy userspace that passes garbage values here.

Well, if the current function is not suitable for your purposes it
would be better to fix it rather than open-code what you need.  Mostly
because further usages would then also need to open-code whatever
required.

> > > > Also, you tie this logic to the "physical-device" watch, which
> > > > strictly implies that the "diskseq" node must be written to xenstore
> > > > before the "physical-device" node.  This seems fragile, but I don't
> > > > see much better optiono since the "diskseq" is optional.
> > > 
> > > What about including the diskseq in the "physical-device" node?  Perhaps
> > > use diskseq@major:minor syntax?
> > 
> > Hm, how would you know whether the blkback instance in the kernel
> > supports the diskseq syntax in physical-device?
> 
> That’s what the next patch is for 🙂.

Hm, I think we should separate diskseq support from the notify open
stuff: it's possible a different (non-Linux) backend wants to
implement open notify support but doesn't have diskseq.

> > Can you fetch a disk using a diskseq identifier?
> 
> Not yet, although I have considered adding this ability.  It would be
> one step towards a “diskseqfs” that userspace could use to open a device
> by diskseq.
> 
> > Why I understand that this is an extra safety check in order to assert
> > blkback is opening the intended device, is this attempting to fix some
> > existing issue?
> 
> Yes, it is.  I have a block script (written in C) that validates the
> device it has opened before passing the information to blkback.  It uses
> the diskseq to do this, but for that protection to be complete, blkback
> must also be aware of it.

But if your block script opens the device, and keeps it open until
blkback has also taken a reference to it, there's no way such device
could be removed and recreated in the window you point out above, as
there's always a reference on it taken?

> > I'm not sure I see how the major:minor numbers would point to a
> > different device than the one specified by the toolstack unless the
> > admin explicitly messes with the devices before blkback has got time
> > to open them.  But then the admin can already do pretty much
> > everything it wants with the system.
> 
> Admins typically refer to e.g. device-mapper devices by name, not by
> major:minor number.  If a device is destroyed and recreated right as the
> block script is running, this race condition can occur.

Right, but what about this device recreation happening after the admin
has written the guest config file but before the call to (lib)xl
happens?  blkback would also end up using a different device than
indented, and your proposed approach doesn't fix this.  The only way to
solve this would be to reference devices by UUID (iow: diskseq)
directly in the guest config file.

Then the block script will open the device by diskseq and pass the
major:minor numbers to blkback.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.