
Re: [PATCH v2 0/29] block: Make blkdev_get_by_*() return handle



On Fri, Aug 25, 2023 at 03:47:56PM +0200, Jan Kara wrote:

> I can see the appeal of not having to introduce the new bdev_handle type
> and just using struct file which unifies in-kernel and userspace block
> device opens. But I can see downsides too - the last fput() happening from
> task work makes me a bit nervous whether it will not break something
> somewhere with exclusive bdev opens. Getting from struct file to bdev is
> somewhat harder but I guess a helper like F_BDEV() would solve that just
> fine.
> 
> So besides my last fput() worry above, I think this could work and
> would probably be a bit nicer than what I have. But before going and
> redoing the whole series let me gather some more feedback so that we
> don't go back and forth.
> Christoph, Christian, Jens, any opinion?

Redoing is not an issue - it can be done on top of your series just
as well.  Async behaviour of fput() might be one, but...  we need to
look through the actual users; for a lot of them it's perfectly fine.

FWIW, from a cursory look there appears to be a missing primitive: take
an opened bdev (or bdev_handle, with your variant, or opened file if we
go that way eventually) and claim it.
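Something along these lines (name and signature purely illustrative,
not an existing primitive):

        int bd_claim(struct block_device *bdev, void *holder);

i.e. claim a device we have already opened, on behalf of a holder,
without a second trip through blkdev_get_by_dev().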

I mean, look at claim_swapfile() for example:
                p->bdev = blkdev_get_by_dev(inode->i_rdev,
                                   FMODE_READ | FMODE_WRITE | FMODE_EXCL, p);
                if (IS_ERR(p->bdev)) {
                        error = PTR_ERR(p->bdev);
                        p->bdev = NULL;
                        return error;
                }
                p->old_block_size = block_size(p->bdev);
                error = set_blocksize(p->bdev, PAGE_SIZE);
                if (error < 0)
                        return error;
we already have the file opened, and we keep it opened all the way until
swapoff(2); here we have noticed that it's a block device and we
        * open the fucker again (by device number), this time claiming
          it with our swap_info_struct as holder, to be closed at
          swapoff(2) time (just before we close the file)
        * flip the block size to PAGE_SIZE, to be reverted at
          swapoff(2) time
That really looks like it ought to be
        * take the opened file, see that it's a block device
        * try to claim it with that holder
        * on success, flip the block size
with filp_close() in swapoff(2) (or in the failure exit path in swapon(2))
doing what it would've done for an O_EXCL opened block device.
The only difference from an O_EXCL userland open is that here we would
end up with the holder pointing not to the struct file in question, but
to our swap_info_struct.  It will do the right thing.
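Spelling it out - a sketch of what claim_swapfile() might become,
assuming the hypothetical bd_claim() above plus the F_BDEV() helper
Jan suggested (neither exists yet; error handling trimmed):

                struct block_device *bdev = F_BDEV(swap_file);

                /* claim the already-opened device; holder is our
                 * swap_info_struct, to be released at swapoff(2) time */
                error = bd_claim(bdev, p);
                if (error)
                        return error;
                p->bdev = bdev;
                p->old_block_size = block_size(p->bdev);
                error = set_blocksize(p->bdev, PAGE_SIZE);
                if (error < 0)
                        return error;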

This extra open is entirely due to "well, we need to claim it and the
primitive that does that happens to be tied to opening"; feels rather
counter-intuitive.

For that matter, we could add an explicit "unclaim" primitive - might
be easier to follow.  That would give another place where it could
be used - in blkdev_bszset() we have an opened block device (it's an
ioctl, after all), we want to change the block size and we *really*
don't want that to happen under a mounted filesystem.  So if it's not
opened exclusive, we do a temporary exclusive open of our own and act
on that instead.   Might as well go for a temporary claim...
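Roughly like this - a sketch of blkdev_bszset() with a temporary claim,
assuming the hypothetical bd_claim()/bd_unclaim() pair (the rest follows
the current code):

static int blkdev_bszset(struct block_device *bdev, fmode_t mode,
                         int __user *argp)
{
        int ret, n;

        if (!capable(CAP_SYS_ADMIN))
                return -EACCES;
        if (!argp)
                return -EINVAL;
        if (get_user(n, argp))
                return -EFAULT;

        /* opened O_EXCL - the opener already holds the claim */
        if (mode & FMODE_EXCL)
                return set_blocksize(bdev, n);

        /* temporary claim instead of a temporary exclusive open;
         * holder is an address on our stack, same as before */
        ret = bd_claim(bdev, &n);
        if (ret)
                return ret;
        ret = set_blocksize(bdev, n);
        bd_unclaim(bdev, &n);
        return ret;
}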

BTW, what happens if two threads call ioctl(fd, BLKBSZSET, &n)
for the same descriptor that happens to have been opened O_EXCL?
Without O_EXCL they would've been unable to claim the sucker at the same
time - the holder we are using is the address of a function argument,
i.e. something that points into the kernel stack of the caller.  Those
would conflict, and we'd either get the set_blocksize() calls fully
serialized, or one of the callers would eat -EBUSY.  Not so in the
"opened with O_EXCL" case - they can very well overlap, and IIRC
set_blocksize() does *not* expect that kind of crap...  It's all under
CAP_SYS_ADMIN, so it's not as if it were a meaningful security hole
anyway, but it does look fishy.
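For illustration, a minimal userland reproducer of that overlap (device
path is a placeholder; needs CAP_SYS_ADMIN, build with -pthread):

#include <fcntl.h>
#include <pthread.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

static void *set_bsz(void *arg)
{
        int fd = *(int *)arg, n = 4096;

        /* with an O_EXCL descriptor both threads take the FMODE_EXCL
         * fast path and may overlap inside set_blocksize() */
        ioctl(fd, BLKBSZSET, &n);
        return NULL;
}

int main(void)
{
        int fd = open("/dev/XXX", O_RDWR | O_EXCL);     /* placeholder */
        pthread_t a, b;

        if (fd < 0)
                return 1;
        pthread_create(&a, NULL, set_bsz, &fd);
        pthread_create(&b, NULL, set_bsz, &fd);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
}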



 

