
RE: [Xen-devel] [PATCH] Blktap: Userspace file-based image support.(RFC)

> AW> This should be fixable though.  I'm also not sure how carefully
> AW> dm-u watches block completion responses to ensure safety of
> AW> metadata updates relative to data writes.  This too should be
> AW> fixable -- I just don't know if the user-level tools can currently
> AW> request completion notifications on requests that they've
> AW> processed.
> So, right now, we're a little optimistic about metadata writing.  It
> will be relatively easy to hijack the callback routine for the disk
> request (a technique which is heavily used in the rest of the block
> layer) to get a completion trigger.  We can then notify userspace for
> the metadata write and then trigger the original callback routine for
> completion.

Yep, dm-userspace is certainly going to need to have a way of
intercepting IO completions and then choosing when it's actually going
to propagate the completion to the backend. That's quite a big change to
the current code (incidentally, the dm-snap code is pretty shocking in
this respect too).
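
To make the ordering concrete, here is a rough userspace model of the
idea, not dm-userspace code. The class and method names are made up for
illustration. The original completion callback is swapped for a wrapper,
and the real callback only fires once both the data write and the
metadata write have finished.

```python
from typing import Callable, Optional

Callback = Callable[[int], None]


class DeferredCompletion:
    """Hypothetical model: hold a data-write completion until the
    matching metadata write has also completed."""

    def __init__(self, original_cb: Callback) -> None:
        self.original_cb = original_cb        # the backend's own callback
        self.data_status: Optional[int] = None
        self.metadata_done = False
        self.propagated = False

    def on_data_complete(self, status: int) -> None:
        # Installed in place of the original callback (the "hijack").
        self.data_status = status
        self._maybe_propagate()

    def on_metadata_complete(self) -> None:
        # Called once the metadata update has reached the disk.
        self.metadata_done = True
        self._maybe_propagate()

    def _maybe_propagate(self) -> None:
        # Propagate exactly once, and only after both writes are done.
        if self.propagated or self.data_status is None:
            return
        if self.metadata_done:
            self.propagated = True
            self.original_cb(self.data_status)
```

The backend never sees the completion early, whichever of the two
writes happens to finish first.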

> AW> A benefit to the dm-user patch is that it is more of a linux
> AW> approach than a xen+linux approach.  Dm-user will be generally
> AW> useful in the linux tree
> Right, this is a huge advantage, I think.  Being able to mount images
> as if they were disks will be quite helpful.  Another benefit is the
> ability to easily convert between formats.  Converting a vmdk to a
> qcow is as easy as mounting both and doing a "cp -R" between them.

I think the blktap code should definitely export a kernel device at the
top so that the same property holds. Should be easy to add.

> AW> which has some bad failure characteristics which can result in
> AW> both data being acknowledged as written even though it hasn't
> AW> been, and the OOM killer going insane.  I think some fixes to loop
> AW> probably need to be applied in the near future given how much
> AW> people are generally depending on the code with VMs.
> Can you elaborate about what specifically is wrong with the loop
> driver?

It doesn't bypass the buffer cache (so all bets are off for data
integrity) and can end up consuming all of dom0 memory with dirty
buffers -- just create a few loop devices and do a few parallel dd's to
them and watch the OOM killer go on the rampage. It's even worse if the
filesystem the file lives on is slow, e.g. NFS.
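
For comparison, this is roughly what bypassing the cache looks like from
userspace. It is only a sketch, not what loop or blktap does: the
function names and the 4096-byte block size are assumptions, and it
falls back to O_SYNC where the filesystem rejects O_DIRECT (some do,
with EINVAL).

```python
import errno
import mmap
import os

BLOCK = 4096  # assumed alignment; a real device may require another size


def _write_with(path: str, extra_flags: int, buf) -> None:
    # Open with the requested flags, write one block, flush, close.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | extra_flags, 0o600)
    try:
        os.write(fd, buf)
        os.fsync(fd)
    finally:
        os.close(fd)


def write_block_direct(path: str, payload: bytes) -> bool:
    """Write one zero-padded block; return True if O_DIRECT was used."""
    buf = mmap.mmap(-1, BLOCK)  # anonymous mappings are page-aligned
    try:
        buf.write(payload.ljust(BLOCK, b"\0"))
        direct = getattr(os, "O_DIRECT", 0)  # not defined on every platform
        if direct:
            try:
                _write_with(path, direct, buf)
                return True
            except OSError as exc:
                if exc.errno != errno.EINVAL:
                    raise
        # Fallback: still goes through the cache, but the write does not
        # return until the data has been pushed to storage.
        _write_with(path, getattr(os, "O_SYNC", 0), buf)
        return False
    finally:
        buf.close()
```

With O_DIRECT the data never sits around as dirty buffers in dom0, which
is the property the loop driver is missing.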

> AW> Julian and I have talked about extending the tap driver to combine
> AW> it with blkback and allow block address translation without access
> AW> to request contents.
> Since the kernel already has a block address translation solution
> (i.e. device-mapper), is there a benefit to adding another
> xen-specific one?

I think blktap and dm-userspace are quite complementary, so I don't see
a problem with having them both in the tree. Right now, blktap looks to
be the more mature solution, but dm-userspace could catch up. Blktap
will obviously still be preferable when it's necessary to actually touch
the data.

