
Re: [Xen-devel] Re: [PATCH 0 of 6] dm-userspace xen integration patches


  • To: "Dan Smith" <danms@xxxxxxxxxx>
  • From: "Andrew Warfield" <andrew.warfield@xxxxxxxxxxxx>
  • Date: Wed, 30 Aug 2006 07:37:33 -0700
  • Cc: Ian Pratt <m+Ian.Pratt@xxxxxxxxxxxx>, Julian Chesterfield <jac90@xxxxxxxxx>, Xen Devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 30 Aug 2006 07:38:05 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Hi Dan,

Some do, but plenty do not.  However, being able to do snapshots, CoW,
rollback, etc., while easily synchronizing with other pieces of Xen is
something that will be simpler with dm-userspace than with a hardware
device.  It also allows us to provide the same advanced features no
matter what device we are working on top of, independently of whether
that device supports them.

Okay -- I think there may have been a disconnect on what the
assumptions driving dscow's design were.  Based on your clarifying
emails, these seem to be that an administrator has a block device that
they want to apply CoW to, and that they have oodles of space.
They'll just hard-allocate a second block device of the same size as
the original on a per-CoW basis, and use this (plus the bitmap header)
to write updates into.

All of the CoW formats that I've seen use some form of pagetable-style
lookup hierarchy to represent sparseness, frequently a combination of
a lookup tree and leaf bitmaps -- your scheme is just the extreme of
this... a zero-level tree.  That seems potentially useful in some
environments as a fast image-copy operation, although it would be cool
to have something that ran in the background, lazily copied all the
other blocks over, and eventually resulted in a fully linear disk
image.  Perhaps you'd consider adding that and porting the format as a
plugin to the blktap tools as well? ;)
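
To make that concrete, here's a rough sketch of what a zero-level
lookup amounts to: a flat bitmap with one bit per block, plus a 1:1
offset remap onto the fully pre-allocated CoW device.  This is purely
illustrative C -- the names and layout are mine, not the actual dscow
on-disk format.

/* Zero-level CoW lookup sketch: one bit per block records whether the
 * block has been copied to the CoW device.  Illustrative only. */
#include <stdint.h>

struct cow_map {
    uint64_t  nr_blocks;   /* blocks in the virtual disk            */
    uint32_t *bitmap;      /* 1 bit per block: 1 = lives in CoW dev */
};

static inline int block_is_copied(const struct cow_map *m, uint64_t blk)
{
    return (m->bitmap[blk / 32] >> (blk % 32)) & 1;
}

static inline void mark_block_copied(struct cow_map *m, uint64_t blk)
{
    m->bitmap[blk / 32] |= 1u << (blk % 32);
}

/* Remap a virtual block to (device, offset).  Because the CoW device
 * is hard-allocated at full size, the offset is identical on both
 * devices -- no sparse indirection at all. */
static inline void remap(const struct cow_map *m, uint64_t blk,
                         int *dev /* 0 = base, 1 = cow */,
                         uint64_t *out_blk)
{
    *dev = block_is_copied(m, blk) ? 1 : 0;
    *out_blk = blk;
}

The tree-based formats just replace that flat bitmap with a multi-level
index so the CoW device can stay sparse.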

JC> What's the policy on metadata writes - are metadata writes
JC> synchronised with the acknowledgement of block IO requests?

Yes.  Ian asked for this the last time we posted our code.  We worked
hard to add it to dm-userspace/cowd between our original posting and
the recent one.

While the normal persistent domain case definitely needs this to be
"correct", there are other usage models for virtual machines that do
not necessarily need to have a persistent disk store.  We are able to
disable the metadata syncing (and the metadata writing altogether if
desired) and regain a lot of speed.

Yes -- we've seen comments from users who are very pleased with the
better-than-disk write throughput they achieve with the loopback
driver ;) -- basically the same effect, letting the buffer cache step
in and play things a little more "fast and loose".
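
To spell out the ordering being described -- a sketch under my own
assumptions, since the function, file-descriptor layout and bitmap
offsets below are made up rather than taken from cowd -- the copied
data and the metadata recording the copy both have to be durable
before the request is completed back to the guest, and skipping the
sync is exactly what buys the speed back in the non-persistent case:

/* Sketch: a first write to a block is only acknowledged once both the
 * copied data and the metadata recording the copy are durable.
 * Illustrative only; not the real cowd implementation. */
#include <stdint.h>
#include <unistd.h>

#define BLOCK_SIZE 4096

static int handle_first_write(int cow_fd, int meta_fd, uint64_t blk,
                              const void *data, uint32_t *bitmap,
                              int sync_metadata)
{
    /* 1. Copy the new data into the pre-allocated CoW device. */
    if (pwrite(cow_fd, data, BLOCK_SIZE,
               (off_t)blk * BLOCK_SIZE) != BLOCK_SIZE)
        return -1;

    /* 2. Record the copy in the bitmap and write the dirty word back
     *    (header offset omitted for brevity). */
    bitmap[blk / 32] |= 1u << (blk % 32);
    if (pwrite(meta_fd, &bitmap[blk / 32], sizeof(uint32_t),
               (off_t)(blk / 32) * sizeof(uint32_t)) != sizeof(uint32_t))
        return -1;

    /* 3. For persistent stores, flush both before acknowledging. */
    if (sync_metadata)
        if (fdatasync(cow_fd) < 0 || fdatasync(meta_fd) < 0)
            return -1;

    /* 4. Only now complete the request back to the guest. */
    return 0;
}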

I've got a few questions about your driver code -- sorry for taking a
while to get back to you on this:

1. You are allocating a per-disk mapping cache in the driver.  Do you
have any sense of how big this needs to be to be useful for common
workloads?  Generally speaking, this seems like a strange thing to add
-- I understand the desire for an in-kernel cache to avoid context
switches, but why make it implicit and LRU?  Wouldn't it simplify the
code considerably to let the userspace side manage the size and
contents of the cache, so that it can do replacement based on its
knowledge of the block layout?  (I'll sketch what I mean below.)

2. There are a heck of a lot of spin locks in that driver.  Did you
run into a lot of stability problems that led to aggressively
conservative locking?  Would removing the cache and/or switching some
of the locks to refcounts simplify things at all?
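
On the first question, what I have in mind is roughly an interface
like the one below, where userspace pushes and drops kernel-side remap
entries explicitly.  All of the message and field names here are
hypothetical -- this is not the existing dm-userspace protocol -- it's
just meant to show where the replacement policy would live:

/* Hypothetical cache-control messages from the userspace daemon to the
 * driver.  Userspace decides what to cache and what to evict, using
 * its knowledge of the block layout (e.g. dropping every mapping that
 * belongs to a snapshot which is no longer active). */
#include <stdint.h>

enum dmu_cache_op {
    DMU_CACHE_INSERT,      /* add a remap entry to the kernel cache */
    DMU_CACHE_INVALIDATE   /* drop an entry chosen by userspace     */
};

struct dmu_cache_msg {
    uint32_t op;           /* enum dmu_cache_op                      */
    uint64_t org_block;    /* block in the guest's view of the disk  */
    uint64_t new_block;    /* block on the destination device        */
    uint32_t dest_dev;     /* which underlying device to redirect to */
};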
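
And on the second question, as a strawman for the refcount idea: a
mapping entry whose lifetime is handled by an atomic reference count
never needs a spinlock on the lookup/completion path.  Again, these
names are illustrative rather than taken from your driver:

/* Strawman: manage mapping lifetime with an atomic refcount so the
 * hot path never takes a lock; only creation/teardown might. */
#include <asm/atomic.h>
#include <linux/slab.h>

struct dmu_mapping {
    atomic_t refs;
    /* ... remap data: org_block, new_block, dest_dev, ... */
};

static inline void dmu_mapping_get(struct dmu_mapping *m)
{
    atomic_inc(&m->refs);
}

static inline void dmu_mapping_put(struct dmu_mapping *m)
{
    /* the last put frees the entry; get/put never take a lock */
    if (atomic_dec_and_test(&m->refs))
        kfree(m);
}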

I haven't looked at the userspace stuff all that closely.  Time
permitting, I'll peek at that and get back to you with some comments
in the next little while.

best,
a.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

