Xen project Mailing List

Re: [Xen-devel] Re: [PATCH 0 of 6] dm-userspace xen integration patches

From: "Andrew Warfield" <andrew.warfield@xxxxxxxxxxxx>

Date: Wed, 30 Aug 2006 07:37:33 -0700

Cc: Ian Pratt <m+Ian.Pratt@xxxxxxxxxxxx>, Julian Chesterfield <jac90@xxxxxxxxx>, Xen Devel <xen-devel@xxxxxxxxxxxxxxxxxxx>

Delivery-date: Wed, 30 Aug 2006 07:38:05 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Hi Dan,

Some do, but plenty do not.  However, being able to do snapshots, CoW,
rollback, etc while easily synchronizing with other pieces of Xen is
something that will be simpler with dm-userspace than with a hardware
device.  It also allows us to provide the same advanced features, no
matter what device we are working on top of, and independent of
whether or not it supports them.

Okay -- I think there may have been a disconnect on what the
assumptions driving dscow's design were.  Based on your clarifying
emails these seem to be that an administrator has a block device that
they want to apply cow to, and that they have oodles of space.
They'll just hard-allocate a second block device of the same size as
the original on a per-cow basis, and use this (plus the bitmap header)
to write updates into.

All of the CoW formats that I've seen use some form of
pagetable-style lookup hierarchy to represent sparseness, frequently a
combination of a lookup tree and leaf bitmaps -- your scheme is just
the extreme of this... a zero-level tree.  It seems like a possibly
useful thing to have in some environments to use as a fast-image-copy
operation, although it would be cool to have something that ran in the
background and lazily copied all the other blocks over and eventually
resulted in a fully linear disk image.  Perhaps you'd consider adding
that and porting the format as a plugin to the blktap tools as well? ;)
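For readers following the thread, here is a minimal Python sketch of
the "zero-level tree" idea described above: one bit per block, a cow
area the same size as the original, and a background step that lazily
copies the remaining blocks.  It is not dscow's actual on-disk format
or code; the class and method names (FlatBitmapCow, lazy_copy_step,
and so on) are hypothetical, the devices are modelled as in-memory
byte buffers, and only whole-block writes are handled.

```python
class FlatBitmapCow:
    """Illustrative zero-level CoW: one bitmap bit per block.

    Assumption: the cow area is hard-allocated at the same size as the
    base image, so a block keeps the same offset in both and no lookup
    tree is needed.
    """

    def __init__(self, base, block_size):
        assert len(base) % block_size == 0
        self.base = base                      # original image, never modified
        self.block_size = block_size
        self.nblocks = len(base) // block_size
        self.cow = bytearray(len(base))       # same size as the original
        self.bitmap = bytearray((self.nblocks + 7) // 8)

    def _is_copied(self, blk):
        return bool(self.bitmap[blk // 8] & (1 << (blk % 8)))

    def _mark_copied(self, blk):
        self.bitmap[blk // 8] |= 1 << (blk % 8)

    def read_block(self, blk):
        # The bitmap alone decides which device holds the current data.
        src = self.cow if self._is_copied(blk) else self.base
        off = blk * self.block_size
        return bytes(src[off:off + self.block_size])

    def write_block(self, blk, data):
        # Whole-block write: store in the cow area, then flip the bit.
        assert len(data) == self.block_size
        off = blk * self.block_size
        self.cow[off:off + self.block_size] = data
        self._mark_copied(blk)

    def lazy_copy_step(self):
        """Copy one untouched block from base; return False when none remain."""
        for blk in range(self.nblocks):
            if not self._is_copied(blk):
                off = blk * self.block_size
                self.cow[off:off + self.block_size] = \
                    self.base[off:off + self.block_size]
                self._mark_copied(blk)
                return True
        return False
```

Calling lazy_copy_step() repeatedly until it returns False leaves the
cow area as a complete linear image, which is the background-copy
behaviour suggested above.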

JC> What's the policy on metadata writes - are metadata writes
JC> synchronised with the acknowledgement of block IO requests?

Yes.  Ian asked for this the last time we posted our code.  We have
worked hard to implement this ability in dm-userspace/cowd between the
time we posted our original version and our recent post.

While the normal persistent domain case definitely needs this to be
"correct", there are other usage models for virtual machines that do
not necessarily need to have a persistent disk store.  We are able to
disable the metadata syncing (and the metadata writing altogether if
desired) and regain a lot of speed.
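
As an illustration of the trade-off described above, here is a rough
Python sketch of a write path that makes data and metadata durable
before acknowledging the request, with switches to skip the syncing
or the metadata write entirely.  This is not the dm-userspace/cowd
implementation; cow_write, its parameters, and the file names in
demo() are all hypothetical, and the ordering shown (data first, then
metadata, then acknowledge) is one common convention rather than
something stated in this thread.

```python
import os
import tempfile


def cow_write(data_fd, meta_fd, offset, data, meta_offset, meta_bytes,
              write_metadata=True, sync_metadata=True):
    """Write a block and its metadata; return True to 'acknowledge'."""
    os.pwrite(data_fd, data, offset)
    if write_metadata:
        if sync_metadata:
            # Make the data durable before metadata can refer to it.
            os.fsync(data_fd)
        os.pwrite(meta_fd, meta_bytes, meta_offset)
        if sync_metadata:
            # Only acknowledge once the metadata itself is on disk.
            os.fsync(meta_fd)
    return True


def demo(write_metadata=True, sync_metadata=True):
    """Run one write against temporary files and read the results back."""
    with tempfile.TemporaryDirectory() as tmp:
        flags = os.O_RDWR | os.O_CREAT
        data_fd = os.open(os.path.join(tmp, "cow.img"), flags, 0o600)
        meta_fd = os.open(os.path.join(tmp, "cow.meta"), flags, 0o600)
        try:
            acked = cow_write(data_fd, meta_fd, 4096, b"new!", 0, b"\x02",
                              write_metadata, sync_metadata)
            return acked, os.pread(data_fd, 4, 4096), os.pread(meta_fd, 1, 0)
        finally:
            os.close(data_fd)
            os.close(meta_fd)
```

With sync_metadata=False the writes are left to the page cache, and
with write_metadata=False the mapping is never recorded at all, which
only makes sense for disks that do not need to survive a restart.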

Yes -- we've seen comments from users who are very pleased with the
better-than-disk write throughput that they achieve with the loopback
driver ;) -- basically the same effect of letting the buffer cache
step in and play things a little more "fast and loose".

I've got a few questions about your driver code -- sorry for taking a
while to get back to you on this:

1. You are allocating a per-disk mapping cache in the driver.  Do you
   have any sense of how big this needs to be to be useful for common
   workloads?  Generally speaking, this seems like a strange thing to
   add -- I understand the desire for an in-kernel cache to avoid
   context switches, but why make it implicit and LRU.  Wouldn't it
   simplify the code considerably to allow the userspace stuff to
   manage the size and contents of the cache so that they can do
   replacement based on their knowledge of the block layout?

2. There are a heck of a lot of spin locks in that driver.  Did you
   run into a lot of stability problems that led to aggressively
   conservative locking?  Would removing the cache and/or switching
   some of the locks to refcounts simplify things at all?

I haven't looked at the userspace stuff all that closely.  Time
permitting I'll peek at that and get back to you with some comments in
the next little while.

best,
a.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
