Re: [Xen-devel] Re: [PATCH 0 of 6] dm-userspace xen integration patches

AW> Okay -- I think there may have been a disconnect on what the
AW> assumptions driving dscow's design were.  Based on your clarifying
AW> emails these seem to be that an administrator has a block device
AW> that they want to apply cow to, and that they have oodles of
AW> space.  They'll just hard-allocate a second block device of the
AW> same size as the original on a per-cow basis, and use this (plus
AW> the bitmap header) to write updates into.

Correct.  Again, let me reiterate that I am not claiming that dscow is
the best format for anything other than a few small situations that we
are currently targeting :)  We definitely want to eventually develop a
sparse allocation method that will allow us to take a big block store
and carve it up (like LVM does) for on-demand cow volumes.

AW> All of the CoW formats that I've seen use some form of
AW> pagetable-style lookup hierarchy to represent sparseness,
AW> frequently a combination of a lookup tree and leaf bitmaps -- your
AW> scheme is just the extreme of this... a zero-level tree.

Indeed, many use a pagetable approach.  Development of the above idea
would definitely require it.  FWIW, I believe cowloop uses a format
similar to dscow.  (I have put a rough sketch of this "zero-level
tree" layout at the end of this message.)

AW> It seems like a possibly useful thing to have in some environments
AW> to use as a fast-image-copy operation, although it would be cool
AW> to have something that ran in the background and lazily copied all
AW> the other blocks over and eventually resulted in a fully linear
AW> disk image.

Yes, I have discussed this on-list a few times, in reference to
live-copy of LVMs and building a local version of a network-accessible
image, such as an nbd device.

AW> Perhaps you'd consider adding that and porting the format as a
AW> plugin to the blktap tools as well? ;)

I do not really see the direct value of that.  If the functionality
exists with dm-userspace and cowd, then dm-userspace could be used to
slowly build the image, while blktap could provide access to that
image for a domain (in direct mode, as Julian pointed out).  Building
the functionality into dm-userspace would allow it to be generally
applicable to vanilla linux systems.  Why build it into a xen-specific
component?

AW> Yes -- we've seen comments from users who are very pleased with
AW> the better-than-disk write throughput that they achieve with the
AW> loopback driver ;) -- basically the same effect of letting the
AW> buffer cache step in and play things a little more "fast and
AW> loose".

Heh, right.  I was actually talking about increased performance
against a block device.  However, for this kind of transient domain
model, a file will work as well.

AW> 1. You are allocating a per-disk mapping cache in the driver.  Do
AW> you have any sense of how big this needs to be to be useful for
AW> common workloads?

My latest version (which we will post soon) puts a cap on the number
of remaps each device can maintain.  Changing from a 4096-map limit to
a 16384-map limit makes some difference, but it does not appear to be
significant.  We will post concrete numbers when we send the latest
version.

AW> Generally speaking, this seems like a strange thing to add -- I
AW> understand the desire for an in-kernel cache to avoid context
AW> switches, but why make it implicit and LRU.

Well, if you do not keep that kind of data in the kernel, I think
performance would suffer significantly.  The idea here is to have, at
steady-state, a block device that behaves almost exactly like a
device-mapper device (read: LVM) does right now.  All block
redirections happen in-kernel.

Remember that the userspace side can invalidate any mapping cached in
the kernel at any time.  If userspace wanted to do cache management,
it could do so.  I have also discussed the possibility of feeding some
coarse statistics back to userspace so it can make more informed
decisions.  So I would not say that the caching is implicit.  If you
set the DMU_FLAG_TEMPORARY bit on a response, the kernel will not
remember the mapping and thus will fault the next access back to
userspace again.
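
To make that concrete, here is a minimal sketch of the pattern.  The
names are made up for illustration (this is not the actual
dm-userspace code), but the behavior is what I described: a capped
per-device remap table with LRU eviction, where a response carrying
the temporary flag is simply never inserted.

#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct remap_entry {
        struct list_head lru;      /* position in the LRU list      */
        sector_t org_block;        /* block the domain asked for    */
        sector_t new_block;        /* block it is redirected to     */
};

struct remap_cache {
        spinlock_t lock;
        struct list_head lru;      /* head == most recently used    */
        unsigned int count;
        unsigned int cap;          /* e.g. 4096 or 16384            */
};

/* Map path: on a hit the redirection is resolved entirely
 * in-kernel, just like a normal device-mapper table lookup.       */
static int remap_lookup(struct remap_cache *c, sector_t org,
                        sector_t *dest)
{
        struct remap_entry *e;
        unsigned long flags;
        int hit = 0;

        spin_lock_irqsave(&c->lock, flags);
        list_for_each_entry(e, &c->lru, lru) {
                if (e->org_block == org) {
                        *dest = e->new_block;
                        list_move(&e->lru, &c->lru);   /* touch   */
                        hit = 1;
                        break;
                }
        }
        spin_unlock_irqrestore(&c->lock, flags);
        return hit;                /* miss => fault to userspace    */
}

/* Response path: a mapping flagged temporary by userspace is used
 * once and never inserted, so the next access faults again.  (A
 * real version would also check for duplicates.)                  */
static void remap_insert(struct remap_cache *c, sector_t org,
                         sector_t dest, int temporary)
{
        struct remap_entry *e;
        unsigned long flags;

        if (temporary)
                return;

        e = kmalloc(sizeof(*e), GFP_KERNEL);
        if (!e)
                return;            /* harmless: we just fault again */
        e->org_block = org;
        e->new_block = dest;

        spin_lock_irqsave(&c->lock, flags);
        if (c->count >= c->cap) {
                /* over the cap: evict the least recently used     */
                struct remap_entry *old = list_entry(c->lru.prev,
                                struct remap_entry, lru);
                list_del(&old->lru);
                kfree(old);
                c->count--;
        }
        list_add(&e->lru, &c->lru);
        c->count++;
        spin_unlock_irqrestore(&c->lock, flags);
}

The linear list scan is just a stand-in for whatever lookup structure
the driver actually uses; the points of interest are the cap, the
eviction, and the flag.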
AW> Wouldn't it simplify the code considerably to allow the userspace
AW> stuff to manage the size and contents of the cache so that they
AW> can do replacement based on their knowledge of the block layout?

I am not sure why this would be much better than letting the kernel
manage it.  The kernel knows two things that userspace does not:
low-memory pressure and access statistics.  I do not see why it would
make sense to have the kernel collect and communicate access
statistics for each block to userspace and then rely on it to evict
unused mappings.  Further, the kernel can run without the userspace
component if no unmapped blocks are accessed.  This allows a restart
or upgrade of the userspace component without disturbing the device.
It is entirely possible that I do not understand your point, so feel
free to correct me :)

AW> 2. There are a heck of a lot of spin locks in that driver.  Did
AW> you run into a lot of stability problems that led to aggressively
AW> conservative locking?

I think I have said this before, but: no performance analysis has been
done on dm-userspace to identify areas of contention.  The use of
spinlocks was the best way (for me) to get things working and stable.
Most of the spinlocks are used to protect linked lists, which I think
is relatively valid.  I should also point out that the structure of
the entire thing has been a moving target up until recently.  It is
definitely possible that some of the places where a spinlock was used
could be refactored under the current model.

AW> Would removing the cache and/or switching some of the locks to
AW> refcounts simplify things at all?

Moving to something other than spinlocks for a few of the data
structures may be possible; we can investigate and post some numbers
on the next go-round.
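
For reference, the pattern in question is roughly this (again,
illustrative names, not the actual driver code): a spinlock guarding
every traversal and update of a request list.

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct pending_req {
        struct list_head list;
        u64 id;
};

static LIST_HEAD(pending);
static DEFINE_SPINLOCK(pending_lock);

/* Every update of the list takes the lock.                        */
static void queue_pending(struct pending_req *req)
{
        unsigned long flags;

        spin_lock_irqsave(&pending_lock, flags);
        list_add_tail(&req->list, &pending);
        spin_unlock_irqrestore(&pending_lock, flags);
}

/* Every traversal takes it too.  Note that the entry is returned
 * after the lock is dropped: keeping that entry alive is exactly
 * the sort of thing a refcount, rather than a lock, is for.       */
static struct pending_req *find_pending(u64 id)
{
        struct pending_req *req, *found = NULL;
        unsigned long flags;

        spin_lock_irqsave(&pending_lock, flags);
        list_for_each_entry(req, &pending, list) {
                if (req->id == id) {
                        found = req;
                        break;
                }
        }
        spin_unlock_irqrestore(&pending_lock, flags);
        return found;
}

My understanding is that refcounts would guard the lifetime of
individual entries after lookup rather than the consistency of the
list itself, so I suspect they would complement, not replace, most of
these locks; traversals would still need the lock (or RCU).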
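
Finally, the sketch I promised above of the "zero-level tree" idea.
This is just the shape of the layout, not the actual dscow on-disk
format: because the cow device is hard-allocated at the same size as
the base, block N of the base always lives at block N of the cow
device, so no block-address lookup is needed at all, only a bitmap
saying which device holds the current data.  A sparse format would
replace the 1:1 data area with exactly the pagetable-style lookup you
describe.

#include <stdint.h>

/* Made-up magic for this sketch; not the real dscow header.       */
#define COW_SKETCH_MAGIC 0x77637364u

struct cow_header {
        uint32_t magic;
        uint32_t blocksize;        /* bytes per block               */
        uint64_t nr_blocks;        /* size of the base device       */
        /* followed by nr_blocks bits of bitmap, then a data area
         * hard-allocated at the same size as the base device      */
};

/* The bitmap is the entire lookup structure: one bit per block
 * says which device currently holds the data.                     */
static inline int block_is_cowed(const uint8_t *bitmap, uint64_t blk)
{
        return (bitmap[blk / 8] >> (blk % 8)) & 1;
}

/* First write to a block: copy the block up, then flip its bit.   */
static inline void mark_cowed(uint8_t *bitmap, uint64_t blk)
{
        bitmap[blk / 8] |= (uint8_t)(1u << (blk % 8));
}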
--
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@xxxxxxxxxx