[Xen-devel] Re: [PATCH] Blktap: Userspace file-based image support. (RFC)
On Tue, 20 Jun 2006 14:10:30 -0700, Dan Smith wrote:
> IP> It doesn't bypass the buffer cache (so all bets are off for data
> IP> integrity) and can end up consuming all of dom0 memory with dirty
> IP> buffers -- just create a few loop devices and do a few parallel
> IP> dd's to them and watch the oom-killer go on the rampage. It's even
> IP> worse if the filesystem the file lives on is slow, e.g. NFS.
>
> Ok, it seems like this should be addressed in the upstream loop
> driver.  I imagine quite a few people are depending on the loop driver
> right now, expecting it to maintain data integrity.

It's probably worth spending some cycles trying to improve the loop
driver itself.

> Could the loop driver make use of the routines that do direct IO
> instead of the normal routines to solve this when it's an issue?

The loop driver is split between two threads using a producer/consumer
queue.  The main thread receives the bio requests and queues them for
the consumer thread.  The consumer thread can do a number of things
depending on the properties of the fd: it may use address-space ops,
use fops->write, or apply a transform to the data.

It should be possible, if the fd is opened with O_DIRECT and fops has
valid aio_{read,write} methods, to use proper aio calls to queue the
requests.  You'll probably have to get clever about how the thread
blocks (it has to wake up either on the queue mutex or when an aio
request completes).  I suspect this would give a pretty noticeable
performance improvement in the loop driver (especially on SCSI/SATA
storage).

The loop driver still has issues, though.  It cannot grow, and it has
a pretty odd hardcoded limit (256 devices) which quickly becomes a
scalability problem.  The former could possibly be addressed by adding
a parameter to SET_STATUS that lets you set the size of the device to
be greater than the size of the underlying file.  If a bio arrives for
an offset beyond the end of the underlying file, the driver would have
to be smart enough to ftruncate the file.
The error handling is a bit tough: you'll have to make sure that if
ftruncate fails, you fail the read/write (extra points if the failure
is treated as temporary, so that if space frees up later the request
can succeed).

The hardcoded limit is a larger problem; the driver would likely need
some reworking.  Since the 256-device limit comes from minor number
allocation, you would have to either get some more device number space
for it, or allocate dynamic numbers and rely on udev/hotplug for folks
who want more than 256 devices.

> This brings me to another question: Will people really be using
> file-based images for their VMs?  It seems to me that the performance
> of using a block device overshadows the convenience of a file image.

If the performance of the loop driver could be better (and
fundamentally, there's no reason it can't be pretty good), then I see
no reason why file images wouldn't be the most common approach.  Files
are quite a lot easier to manage than partitions.

Of course, I see no reason why someone couldn't write a FUSE front-end
to LVM :-)

Regards,

Anthony Liguori

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel