
[Xen-users] Re: XEN - networking and performance

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

fpt stl wrote:

What is most likely happening here is that while your OS sees the storage as two devices, in fact they are on the same disk (or set of disks). So the copy becomes :

read a bit - seek - write a bit - seek - write some metadata - seek - read a bit - seek - write a bit ...

That's a lot of seeking and seeks kill performance really badly.

It also depends on what that 8G is. A small number of big files stands a half decent chance of using some write cache to buffer some of the seeks, but if it's lots of small files then there'll be a huge amount of filesystem metadata to be updated as well.

And it also depends on what you are using for the copy. Some programs (such as dd and cpio) allow you to set a blocksize. Increasing this as far as your memory allows will help, as that means reading a big chunk of data before seeking elsewhere to write it. Fewer seeks = better performance.
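As a rough illustration of the blocksize point, here is a minimal Python sketch of a copy loop with a configurable block size. The names `copy_in_blocks` and `expected_alternations` and the 64 MiB default are made up for this example and are not taken from cp, dd or cpio. The count it returns is only a crude proxy for seeks when source and destination share a disk; kernel readahead and write caching will blur the real figure.

```python
import math


def copy_in_blocks(src_path, dst_path, block_size=64 * 1024 * 1024):
    """Copy src to dst, reading up to block_size bytes before each write.

    Returns the number of read -> write alternations performed.
    (Hypothetical helper, for illustration only.)
    """
    alternations = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(block_size)  # one big read from the source
            if not chunk:
                break
            dst.write(chunk)              # then one big write to the destination
            alternations += 1
    return alternations


def expected_alternations(file_size, block_size):
    """How many read/write switches a file of file_size bytes needs."""
    return math.ceil(file_size / block_size)
```

By this simple count, an 8 GiB file copied in 4 KiB blocks switches between reading and writing about two million times, while 64 MiB blocks bring that down to 128.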

The copy is a plain cp and the files are around 100k - 200k each, with some gzip archives as well.

So, if I understand correctly, if a DomU (VM) has attached several disks/xvd's (LVs in Dom0 storage space) then the Dom0 or Xen hypervisor is responsible for moving the data between the DomU disks.

Alternatively, if the DomU has just one xvd (a larger LV in Dom0 storage space) then only DomU's allocated resources will be used.

Dom0 does NOT handle transfers for DomU - at least not in the way that you mean.

It does not matter whether you export a whole disk and let DomU partition it, or partition it in Dom0 and export the partitions, or even use file based volumes in Dom0. In all cases, DomU does the data move/copy. I.e. whatever tool you use in DomU will read the source data into memory and write it out to the destination. Even if it does a device-device transfer*, the data will still be read into buffers in DomU memory and written out again.

The part Dom0 plays is to "pretend" to be a disk. So when a DomU reads a block of data, a thread on Dom0 translates that request into a read request for the appropriate location on the appropriate device, reads it, and passes it to DomU. Similarly, when DomU writes data, Dom0 simply translates the location and writes the data out. For raw disk devices (eg whole disks, whole partitions, or whole LVM volumes) the mapping is a simple 1-1 map. With sparse file storage there's a bit more to it, as the Dom0 filesystem needs to keep track of which parts of the virtual disk file exist and add extents as needed.
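To make the difference between the two kinds of mapping concrete, here is a toy Python model. It is a conceptual sketch only, not Xen or filesystem code: the class names, the 4096-byte block size, and the "append the next free block" allocation policy are all assumptions made for the example.

```python
BLOCK_SIZE = 4096  # assumed block size for this toy model


class RawBackend:
    """1-1 map: virtual block N sits at a fixed offset on the backing device."""

    def __init__(self, start_offset=0):
        # Byte offset where the exported partition or LV begins (hypothetical).
        self.start_offset = start_offset

    def translate(self, virtual_block):
        return self.start_offset + virtual_block * BLOCK_SIZE


class SparseFileBackend:
    """Space in the backing file is allocated only when a block is first written."""

    def __init__(self):
        self.block_map = {}  # virtual block number -> byte offset in backing file
        self.next_free = 0   # next unallocated offset in the backing file

    def translate_read(self, virtual_block):
        # None means the block was never written (a hole, which reads as zeros).
        return self.block_map.get(virtual_block)

    def translate_write(self, virtual_block):
        if virtual_block not in self.block_map:
            # First write to this block: extend the file and remember where it went.
            self.block_map[virtual_block] = self.next_free
            self.next_free += BLOCK_SIZE
        return self.block_map[virtual_block]
```

In the raw case the translation is pure arithmetic. In the sparse case every first write to a block also updates the bookkeeping, which is the extra work described above.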

So the process to copy a block of data from one volume to another in a DomU is:
1. The application does a read.
2. Filesystem/kernel/virtual device drivers etc translate this into a read request for a virtual block device (VBD).
3. The VBD driver interfaces with its counterpart in Dom0.
4. The virtual block storage code in Dom0 translates the request into a request for the appropriate device/partition/volume/file and reads the block.
5. The block is passed up to the VBD in DomU.
6. The data is then presented up through the kernel/filesystem/etc code to the application.

Writing a block is pretty much the reverse of the above.

Note that except in special cases, a hypervisor doesn't (and can't) have intimate knowledge of the guest filesystems, and especially the filesystem state. That's not to say it can't be done (Microsoft do things like that in their latest systems) - but it cannot be done in the general case as it needs very intimate knowledge and cooperation between hypervisor and guest.

Thus a copy operation always involves the data being read into memory in the guest and then written out again.

* I believe there is a function where a program can ask the kernel etc to copy data directly from source to destination without using buffers in the program. This is still a read-into-memory then write-to-device operation.
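The footnote does not name the function. On Linux, sendfile(2) is one call of this kind, and Python exposes it as `os.sendfile`; treating that as the function meant here is an assumption. The sketch below (the name `kernel_copy` and the 1 MiB chunk size are made up for the example) tries that route and falls back to an ordinary read/write loop where it is unavailable or refused. Either way the data still goes through memory in the machine doing the copy; it just skips the program's own buffer.

```python
import os


def kernel_copy(src_path, dst_path, chunk=1024 * 1024):
    """Copy a file, letting the kernel move the data where it can.

    Returns "sendfile" or "read/write" to show which path was used.
    (Hypothetical helper, for illustration only.)
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        if hasattr(os, "sendfile"):
            try:
                offset = 0
                while offset < size:
                    # The kernel copies between the two descriptors; no user buffer.
                    sent = os.sendfile(dst.fileno(), src.fileno(),
                                       offset, min(chunk, size - offset))
                    if sent == 0:
                        break
                    offset += sent
                return "sendfile"
            except OSError:
                # Not supported for this pair of files: start over the plain way.
                src.seek(0)
                dst.seek(0)
                dst.truncate()
        while True:
            data = src.read(chunk)  # data passes through a buffer in this program
            if not data:
                break
            dst.write(data)
        return "read/write"
```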

James Harper wrote:

> Interesting that a product I recall from 'some years ago' doesn't seem to
> have popped up again - or perhaps it has and I never noticed since I'm not
> into high end storage. This device looked to the host like a standard SCSI
> disk, but internally it had a load of DRAM, a small (2 1/2" ?) disk, a
> controller, and a small battery.
> Basically it was a big RAM disk with a SCSI interface, but when the power
> went off it would write everything to disk. I suspect it probably had a
> continuous process of writing dirty blocks to disk.
> Mind you, I suppose RAM does still cost somewhat more than disk.


It's a 500GB 2.5" disk with 4GB of SSD used as a cache. The drive
handles the caching internally so the OS just sees a disk. I have one in
my laptop (running Windows) and it seems to speed things up a great
deal, although I don't know how much of that is just that it's a 7200
rather than a 5400 RPM disk.

It's still a disk with some cache in front, and so still subject to seek delays if your working set is larger than the cache. The cache is also SSD, which is slower (especially on writes) than dynamic RAM. The product I recall was a true RAM disk, with effectively zero seek times regardless of working set size.

The one I recall was also a) not that large, and b) eye-wateringly expensive though. I suspect such things still exist for those who need the performance and will pay for it.

Simon Hobson

