
Re: [Xen-users] Figuring out a Storage system tailored for Xen

  • To: Zir Blazer <zir_blazer@xxxxxxxxxxx>, "xen-users@xxxxxxxxxxxxx" <xen-users@xxxxxxxxxxxxx>
  • From: "H. Sieger" <powerhouse.linux@xxxxxxxxx>
  • Date: Wed, 23 Apr 2014 03:19:18 -0700 (PDT)
  • Delivery-date: Wed, 23 Apr 2014 10:20:44 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>

I would start with LVM, as is suggested in the Xen wiki. I'm running Xen 4.3 on a Linux Mint 16 (= Ubuntu 13.10) machine with 32GB RAM, with one Windows 7 64-bit Pro HVM guest. I have moved all my PCs to LVM, using the following partitioning scheme:

sda1 - /boot - 1GB as ext2
sda2 - LVM volume assigned to the "main" VG - rest of the disk -> if you use EFI, you need another EFI partition
LVM "main" VG contains the following LVs:
root - / - 16GB as ext4
home - /home - 20GB (or whatever you need) as ext4
swap - swap - 34GB if you plan to hibernate the PC, much much less (~4-12GB) if you don't
vm1 - unassigned - unformatted for VM1
vm2 ...
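Assuming the names above, such a layout could be created roughly like this (a sketch: the device name sda2, the VG name "main", and the sizes are from my setup; adjust to yours):

```shell
# Turn sda2 into an LVM physical volume and build the "main" VG on it
pvcreate /dev/sda2
vgcreate main /dev/sda2

# Carve out the logical volumes (sizes as in the scheme above)
lvcreate -L 16G  -n root main
lvcreate -L 20G  -n home main
lvcreate -L 34G  -n swap main
lvcreate -L 120G -n vm1  main   # left unformatted; the guest OS formats it
```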

With LVM you don't format your guest LVs, but let the guest OS do the formatting. In my case my Windows 7 VM would format the volume as NTFS, which is what you need. To access the guest LV from dom0, you need to use kpartx (you can't just mount the LV).
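For example (a sketch; the VG/LV names are assumed, and the partition numbering depends on how the guest partitioned its volume):

```shell
# Map the partitions inside the guest LV to /dev/mapper entries
kpartx -av /dev/main/vm1
# creates entries such as /dev/mapper/main-vm1p1, main-vm1p2, ...

# Mount the guest's NTFS partition read-only to inspect it from dom0
mount -o ro /dev/mapper/main-vm1p1 /mnt

# When done, unmount and remove the mappings again
umount /mnt
kpartx -dv /dev/main/vm1
```

Only do this while the guest is shut down, or mount read-only as above.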

Performance-wise I can't complain. The LVM-based guest volumes perform the same as, or very close to, bare metal. Some others and I have done Passmark benchmarks, which can be found at http://forums.linuxmint.com/viewtopic.php?f=225&t=153482.
I've found that installing the GPLPV drivers in the Windows domU helps disk performance significantly (make a backup of your Windows domU before installing, just in case).

Under the Linux dom0 you can move some /tmp folders to RAM. Edit your /etc/fstab file and add:
#force temporary files to be stored in memory instead of on disk
none /tmp     tmpfs nodev,nosuid,mode=1777 0 0
none /var/tmp tmpfs nodev,nosuid,mode=1777 0 0

Choosing a single 4TB drive is not the best choice performance-wise. Multiple disks perform better, as you can, for example, use one drive for programs and another for data. But nothing beats the speed of an SSD.

I'm using a "small" 120GB SSD for both dom0 and my Windows VM (OS and programs, no data except the Lightroom catalog). My data resides on regular HDDs (6 of them now, probably more soon). Two of my data drives are striped LVM volumes to give RAID0 performance. All my data drives are backed up. I'm using LVM for everything except the /boot partition. This way I can add drives and resize volumes easily.
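Striping an LV across two drives for that RAID0-like throughput is just two lvcreate flags (a sketch; the LV name and the two PV device names are hypothetical, and both PVs must already be in the VG):

```shell
# 500GB volume striped across 2 physical volumes, 64KB stripe size
lvcreate -L 500G -i 2 -I 64 -n media main /dev/sdb1 /dev/sdc1
```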

One thing to bear in mind when using LVM is to configure as few VGs (volume groups) as possible. When I started using LVM I created a VG for every different type of data/storage that I used: dom0, guests, data, media, you name it. That turned out to be a big mistake! To really benefit from the flexibility LVM offers, you need to have as few VGs as possible.

Today I use a "main" VG and a "backup" VG. Volumes (LVs) in these groups must never be on the same physical drive, because if a drive in the "main" VG goes bad I must be able to restore it from a different physical drive (one in the "backup" VG).
Looking into other file systems like ZFS or BTRFS may be worthwhile when running large servers or performance tuning in data centers, but I wouldn't bother with them on a desktop PC. You gave enough reasons to avoid them.

And don't even consider "hardware" RAID, or what some motherboard manufacturers call their BIOS-based RAID feature. Hardware RAID is only worth considering with a professional PCI RAID controller card, which will also cost serious money. Linux software RAID is fine, though I personally am content with LVM. By the way, you can also combine LVM with RAID. Of course, RAID is only a consideration if you run more than one drive, preferably two or more drives of the same make, model, capacity, etc.

Under Xen, the toolstack may also influence domU performance. I'm currently using the xl toolstack with qemu-xen-traditional (qemu-xen won't work with VGA passthrough) and find it performs very well under Xen 4.3, using an Nvidia Quadro card for the domU. Xen 4.4 with xl also works very well with AMD card passthrough, without the issues encountered in previous releases (I tested Ubuntu 14.04 with Xen 4.4 and an AMD 6450 card).
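The relevant lines of such a domU config look roughly like this (an illustrative fragment, not my exact file; the LV path and PCI addresses are examples):

```
builder = "hvm"
device_model_version = "qemu-xen-traditional"   # qemu-xen breaks VGA passthrough
disk = [ 'phy:/dev/main/vm1,hda,w' ]
pci  = [ '01:00.0', '01:00.1' ]                 # GPU and its audio function
```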
On Tuesday, April 22, 2014 7:44 PM, Zir Blazer <zir_blazer@xxxxxxxxxxx> wrote:
After around 4 months using Xen, I must say the experience has been quite satisfactory, and it has pretty much met my expectations of migrating everything to a fully virtualized environment, where I'm not bound to the limitations of a single OS but can choose the best OS for specific use scenarios and switch quickly between them. However, after I got most of the features I needed working to a production-ready status (mainly VGA passthrough for gaming, etc.), I simply stopped tinkering with Xen and Dom0 configuration, which in my current setup is very far from polished. I have been trying to resume every now and then, but I didn't receive enough feedback when I posted this on some other forums, and for some reason I forgot to ask here on xen-users. I hope I may get help or ideas about what to do. But I'll warn you: if anyone has read a post of mine before, you should expect a long wall of text that isn't concise. I'm not going to disappoint today, either.

Possibly one of the things I don't feel comfortable with in the Linux world is that you have nearly infinite choices for how to do something. This is a pain in the butt when you need to figure out the pros and cons of a long list of choices, then pick the solution that should be the best among them. My problem is that I'm still unsure about what to pick, whether it is well planned, or whether it is going to work. Even after tons of googling, I barely found people who attempted to achieve the same thing I want, to learn from their experiences, and their setups or use cases were quite different from mine, so I can't directly apply or rely on them. And for some of the more complicated things, I don't have the knowledge to even understand how to make them work.
This is compounded by the fact that storage itself is quite a deep and messy topic. Everyone, and every guide you read, usually has different suggestions and styles for things like how many partitions to create, which file systems to use, etc. Add in the fact that you can't really play a lot with storage: to re-partition and re-format from scratch, you have to move a ton of data from one computer to another and back. I want to set in stone what I have to do, so I can get it right the next time I deal with this.

Basically, what I want is a definitive word on how to get the best I/O performance and reliability for my current setup and usage pattern. Currently I have a 4 TB HD with a 10 GB EXT4 partition, where I have installed Arch Linux with Xen 4.3.1 on top (I will upgrade to 4.4 next time I tinker), and a very big 1 TB EXT4 partition that I use both for storing the DomUs as IMG files (with tap:tapdisk:aio in the Xen CFG file) and for general storage. The most important DomU is a Windows XP SP3 installation that I use for gaming, but I notice that performance is quite low on anything involving lots of small files. An example is League of Legends, whose folder seems composed of a bucketload of small files. The splash screen (before the actual loading screen) takes around 45 seconds or so, when for most other people it is less than 10 seconds. Loading times themselves and in-game performance are good, so I suppose that long splash screen is I/O related, given that LoL has such an awful number of small files; it doesn't happen in other games, which have a few big files.

I know that performance issue should be easily solved by using LVM, which according to this should give near-native I/O performance:
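Concretely, the switch would just be the disk line in the domU config, e.g. (paths and LV name assumed):

```
# file-backed (what I use now):
disk = [ 'tap:tapdisk:aio:/storage/winxp.img,hda,w' ]
# LVM-backed raw block device:
disk = [ 'phy:/dev/main/winxp,hda,w' ]
```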

There are several guides that explain how to install and set up LVM, and how to give Xen a raw, unformatted LVM volume. That should be the easiest way to solve my current performance issue. However, I was enticed by reading about more advanced file systems, like ZFS and BTRFS:

Anyways, after lots of research, I have written down what I intend to do with my current single 4 TB HD in a more coherent and straightforward way. There are a lot of holes I wasn't able to fill regarding whether something is going to work or not, which is where I need suggestions the most.

As I have a 4 TB HD, partitioning it with GPT (gdisk) instead of MBR (fdisk) is pretty much mandatory due to the 2.2 TB barrier. Currently, I'm using Syslinux as boot loader, as it supports GPT in BIOS mode. It also allows me to edit its CFG file to hide some PCI devices from Dom0, so I can pass them through to a DomU without hotplugging or having Dom0 initialize them first; but GRUB also supports that, so this is redundant.

My original idea was to do everything with UEFI and GPT to fully drop legacy. I could do that using Gummiboot as boot manager to launch the xen.efi executable, or alternatively, adding the xen.efi path to the UEFI boot menu itself for something even more slim. However, I was never able to make Xen work in UEFI mode, and due to a lack of tools to debug my issue when I posted to xen-devel, I wasn't able to push further. Recently there have been some patches on xen-devel adding additional UEFI support, so I may give it a try at a later time:

The first thing to take into consideration is that HD performance varies depending on which part of the platter the data physically sits on, being faster at the outer edge and slower near the spindle. This means the data used most often (which should include the hypervisor installation, and maybe some of the most important VMs' storage) should be on the outer edge. As far as I know, LBA addresses start at the outer edge and end on the inner tracks, so if you create partitions in order of importance on a fresh HD, you will get it right.

The actual question is how many physical partitions are actually needed, and what would be an appropriate or comfortable size for them. It should be a point where I'm not wasting tons of space that will never be used, yet will also never need to resize them because they're too small and run out of space for critical stuff that HAS to be there.
I expected I would need a minimum of 3 physical partitions. The first will be the EFI System Partition (which will be unused due to my UEFI issue; it is mostly a placeholder for a later time); according to some Microsoft info about the ESP, it has a recommended size of around 3xx MB and has to be formatted FAT32. I decided to settle on 512 MB for the ESP. The second will be the hypervisor installation (Arch Linux + Xen), for which 10 GB seems to be enough (it is what I am currently using), though I don't know how much it could grow if, say, anything did intensive logging of hypervisor activity. It could also need to be bigger if I were to store multiple installation ISOs there for convenience instead of somewhere else. Finally, the third partition could be a single big storage partition with all the remaining space (3+ TB).
Some other considerations: if I were to use other GPT-capable OSes natively, I would need a partition for each, but as the idea is to run everything virtualized and not even bother with a native option, I don't see a need for those. Also, there could be more than one storage partition: if I wanted to guarantee that certain data physically sits within the outer-track boundary, then instead of a single big data partition, I could have two or three, as if they were priority tiers.
Examples of how my HD could end up partitioned look like this (also in this LBA order):

1- ESP (512 MB, FAT32)
2- Hypervisor (10 GB)
3- Storage fast (1 TB)
4- Storage slow (Remaining 2.9 TB or so)

1- ESP (512 MB, FAT32)
2- Hypervisor (10 GB)
3- Native OS 1 (120 GB or so)
4- Native OS 2 (120 GB or so)
5- Storage fast (1 TB)
6- Storage slow (Remaining 2.6 TB or so)

This is how I traditionally would have done it. I intended all DomU storage to be files; alas, the reason I am writing this is that I want more I/O performance, so it can't be done that way. Things get more complex from here on...

Logical volumes overlap with traditional partitioning: deciding to use them influences how I do the physical partitioning, as I can rely on LVM or ZFS for the fine grain. In my case it doesn't change much, as I already had a very simple partition layout. The FAT32 ESP for UEFI booting looks untouchable as a physical partition, as does the nearly HD-wide one. The hypervisor itself can sit in its own partition or inside a logical volume, be it LVM or ZFS. Arch Linux can install on either:

...So the bare minimum is only 2 physical partitions. However, I don't consider installing the hypervisor on a logical volume a good idea: not only does it complicate the installation process, but if I have to do maintenance, it seems easier to do it from Dom0 itself instead of rushing for a rescue disc when I have issues dealing with LVM or ZFS.

At the very beginning of my Xen test runs I used LVM, because it's the only option mentioned in the Xen Beginners Guide on the wiki:
...but after learning how to use file-based storage, I formatted everything and started over. The reason I didn't like LVM is that it seemed to make the partition tree much more bloated and complicated, as did managing DomU storage. I preferred files, as they're much easier to copy or duplicate, back up to another Windows-based computer, etc. However, at that time I didn't notice the low I/O performance; now I do, so I need LVM. If anything, the problem was that I never got used to it.
I suppose there are tools that allow me to make a file out of a logical volume and vice versa, so if I want to snapshot a DomU's storage and send it to another computer for backup, I can make a file out of it, move it, then move it back and restore it at a later date. That way I can use files for cold storage of backup DomUs, and logical volumes for production DomUs.
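Plain dd is enough for that round trip, as long as the domU is shut down and the LV is at least as large as the image (a sketch; VG/LV and file names assumed):

```shell
# LV -> file (for backup on another machine)
dd if=/dev/main/vm1 of=/backup/vm1.img bs=4M

# file -> LV (restore later; the target LV must not be smaller than the image)
dd if=/backup/vm1.img of=/dev/main/vm1 bs=4M
```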

One thing that worried me about LVM was performance. I didn't want to create the LVM layer and then still manage all DomUs as files, as I'm currently doing, because I thought that would add significant overhead; but after googling for some time, the overhead of LVM itself appears to be minimal or nil:

I suppose logical volume resizing could degrade performance, as it may place new data too far away depending on free space, causing fragmentation (for example, placing the new data on the inner HD tracks for a volume whose data used to be contiguous on the outer edge, after several TBs' worth of data in the middle). At least initially it seems good enough, as I'm not planning to make changes left and right, so I don't think I will hit that issue.
Another thing that bothers me: if I were to resize logical volumes on demand, I suppose I would also need tools to resize the partitions and file systems inside the DomUs to account for the extra allocated storage, as these seem to be unaware of it.
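On the dom0 side the usual sequence is: grow the LV, then grow the file system inside it (a sketch, ext4 example with assumed names; for a guest LV the guest OS must grow its own partition and file system instead):

```shell
# Grow the LV by 10GB
lvextend -L +10G /dev/main/data

# Grow the ext4 file system on it to fill the new size
resize2fs /dev/main/data
```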

While what to do to get LVM running is clear, and Xen supports it out of the box for DomU storage, I'm not so sure about ZFS. I know ZFS is mentioned a lot as a logical volume manager, but I don't know whether Xen works directly on raw ZFS volumes (ZVOLs?), nor whether they perform like LVM-based ones.
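For what it's worth, a ZVOL does show up as an ordinary block device, so in principle it could be handed to a domU like an LV (an untested sketch; the pool name "tank" is assumed):

```shell
# Create a 120GB ZVOL in pool "tank"
zfs create -V 120G tank/winxp

# It appears as /dev/zvol/tank/winxp, so the domU disk line would be:
#   disk = [ 'phy:/dev/zvol/tank/winxp,hda,w' ]
```

Whether this performs comparably to an LVM LV is exactly the open question.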

So far, up to this point, the partition layout would look like this:

1- ESP (512 MB, FAT32)
2- Hypervisor (10 GB)
3- LVM Storage (Remaining 3.9 TB or so)
3.1- Basic Dom0 storage (100 GB or so)
3.2- Gaming Windows XP VM (120 GB or so)
3.3- Everyday/browsing Arch Linux VM (60 GB or so)
3.x- Everything as logical volume, created or resized on-demand

File systems themselves seem simple until you add the next-generation file systems. Some choices are very straightforward. The ESP for UEFI must be FAT32; that is fixed. The standalone physical partition where the hypervisor sits will probably be EXT4, as that seems to be the mainstream standard. In the LVM partition, I would have both raw logical volumes for DomU storage and formatted logical volumes for basic storage like ISO files. I don't know how much LVM influences file system choice for logical volumes versus physical partitions, but I suppose EXT4 for general data storage would do. Up to this point everything seems pretty simple...

...This is where ZFS and BTRFS come into play. They seem extremely similar in most features. BTRFS is supposedly going to replace EXT4 as the standard file system at some point in the future, but according to some benchmarks I saw on Phoronix, its performance is inferior to EXT4 by a notable margin. BTRFS is also experimental, and while most people say it is usable, it still has some quirks. On the other side, ZFS is fully production ready. Finally, some people claim that ZFS is better than BTRFS by other metrics:

I suppose I could drop BTRFS as a potential candidate. Regardless, while BTRFS is Linux-native, ZFS support on Linux doesn't seem to work out of the box due to licensing issues, which means I have to read a few guides on getting it working on Arch Linux. This seems easy, because the Arch Linux wiki has articles on getting that done:

But I have no idea whether Xen can directly work with ZFS/ZVOLs/whatever for DomU storage, or whether I need special considerations. I suppose that in the worst case Xen would be able to use file-based storage from a ZFS partition as I do today, but I have no idea how it behaves with logical volumes, or how an LVM vs. ZFS comparison would turn out.

Another thing I find irritating is that all comments about ZFS are about how good it is for redundancy and performance in external RAID systems with tons of HDs; info on single-disk use is hard to come by. I don't even know whether all the added complexity that ZFS requires will be worth it in my setup. At the very least, I know that bitrot protection isn't available on a single disk unless I devote half of the HD to duplicating absolutely everything. I'm aware that ZFS also loves RAM for caching purposes; as I have 32 GB RAM, this is a non-issue. But I don't know about actual performance scaling, or how it does on smaller systems with less RAM, etc. Overall, I know that everyone loves ZFS in big storage RAID arrays, but I'm not sure how it scales down, or whether I would be better served by a traditional LVM + EXT4 setup.
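The single-disk duplication I mean is the ZFS `copies` property, which stores every block twice at the cost of half the capacity (a sketch; pool/dataset names assumed):

```shell
# Store two copies of every block in this dataset of a single-disk pool
zfs set copies=2 tank/data

# Periodically scrub to verify checksums and repair from the second copy
zpool scrub tank
```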

At this point, the choices look like this:

1- ESP (512 MB, FAT32)
2- Hypervisor (10 GB, EXT4)
3- LVM Storage (Remaining 3.9 TB or so)
3.1- Basic Dom0 storage (100 GB or so, EXT4)
3.2- Gaming Windows XP VM (120 GB or so, unformatted)
3.3- Everyday/browsing Arch Linux VM (60 GB or so, unformatted)
3.x- Everything as logical volume, created or resized on-demand

Or this...

1- ESP (512 MB, FAT32)
2- Hypervisor (10 GB, EXT4)
3- ZFS Storage (Remaining 3.9 TB or so)
3.1- Basic Dom0 storage (100 GB or so, ZFS)
3.2- Gaming Windows XP VM (120 GB or so, unformatted ZVOL?)
3.3- Everyday/browsing Arch Linux VM (60 GB or so, unformatted ZVOL?)
3.x- Everything as logical volume, created or resized on-demand

Another thing I was pondering was where and how to store general data. The DomU image files or logical volumes holding the Windows installations and all that are self-explanatory, as that is local data exclusive to each DomU; but I will also have data that is shared, or needs to be easily accessible between many DomUs, even if only temporarily. Examples include ISO collections of applications and games, movies or videos, etc. I could store them in an LVM volume or an IMG file that I assign to a given VM at boot via the Xen configuration file (I know storage is hotplug capable, but I didn't look into that); however, if I were to use a ZFS partition for that purpose, Windows would not be able to see it directly. I suppose that to take advantage of ZFS or BTRFS, I would need a Linux VM dedicated to storage, letting Windows access it via shared folders as over a network. Otherwise, I will have to store stuff in NTFS-formatted IMG files.

Because I have 32 GB of RAM, I was thinking about using the excess RAM as a RAMDisk, which provides beyond-SSD I/O performance (I had no budget left for an SSD). Many games should fit on a 20 GB or so RAMDisk while still leaving plenty of RAM for Dom0 and some DomUs, and as the computer is on 24/7, RAMDisk volatility is a non-issue as long as the important stuff, like saved games, is backed up to the HD often. This assumes ZFS isn't very demanding; I don't know how much RAM that thing eats...
I have experience working with a RAMDisk on WXP SP3 (using all 32 GB of RAM on 32 bits via PAE, long story) and had some success using symlinks (which NTFS does support), though to get the most out of it you need batch files to copy, rename, and make the symlinks. I see it as much more workable on Xen, because as an IMG file I can copy a game image to the RAMDisk and back it up after use in a single go, without having to bother with the more complicated Windows NTFS symlinks.
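In dom0 that workflow could be as simple as the following (a sketch; the mount point and file paths are assumed):

```shell
# 20GB tmpfs as a RAMDisk
mkdir -p /mnt/ramdisk
mount -t tmpfs -o size=20g tmpfs /mnt/ramdisk

# Copy the game image in before starting the domU...
cp /storage/games.img /mnt/ramdisk/
# ...run the domU with its disk line pointing at /mnt/ramdisk/games.img...
# ...then copy it back afterwards to persist any changes
cp /mnt/ramdisk/games.img /storage/games.img
```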

That is all I have thought about for the storage part of my system. I expect there are people who have already experimented and settled on a way or style of managing storage that they may want to share, to help me make a choice on what to do and how. It has been more than 3 months since I last toyed with this system's configuration, as after getting it to a usable state I decided to enjoy it instead of optimizing further (I was out of gaming for a whole 3 weeks until I got VGA passthrough working; it was pretty much equivalent to the Dark Ages). However, as Xen 4.4 was released recently, I was intending to start from scratch, applying any ideas I had in the meantime for a final setup.

Thank you if you managed to read all of this; I know it has been long. Even more thanks if you have an insightful reply, so I can stop thinking about choices and start acting.

Xen-users mailing list
