List-id: Xen user discussion <xen-users.lists.xen.org>
Thanks, again, Adam!
I'm sure that your input is going to help me out as we begin tuning our SAN! :P
I'm curious: IIUC, you're using LV's as DRBD backing devices. Is that correct? Wouldn't it be more versatile to use empty partitions as DRBD backing devices, import them to the initiators as PV's [using iSCSI/AoE/GNBD/etc], and then allocate LV's on the initiators (after coordinating the clients using cLVM)?
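For concreteness, the kind of layout I mean would look roughly like this on the initiators, assuming clvmd/dlm are already running cluster-wide (the device and names below are just examples, not a real config):

pvcreate /dev/sdb                      # the imported iSCSI/AoE/GNBD disk
vgcreate -cy cloud_vg /dev/sdb         # -cy marks the VG as clustered (cLVM)
lvcreate -L 20G -n vm01-disk cloud_vg  # pvcreate/vgcreate/lvcreate only need to
                                       # run on one node; cLVM keeps the LVM
                                       # metadata consistent across initiators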
TIA,
Eric Pretorious
From: Adam Goryachev <mailinglists@xxxxxxxxxxxxxxxxxxxxxx>
To: xen-users@xxxxxxxxxxxxx
Sent: Sunday, October 19, 2014 7:13 PM
Subject: Re: [Xen-users] Storage Systems for Virtual Disk Images
On 19/10/14 10:08, Eric wrote:
Thanks, Adam:
That's all tremendously helpful information! I do have two questions:
Layering - Are you using logical volumes (LV's) as DRBD backing devices and then using the DRBD resources as iSCSI LUN's? This seems like a fairly labor-intensive approach that might not work at all with automated provisioning.
Yes, using:
RAID5
LVM2 (one LV per domU)
DRBD
iSCSI
Previously we were doing:
RAID5
DRBD
LVM2
iSCSI
However, we were advised by the DRBD authors/support to split into multiple DRBD devices to reduce the IO load.
I don't use any sort of automated provisioning, as the config here is very static. However, it should be relatively easy to automate: simply add an LV on both primary/secondary, create the DRBD config file on both primary/secondary, connect and do the initial sync, and then create the iSCSI export on both primary/secondary. You will probably also want to remember to adjust your failover system to add the extra DRBD (change to primary) and iSCSI export.
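As a rough sketch of those steps (the volume group, resource name, DRBD minor/port, replication-link addresses and the tgt-style iSCSI export below are examples only, not my actual config, and assume DRBD 8.4):

lvcreate -L 50G -n vm_newdomU vg0         # on both primary and secondary

cat > /etc/drbd.d/vm_newdomU.res <<'EOF'  # same file on both nodes
resource vm_newdomU {
    device    /dev/drbd10;
    disk      /dev/vg0/vm_newdomU;
    meta-disk internal;
    on san1 { address 10.2.2.1:7790; }
    on san2 { address 10.2.2.2:7790; }
}
EOF

drbdadm create-md vm_newdomU              # both nodes
drbdadm up vm_newdomU                     # both nodes
drbdadm primary --force vm_newdomU        # primary only: starts the initial sync

# Export the DRBD device via iSCSI (tgt shown here; adjust for your target
# implementation, and add the new resource to the failover config as well).
tgtadm --lld iscsi --op new --mode target --tid 10 \
    -T iqn.2014-10.example.san:vm_newdomU
tgtadm --lld iscsi --op new --mode logicalunit --tid 10 --lun 1 -b /dev/drbd10
tgtadm --lld iscsi --op bind --mode target --tid 10 -I ALL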
Networking - What mode are you using to bond the 2 x 1G dom0 interfaces? e.g., balance-rr, active-backup, balance-xor, broadcast, balance-tlb, or balance-alb?
Not using bonding at all; I went through all sorts of configs and variations there. I started with 8 x 1G on the SAN and 1 x 1G on each dom0. Eventually I ended up with 1 x 10G on each SAN, plus 1 x 10G for DRBD (crossover). Each dom0 has 3 x 1G ethernet: 2 used for iSCSI and one used for the "user" LAN. The iSCSI is configured as two discrete ethernet devices on the same LAN subnet (eg, 10.1.1.21/24 and 10.1.1.31/24); the primary SAN server is 10.1.1.11 and the secondary is 10.1.1.12.
iSCSI uses multipath to make one connection over each interface to
the same destination (floating IP configured on the SAN servers).
I had considered other options such as creating 4 connections from each dom0, two to 10.1.1.11 (primary) and two to 10.1.1.12 (secondary), which would remove the need for a floating IP, etc, but in practice I've not had any issue with the floating IP.
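For reference, the per-interface iSCSI setup looks roughly like this with open-iscsi (the NIC names and the floating portal address, 10.1.1.10 here, are examples only, not my actual config):

# Bind one open-iscsi interface to each physical NIC used for iSCSI.
iscsiadm -m iface -I iscsi-eth1 --op=new
iscsiadm -m iface -I iscsi-eth1 --op=update -n iface.net_ifacename -v eth1
iscsiadm -m iface -I iscsi-eth2 --op=new
iscsiadm -m iface -I iscsi-eth2 --op=update -n iface.net_ifacename -v eth2

# Discover the target via the floating IP over both interfaces, then log in;
# dm-multipath combines the two sessions into a single multipathed device.
iscsiadm -m discovery -t sendtargets -p 10.1.1.10 -I iscsi-eth1 -I iscsi-eth2
iscsiadm -m node --login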
I can forcibly shut down the primary and all VM's proceed without interruption (a few seconds of stalled IO), or I can nicely shut down the primary and there is no noticeable downtime/delay.
I would be interested to hear more about your
configuration/setup/etc. For me, the system is working well; the IO layer is still underperforming, but I no longer get IO stalls and get "good" performance (ie, the users are happy). I suspect some parts could be tweaked further, but I haven't had the time to work on that.
If you want a lot more information on the problems I had, and the
various configurations (both hardware + software/etc) please search
on the linux-raid mailing list, and on this list (archives of both).
Most of my efforts were over a period of 15+ months starting around
January 2013.
I personally use a Linux HA + DRBD + LVM + Linux
iSCSI solution, and it works very well. Some
things I took a lot of time to solve include:
1) A whole bunch of network cabling/config issues; now using 10G for the DRBD link, 10G on the iSCSI server, and 2 x 1G on the dom0's with multipath.
2) Unexpectedly poor performance with HDD's: concurrent random access from multiple domU's does not work well with HDD's. My solution was to upgrade to SSD.
3) Unexpectedly poor performance with SSD. This came down to testing the wrong thing when calculating the expected performance level. Test with small (eg 4k) random read/write and use those results (see the fio example after this list); unless your VM's are only doing large read/write, and these really do get merged, you will find performance limited by the 4k request size.
4) Still poor performance from SSD (DRBD). Change LVM so that it is below DRBD, ie, one LV for each domU, then DRBD on top for each domU, and finally iSCSI exports the DRBD devices.
5) Still poor performance from SSD (DRBD). DRBD needs to do its own write for every domU write, plus LVM does its own, etc. Each layer adds overhead. The solution for me was to disable DRBD disk-barrier, disk-flushes and md-flushes (config sketch after this list).
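For reference, those options live in the disk section of the DRBD config; the snippet below uses DRBD 8.4-style syntax and is only safe if your write cache is non-volatile (eg a battery/flash-backed RAID controller):

# /etc/drbd.d/global_common.conf (or per-resource)
common {
    disk {
        disk-barrier no;
        disk-flushes no;
        md-flushes   no;
    }
}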
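And for item 3 above, the kind of test I mean is small random IO with something like fio; the device path and parameters below are examples only (note that writing to a raw device destroys its contents):

fio --name=randrw4k --filename=/dev/sdX --direct=1 --ioengine=libaio \
    --rw=randrw --bs=4k --iodepth=32 --numjobs=4 --runtime=60 \
    --time_based --group_reporting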
Other things that helped along the way include:
echo noop > /sys/block/${disk}/queue/scheduler
echo 128 > /sys/block/${disk}/queue/nr_requests
echo 4096 > /sys/block/md1/md/stripe_cache_size
# RAID5, test the correct value for your array
Currently, I have 8 dom0's with about 25 domU's
and it is working well, including transparently
failing over on iscsi server failure. If anyone
wants more details, I'm happy to share.
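On the failover side, the floating IP and DRBD promotion are just ordinary Pacemaker resources; a rough crm-shell sketch (the names, IP and DRBD resource below are examples only, not my actual config):

crm configure primitive p_iscsi_ip ocf:heartbeat:IPaddr2 \
    params ip=10.1.1.10 cidr_netmask=24 op monitor interval=10s
crm configure primitive p_drbd_vm1 ocf:linbit:drbd \
    params drbd_resource=vm_newdomU op monitor interval=15s role=Master
crm configure ms ms_drbd_vm1 p_drbd_vm1 meta master-max=1 clone-max=2 notify=true
crm configure colocation col_ip_with_drbd inf: p_iscsi_ip ms_drbd_vm1:Master
crm configure order ord_drbd_then_ip inf: ms_drbd_vm1:promote p_iscsi_ip:start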
Most of the above is perhaps not specific to Xen but to storage in general; I hope it will still be relevant here.
I'd also ask that if you get any direct response,
that you please summarise and send back to the
list, and/or update the wiki so others can more
easily find the information.
Regards,
Adam
On 17/10/14 11:34, Eric wrote:
Hello, All:
I'd built a highly-available, redundant iSCSI SAN in our lab a while back as a proof of concept (using Linux-HA, DRBD, and the Linux iSCSI Target Framework) and it worked pretty well. As I'm getting ready to build the infrastructure for our [production] cloud, I want to re-examine the topic, but I just haven't got enough time to sift through all of the outdated or speculative information on the Internet, so I'm reaching out to the list for some guidance on hosting virtual disk images.