
Re: [Xen-users] Please help estimate number of the domUs



I have not personally seen a performance issue related to the number of VMs per LUN, but I have never tried 100 VMs per LUN. Some SANs may implement their access queues in a way that effectively makes 20 VMs per LUN a sweet spot, as that article describes, but I suspect this varies from one SAN to the next. Even though I have not run into that issue, the article may well be correct. Especially in your case with 100 VMs, you may want to follow its advice and split your VMs across several LUNs.

Whatever you do, always leave extra room in each LUN for special operations such as snapshots and migrations. For example, whenever you migrate storage with XCP/XenServer, you will need plenty of free disk space on both the source and target SRs to complete the task. The extra space is allocated and used behind the scenes, and the operation will fail if not enough is available. If your VMs are each 30GB, plan to leave 30-60GB of additional free space in each LUN so there is room for those operations.
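The headroom rule above can be sketched as a small calculation. This is a minimal sketch, assuming the spare space is expressed as one to two extra VM-sized images per LUN (matching the 30-60GB figure for 30GB VMs); `lun_size_gb` is a hypothetical helper, not part of XCP/XenServer.

```python
def lun_size_gb(vms_per_lun: int, avg_vm_gb: float, spare_vms: float = 2.0) -> float:
    """Hypothetical helper: space used by the VMs plus headroom for
    snapshots and storage migration, expressed as extra VM-sized images."""
    if vms_per_lun <= 0 or avg_vm_gb <= 0 or spare_vms < 0:
        raise ValueError("VM count and size must be positive, headroom non-negative")
    return float((vms_per_lun + spare_vms) * avg_vm_gb)

# 25 VMs of 30GB each, with 30GB (1 VM) or 60GB (2 VMs) of headroom
print(lun_size_gb(25, 30, spare_vms=1))
print(lun_size_gb(25, 30))
```

Real snapshot and migration overhead depends on the SR type and on how much each VM disk has changed, so treat the result as a floor rather than a guarantee.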

On 1/16/2013 11:31 PM, Andrey wrote:
Thank you very much for the clear and detailed explanations. Many things became clear to me now. I really appreciate that.

As for the number of LUNs, I've read the article http://blogs.citrix.com/2011/06/01/sizing-luns-a-citrix-perspective/
which recommends creating a separate LUN for every 20-30 VMs.

So, for example, 100 VMs could be split across 4 LUNs of 25 VMs each. Initially we'll have one XCP host, so with 3.6TB of storage and an average VM size of 30GB there would be 5 LUNs/SRs of 720GB each, though that complicates VM management a little. Does this make sense for performance?
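The LUN split described above is simple arithmetic; a minimal sketch, assuming 3.6TB is treated as 3600GB and that every VM is exactly the 30GB average (both simplifications), might look like this. The function names are hypothetical.

```python
import math

def luns_for_vm_count(total_vms: int, max_vms_per_lun: int) -> int:
    """Number of LUNs needed if each LUN holds at most max_vms_per_lun VMs."""
    return math.ceil(total_vms / max_vms_per_lun)

def vms_that_fit(lun_gb: float, avg_vm_gb: float) -> int:
    """How many average-sized VMs fit in one LUN, reserving no headroom."""
    return int(lun_gb // avg_vm_gb)

print(luns_for_vm_count(100, 25))          # 100 VMs at 25 per LUN
total_gb = 3600                            # assumption: 3.6TB taken as 3600GB
lun_gb = total_gb / 5                      # 5 LUNs/SRs
print(lun_gb, vms_that_fit(lun_gb, 30))    # size of each LUN, VMs per LUN
```

Note that 24 VMs of 30GB fill a 720GB LUN completely, leaving none of the free space recommended above for snapshots and migrations.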

17.01.2013 01:07, admin@xxxxxxxxxxx wrote:
For web server VPS instances, I usually see real world performance track
most closely with the 4k random 67% write 33% read tests.  The reason
those VPS instances tend to skew toward far more writes than reads is
the http log files. The most heavily accessed web pages are cached in
memory in the VPS, so fewer read operations hit the SAN.
The logs still need to be written, though.  When there are lots of
different sites, the log files are written in random bursts, even
though each individual log file is written sequentially.  The results for
your SAN were 4915 to 6317 IOPS (depending on queue depth).  This is the
upper end of my initial guess for your SAN (my guess was 2000 to 5000).
Based on your benchmarks, your SAN can deliver about 5000 IOPS in the test
that I personally think most closely resembles the real world usage
pattern for a web server running in a VPS.

Note: the "4k random 67%read33%write" test is actually mislabeled. It
should say "4k random 67% write 33% read".

The wild card is database access, especially if you are hosting databases
for people with a variety of skill levels.  If the tables are well
designed and properly indexed, there will be very little disk
access.  If the tables are poorly designed and not indexed properly,
there will be a lot of disk access.  I have seen some customer sites
that need hundreds of IOPS just to serve a tiny amount of traffic because
of poor database design.  On the other hand, I have seen a well designed
(and very well indexed) DB that averages 40 IOPS while serving
millions of queries per day.

The sequential IO tests are an excellent measure of how fast you will be
able to copy large files, which matters when you migrate a VPS between
SAN targets.  Generally speaking, sequential access performance is far
less important than random access performance, because random IO is far
more common than sequential IO.  And when you run a bunch of VPS
instances, even sequential IO becomes random IO simply because all of the
VPS instances are accessing different areas of the storage volume.  So I
tend to look more at the random performance.
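The point that many sequential streams look random to the SAN can be illustrated with a toy model. This is a minimal sketch, assuming a simple round-robin merge of per-VM requests and hypothetical block offsets; real IO schedulers and SAN caches may reorder or merge requests, so this only shows the basic effect.

```python
from itertools import zip_longest

def interleave(streams):
    """Round-robin merge of per-VM request streams: a crude stand-in for
    how a shared LUN receives IO from many guests at once."""
    merged = []
    for batch in zip_longest(*streams):
        merged.extend(block for block in batch if block is not None)
    return merged

def sequential_fraction(requests):
    """Fraction of requests that start right after the previous one."""
    if len(requests) < 2:
        return 1.0
    hits = sum(1 for prev, cur in zip(requests, requests[1:]) if cur == prev + 1)
    return hits / (len(requests) - 1)

# Each hypothetical VM writes 4 consecutive blocks in its own region of the LUN.
vm_streams = [list(range(start, start + 4)) for start in (0, 1000, 2000)]
print(sequential_fraction(vm_streams[0]))           # one VM on its own
print(sequential_fraction(interleave(vm_streams)))  # three VMs interleaved
```

Each stream is perfectly sequential on its own, but once the three are interleaved no request follows on from the one before it.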

If you have multiple pools, create a separate LUN for each pool. If you
have only one pool, just create one LUN, regardless of how many physical
XCP/XenServer nodes exist within that pool.  XCP and XenServer are smart
enough to make sure each VPS can only access its own data blocks even
when many VPS instances share the same LUN.  On the XCP/XenServer side,
add that iSCSI target as a storage repository, then create your VPS
instances on that storage repository.  XCP/XenServer will handle
everything else under the hood.  You don't need to manually install a
cluster-aware file system or use a separate LUN per VPS.


On 1/16/2013 6:55 AM, Andrey wrote:
OK. I ran the tests with the .icf file from Iometer-config-file.zip (8
workers and 120 GB max file size) on the RAID1+0 LUN; please see attached.
In these tests the IOPS are much lower. What is the real world performance,
then? I'm a little confused. Also, is it right that instead of one big LUN
for the VMs I should create a few LUNs, each sized (20-30 VMs) x (average
VM size), for better performance?

16.01.2013 02:07, admin@xxxxxxxxxxx wrote:
Those numbers are higher than I would have expected given the hardware
you listed.  For mixed random access, I expected your hardware would
have delivered 2000 to 5000, not 49747.  Of course, I test with 100%
random and 67% writes.  You were testing with 60% random and 35%
writes.  There could be considerable caching involved (especially with
read tests), but it is hard to say without more data points.

If you want to run more benchmarks with IOmeter, I would suggest trying
the ones that ZFSBuild uses from
http://www.zfsbuild.com/pics/Graphs/Iometer-config-file.zip . That zip
file contains an IOMeter.icf file. More details about those benchmarks
are at
http://www.zfsbuild.com/2012/12/14/zfsbuild2012-benchmark-methods/

Anyway, I am a lot more familiar with the benchmarks from ZFSBuild. If you run those benchmarks and post those results, then I could give you a
very good idea what level of real world performance to expect.

Here are some InfiniBand-based benchmarks using the ZFSBuild
IOmeter file:
http://www.zfsbuild.com/2012/12/15/zfsbuild2012-infiniband-performance/

Here are some graphs of single ethernet port benchmarks: (comparing some
hardware from 2010 with hardware from 2012)
http://www.zfsbuild.com/2012/12/14/zfsbuild2012-performance-compared-to-zfsbuild2010/




On 1/15/2013 2:33 PM, Andrey wrote:
Just finished measuring SAN performance with IOmeter
(http://vmktree.org/iometer/OpenPerformanceTest.icf, 5 minutes per test) on RAID10 (data, 16GB maximum test file) and RAID50 (backup, 8GB
maximum test file), both 3.6TB with one ext4 partition. The SAN is
set up in a dual-path configuration and the server has multipath
configured with 2 HBA adapters. Here are the results:

RAID 5+0:
--------------------------------------------------
| Test name                | Avg IOPS | Avg MB/s |
--------------------------------------------------
| Max Throughput-100%Read  |    47528 |     1485 |
| RealLife-60%Rand-65%Read |    24760 |      193 |
| Max Throughput-50%Read   |     6959 |      217 |
| Random-8k-70%Read        |    26612 |      207 |
--------------------------------------------------

RAID 1+0:
--------------------------------------------------
| Test name                | Avg IOPS | Avg MB/s |
--------------------------------------------------
| Max Throughput-100%Read  |    44031 |     1375 |
| RealLife-60%Rand-65%Read |    49474 |      386 |
| Max Throughput-50%Read   |    43002 |     1343 |
| Random-8k-70%Read        |    49930 |      390 |
--------------------------------------------------

Is caching at work here, or something else?

13.01.2013 23:52, admin@xxxxxxxxxxx wrote:
You should measure the performance of the SAN using something like
IOmeter (running IOmeter on the hardware you plan to run XenServer or XCP on). Assuming you configure those drives in RAID10, I would guess
that SAN would deliver about 2,000 to 5,000 IOPS.  If you use RAID5
(please don't), you will see far fewer IOPS during mixed read and
write tests.

If you want to deploy 100 VMs onto that SAN, each VM will only get 20-50 IOPS (assuming RAID10). The performance in each VM will be less than fantastic. If the VMs need to do any IO-intensive tasks,
the owners of the VMs are probably going to complain about sluggish
performance. I don't think the SAN you listed can deliver enough IOPS
to satisfy 100 VMs.
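The 20-50 IOPS figure is an even split of the estimated array IOPS across the VMs. A minimal sketch, assuming every VM gets an equal share (real workloads are bursty, and caching changes the picture), is:

```python
def iops_per_vm(total_iops: float, vm_count: int) -> float:
    """Even share of the array's IOPS per VM (ignores bursts and caching)."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    return total_iops / vm_count

# Estimated RAID10 range for this array, spread over 100 VMs
for total in (2000, 5000):
    print(total, iops_per_vm(total, 100))
```

Plugging in the 4915 to 6317 IOPS measured later in this thread gives roughly 49 to 63 IOPS per VM.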

On 1/13/2013 12:17 PM, Andrey wrote:
Well, the storage is a direct-connect HP P2000 G3 FC dual-controller
array with 24 x 600GB disks in a dual-path configuration (two HBA
ports -> two controller ports). I guess that is quite enough.

13.01.2013 20:45, admin@xxxxxxxxxxx wrote:
You will probably run out of disk IO before you run into any hard
limits in XenServer or XCP.

What type of SAN are you going to use?  What type of network
interconnect will you use to link your XenServer/XCP nodes to your SAN?
How many IOPS does your SAN deliver over your chosen network
interconnect?

On 1/13/2013 9:03 AM, Andrey wrote:
Sure, will try. I see in the XenServer 6.1 FAQ that the maximum supported
number of guests is 150 and that this requires increasing dom0_mem to
its maximum of 4096. Those internal limits are clearly not very realistic,
so it would be a good result for me if we were able to run at least 100
guests. That seems a more realistic number, although some sources put
the maximum number of VMs at 4-10 per CPU core (so 32-80 in my case).
But in all of these cases, I think 192 GB of RAM would be more than
enough.
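The per-core rule of thumb mentioned above can be written as a capped product. This is a minimal sketch, assuming 8 physical cores (inferred from "32-80 in my case" at 4-10 VMs per core) and using the 150-guest limit quoted from the XenServer 6.1 FAQ as a ceiling; `max_vms` is a hypothetical helper and says nothing about IO or network limits.

```python
def max_vms(cores: int, vms_per_core: int, platform_limit: int = 150) -> int:
    """Rule-of-thumb VM ceiling: cores x VMs-per-core, capped at the
    platform's supported guest count (150, as quoted from the FAQ)."""
    return min(cores * vms_per_core, platform_limit)

cores = 8  # assumption: inferred from "32-80" at 4-10 VMs per core
print(max_vms(cores, 4), max_vms(cores, 10))
```

As the replies in this thread point out, disk IO is likely to become the limiting factor well before either of these numbers.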

With regards, Andrey

11.01.2013 16:43, Wei Liu wrote:
On Fri, 2013-01-11 at 12:24 +0000, Andrey wrote:
Thank you for the answer.

I'm really considering creating as many DomUs as possible with a
typical load to get practical information.

What about network capacity? Does this math apply to network resources
as well? Should we shape the DomUs' bandwidth to prevent network
overload? Can the CPU become a bottleneck in this configuration?


The math I did was to show you some internal infrastructure limits that
I know of.

CPU / network overloading is another topic. TBH I haven't done stress
tests on CPUs and network.

Whether you will hit any bottlenecks in CPU / network depends closely
on your use case. Booting up DomUs and running a typical workload is
a good idea.


Wei.




_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users