Re: [Xen-users] Recommendations for Virtualization Hardware
On 2012-09-24 05:45, ShadesOfGrey wrote:
Sorry for the late response; I've had a lot to digest.
On 09/21/2012 11:22 AM, Robin Axelsson wrote:
If you want to be able to use PCI and VGA
passthrough you basically need to make sure that your hardware
supports either AMD-Vi (formerly known as AMD-IOMMU) or Intel
VT-d extensions. In the Intel case this limits your choice of
motherboard (it must be supported in the BIOS) and CPU; in the
AMD case it limits only your choice of motherboard. A good start
is to check out one of these pages:
http://wiki.xensource.com/xenwiki/VTdHowTo
http://wiki.xen.org/wiki/VTd_HowTo
A word of warning: parts of the documentation are
somewhat dated. You can also contact e.g. Gigabyte,
Asus or ASRock customer support and ask them whether a particular
motherboard supports these extensions. Most motherboards also
have downloadable user manuals; if the BIOS settings in those
manuals show options to enable/disable VT-d or AMD-Vi/IOMMU
extensions, then you will be OK with that motherboard.
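If you already have a Linux install running on a candidate board, you
can also do a quick sanity check from software. The following is a
minimal Python sketch, assuming a reasonably recent kernel that
populates /sys/kernel/iommu_groups when the IOMMU is active:

    import os
    import subprocess

    # When VT-d/AMD-Vi is enabled in the BIOS and picked up by the kernel,
    # one directory per IOMMU group appears under /sys/kernel/iommu_groups.
    path = "/sys/kernel/iommu_groups"
    groups = os.listdir(path) if os.path.isdir(path) else []
    print("IOMMU groups found:", len(groups))

    # The kernel boot log also reports the remapping hardware; look for
    # "DMAR" lines on Intel systems and "AMD-Vi" lines on AMD systems.
    log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    for line in log.splitlines():
        if "DMAR" in line or "AMD-Vi" in line:
            print(line)

Under Xen the corresponding place to look is the hypervisor log
('xl dmesg'), which reports whether I/O virtualisation is enabled.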
The lack of current information about Xen (and KVM) online has
been frustrating — especially finding the many proof of concept
videos that demonstrated possibilities but offered no real
specifics. Looking for specifics, I sought information from gaming
and enthusiast sites; I figured finding confirmation of VT-d and
AMD-Vi support on such sites would be more likely. However, I
found that wasn't often the case. I did determine that ASRock
motherboards seem to be the most likely to support VT-d, ASUS
least likely (unless equipped with an Intel 'sanctioned' VT-d
chipset). I had narrowed my choices to two motherboards that
appear to offer VT-d support and was intending to contact the
manufacturer before purchase. Both choices are a bit pricey and
I've been reconsidering whether I should look to other
motherboards to reduce costs.
Some motherboards support IOMMU even though it is not found in the
user manual or specified on the website. Your best bet is to ask
customer support. A guy posted here that he got it working on an
Intel motherboard that doesn't even have options for it in the BIOS,
so it seems that in some cases it is only up to the CPU. This is not
the case with AMD though, as I stated before. I have bought a couple
of Gigabyte GA990FX-UD7 boards myself; they are stable and have a good
layout. They have support for IOMMU, but I haven't tested it
thoroughly enough to fully confirm this, although I don't believe
there would be any problem.
It surprises me that ASRock and ASUS are so different. ASRock is, or
at least used to be, a subsidiary of ASUS, so there shouldn't be that
much difference between them.
The other thing is the choice of GPU for VGA
passthrough; it is preferable that the GPU supports FLR, or
Function Level Reset as it is called. The thing is that the hardware
needs to be reset somehow as it is passed through to the host.
This is best done with FLR, and nVidia is known to supply
firmware patches for some of their Geforce cards with this
support; it is said to be supported by default with their
Quadro cards. FLR is not the only way to reset a PCI device; a
reset can also be triggered through the ACPI power management
framework by temporarily cutting power to the affected PCI slot.
These reset methods are called d3d0 and bus reset. The question,
however, is whether this works on PCI cards that draw auxiliary power
directly from the PSU. There is a PDF document on the VMware
website
(http://www.vmware.com/files/pdf/techpaper/vsp_4_vmdirectpath_host.pdf)
about this:
-----------------------
Reset Method
Possible values for the reset method include flr, d3d0, link,
bridge, or default.
The default setting is described as follows. If a device
supports function level reset (FLR), ESX always uses FLR. If the
device does not support FLR, ESX next defaults to link reset and
bus reset in that order. Link reset and bus reset might prevent
some devices from being assigned to different virtual machines,
or from being assigned between the VMkernel and virtual
machines. In the absence of FLR, it is possible to use PCI Power
Management capability (D3 to D0 transitions) to trigger a reset.
Most of the Intel NICs and various other HBAs support this mode.
-----------------------
There are indications from people that d3d0 also works with PCI
cards that take power from auxiliary inputs. I suggest that you
take a look at the following youtube clip and read the comments
there:
http://www.youtube.com/watch?v=Gtmwnx-k2qg
So it seems that it works although it may be a bit more quirky.
It doesn't hurt to take up that discussion (particularly about FLR
support) with nVidia and/or AMD.
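For what it's worth, on a Linux box you can get a rough idea of what
reset methods a card offers before you buy. A minimal Python sketch,
assuming pciutils is installed; the PCI address 01:00.0 is only a
made-up example:

    import os
    import subprocess

    BDF = "0000:01:00.0"  # hypothetical PCI address of the graphics card

    # "FLReset+" in the PCI Express device capabilities (lspci -vv output)
    # means the function advertises Function Level Reset.
    caps = subprocess.run(["lspci", "-vv", "-s", BDF],
                          capture_output=True, text=True).stdout
    print("FLR advertised:", "FLReset+" in caps)

    # If the kernel knows how to reset the device by some method (FLR,
    # D3->D0 power management, or a secondary bus reset), it exposes a
    # 'reset' file for the device in sysfs.
    print("Kernel reset hook:", os.path.exists("/sys/bus/pci/devices/%s/reset" % BDF))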
This is precisely the kind of information I was looking for from
the threads I started on Ars Technica. It's just unfortunate that
FLR and D3/D0 support aren't often found in the tech specs of most
expansion hardware. However, now that I know what to ask, I'll try
contacting hardware manufacturers prior to purchasing any
expansion hardware. Thank you!
D3 and D0 are power states defined for devices in the ACPI
specification and can be used to control the supply voltage (Vcc) to
PCI and PCIe devices. You can find more information about it here
for example:
http://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface
---------------
Device states
The device states D0-D3 are device-dependent:
- D0 Fully On is the operating state.
- D1 and D2 are intermediate power-states whose
definition varies by device.
- D3 Off has the device powered off and
unresponsive to its bus.
---------------
So either it works for a certain type of hardware or it doesn't, and
I wouldn't expect a vendor to state this "support" in the
specifications, since it isn't a "feature" in and of itself, if you
get me. But maybe this will change, and maybe FLR support will become
more widespread.
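If you want to see which D-states a particular card claims to support,
and which state it is currently in, the Power Management capability in
the lspci output tells you. Another small Python sketch with a made-up
address:

    import subprocess

    BDF = "0000:01:00.0"  # hypothetical PCI address

    out = subprocess.run(["lspci", "-vv", "-s", BDF],
                         capture_output=True, text=True).stdout
    lines = out.splitlines()
    for i, line in enumerate(lines):
        if "Power Management version" in line:
            # The capability header is followed by a Flags line (which
            # D-states are supported and can signal PME) and a Status line
            # showing the current state, e.g. "Status: D0 ...".
            print("\n".join(l.strip() for l in lines[i:i + 3]))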
When it comes to virtualization, the
technology has come very far, but it is still lacking
considerably when it comes to sharing GPUs and also to some
degree when it comes to sharing I/O devices (especially when you
intend to run many virtual machines on a single system). The GPU
today consists of three types of components: the processing
unit, the graphics memory and the video output unit/adapter, and it
is not clear how to share these components seamlessly
between the host and virtual machines with minimal overhead.
Whereas the VT-x extensions allow you to share CPU cores pretty
seamlessly between VMs and the host, there are
currently no equivalents for the processing unit. It is also not clear how
the hardware can assist with sharing TV/monitor screen estate
between machines with all 3D effects, such as Aero for Win7 and
the like, enabled for all machines, especially when
considering the dynamics of plugging and unplugging computer
monitors to multiport/eyefinity graphics cards and the ability
to change screen resolution. Things are improving for sure, and a
lot of research is likely going into this. I don't know what's
happening on the GPU front, but I know that the next thing
in passthrough is SR-IOV, which allows a PCI device to present
several virtual instances of itself to several virtual
machines. It's a cool thing; I recommend further reading about
this here:
http://www.intel.com/content/www/us/en/pci-express/pci-sig-sr-iov-primer-sr-iov-technology-paper.html
http://blog.scottlowe.org/2009/12/02/what-is-sr-iov/
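To make that a bit more concrete: on a Linux kernel with SR-IOV support
in sysfs, a capable card reports how many virtual functions it can
expose, and the administrator instantiates them by writing a count into
sysfs. A minimal sketch (run as root, with a made-up device address and
a NIC that actually supports SR-IOV):

    BDF = "0000:02:00.0"  # hypothetical address of an SR-IOV capable NIC
    base = "/sys/bus/pci/devices/" + BDF

    # sriov_totalvfs reports how many virtual functions the hardware can
    # expose; writing a count to sriov_numvfs creates that many VFs, which
    # then show up as ordinary PCI functions that can be passed to guests.
    with open(base + "/sriov_totalvfs") as f:
        print("Maximum VFs:", f.read().strip())
    with open(base + "/sriov_numvfs", "w") as f:
        f.write("4")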
That is fascinating. Extending virtualization to expansion
hardware via SR-IOV sure would make the kind of setup I'm
attempting a lot easier. However, if I can replicate what I've
seen in proof of concept videos (namely Casey DeLorme's), I think
that will meet my needs for now. As it stands, I initially intend
to reserve any discrete GPU(s) for Windows and rely on an
integrated GPU for all other VMs using PV drivers (wherever
possible). Afterward, I want to experiment with re-assigning
whatever discrete GPU(s) for GPGPU functions under a Linux VM
whenever the GPU is not going to be used for gaming (if at all
possible).
It is likely to take a few years before
something useful will come out of it. In the meanwhile, unless
you want to use several GPUs (which might not be a bad thing, as a
lot of monitors these days have several inputs), you can resort
to using a remote desktop client to integrate one machine with
another. VirtualBox, for example, uses RDP, through which you can
interact with your virtual machine. In a similar manner you can
set up a VNC server on your Linux host and establish a
connection to it from your Windows VM. You will not get full
3D functionality (such as Aqua) through the client, although
there is growing support for it through VirtualGL extensions
that are coming to VNC and perhaps the Spice protocol. But some
clients might even allow for seamless mode that lets you mix
Linux and Windows windows on the same desktop like this for
example:
http://i.techrepublic.com.com/blogs/seamless.png
http://www.youtube.com/watch?v=eQr8iI0yZH4
Just keep in mind that this is still a little bit of uncharted
territory, so there may be a few bumps on the way and it may not
work as smoothly as you would desire.
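As a concrete example of the VNC route, the sketch below exports the
host's existing X display so a viewer inside the Windows VM can show
the Linux desktop. It assumes the x11vnc server is installed and that a
password file has been created beforehand with 'x11vnc -storepasswd';
any other VNC server would do just as well:

    import subprocess

    # Export the host's running X display (:0) over VNC. -forever keeps the
    # server alive after the first client disconnects and -usepw requires
    # the stored password; connect from the Windows VM with any VNC viewer
    # on port 5900 of the host.
    subprocess.run(["x11vnc", "-display", ":0", "-usepw", "-forever"], check=True)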
From everything I've read, solutions that rely on any form of
remote display protocols would be limited to a subset of Direct3D
functions. Furthermore, these would vary from one implementation
to another, thus making them far less attractive for gaming than
VGA passthrough... Well, in my opinion anyway.
VirtualBox's seamless mode is pretty nifty. But it's a Type 2
Hypervisor and relies on paravirtualized drivers that also suffer
from the same limitations as remote display protocols. It's great
for most things, but gaming is not one of them. And I'm speaking
from personal experience. Though I haven't used them myself, the
same would seem to hold true of Parallels' and VMware's
'Workstation' offerings. At least, as far as I've gathered.
FYI, the Type 1 hypervisors from Parallels and VMware* are priced
waaayyy outside my budget.
I understand that you want full 3D functionality for Windows gaming
but maybe you'll find the subset of 3D functionality for the Linux
machine acceptable. I have looked into VirtualGL, and with TurboVNC
you might get a pretty decent desktop environment; it seems like
most of the features are there already. It appears that the 3D is
rendered by hardware/GPU before it is streamed through VNC or Spice.
So it seems that you would need another GPU for that. You can find
more info on VirtualGL here:
http://www.virtualgl.org/
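As a rough illustration of how the pieces fit together (the path
assumes a default TurboVNC install, so treat it as a sketch rather than
a recipe): the TurboVNC server provides the remote desktop, and vglrun
makes a given OpenGL application render on the local GPU before the
finished frames are streamed to the client:

    import os
    import subprocess

    # Start a TurboVNC desktop on display :1 on the machine that has the GPU.
    subprocess.run(["/opt/TurboVNC/bin/vncserver", ":1"], check=True)

    # Launch an OpenGL application into that session through VirtualGL, so
    # the 3D rendering is done on the GPU and only finished frames travel
    # over the VNC connection. glxgears is just a test program.
    env = dict(os.environ, DISPLAY=":1")
    subprocess.run(["vglrun", "glxgears"], check=True, env=env)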
Also, the line between a type 1 and a type 2 hypervisor tends to get a
bit blurry. The point with type 1 is that it has access to ring 0, so
that it can directly access the hardware that is to be passed through
to the guests (I did confuse 'host' and 'guest' in my prior post).
It also doesn't need to ask the host OS for permission in the same
way as a type 2 hypervisor does, which is likely to give performance
advantages in some cases.
However, even a type 2 hypervisor, although it is run as an
application inside the OS, can get "type 1"-like privileges. By
patching into the kernel and/or using special "dummy drivers" for
hardware to be shared with VMs you can achieve pretty much the same
thing, ergo it is no longer clear whether the hypervisor is a type 1
or type 2.
There is an article about it from the old IBM Mainframe days but I
can't seem to find it.
*I only found out about VMWare's 'free' vSphere after I'd written
this response.
I see that your demands are somewhat multifaceted. I believe
that you also want to run different services, such as using your
machine as a file server, with the possible intention of using
filesystems such as ZFS. If you do, you should be careful with
your selection of hardware for these particular purposes. If you
want to get full protection against data corruption from ZFS,
your choice of hardware gets rather limited when it comes to
choice of hard drives, host bus adapter and network controller.
The most stable implementation of ZFS is found with Illumos
based operating systems (such as OpenIndiana, SmartOS, OmniOS,
Belenix etc) or Solaris if you choose to download it from
Oracle's website. With these operating systems you are most
likely to want to use hardware that has certified drivers for
it. That way you are less likely to run into problems later on.
That implies that you will be limited to choosing Intel based
network adapters and LSI based SAS controllers. There should be
_no_ hardware RAID functionality in the SAS controller, which
should merely be run in IT mode (Initiator-Target mode). That
requires the LSI controller to be flashed with IT firmware in
most cases. The objective here is to make sure that _all_ errors
that might occur with the hard drives are reported all the way
to the software level and that nothing is concealed or
obfuscated by internal error handling in the hardware. It is
therefore recommended to use SAS hard drives instead of SATA
(although SATA drives are also fully compatible with SAS
controllers). SAS hard drives are not much more expensive than
similar SATA drives and you get higher reliability out of them.
It is also recommended
to have at least two-drive redundancy, simply because if one
drive is dead and you swap it, it is not uncommon for another
drive to die during the rebuild of the RAID cluster because of
the added strain the rebuild process (or 'resilvering', as it is
called in Solaris terms) puts on the drives. Of course, the
system should communicate directly with the hard drive hardware
and not be obscured by some virtual abstraction layer in
between, which means that you either run ZFS on the metal or
through PCI passthrough of the SAS (and perhaps also network)
adapters. Also, it is highly recommended that you use ECC RAM
for such applications, and it doesn't hurt to dedicate a few gigs
of it to ZFS, as RAM is used for caching. The good news is that
most motherboards with good chipsets support ECC RAM even though
you might not find anything about it in the user manuals.
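To put the two-drive-redundancy advice in concrete terms: a raidz2 vdev
survives two simultaneous drive failures, and a scrub walks every block
to verify the checksums. A minimal sketch with made-up Solaris-style
disk names (on Linux or FreeBSD the device names would look different):

    import subprocess

    # Six hypothetical disks in a double-parity (raidz2) vdev: the pool
    # stays intact with any two of them failed.
    disks = ["c0t0d0", "c0t1d0", "c0t2d0", "c0t3d0", "c0t4d0", "c0t5d0"]
    subprocess.run(["zpool", "create", "tank", "raidz2"] + disks, check=True)

    # A scrub re-reads every block and verifies its checksum;
    # 'zpool status -v tank' afterwards reports anything that was found.
    subprocess.run(["zpool", "scrub", "tank"], check=True)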
Again, thanks for the thorough explanation. This gives me a great
deal to think about. The more I learn about ZFS, the less
appealing it becomes. And by that I mean the confusion over which
version of ZFS is in which OS, and just how well maintained the
OSes supporting ZFS are. Now I have additional hardware
considerations to keep in mind that may (or may not) make the cost
of a ZFS RAID-Z pool comparable to a hardware RAID5/6 solution
anyway. Do you have any suggestions as to which LSI HBAs I
should be considering? I haven't found an HCL for ZFS in my
searches.
Out of curiosity — and if you would happen to know — do you think
what you suggest about the HBA and SAS drives for ZFS also applies
to Btrfs? I'm assuming it would, but I'd appreciate some
confirmation.
It's funny how the "I" in RAID never really seems to apply...
Especially since it looks more and more like using ZFS or Btrfs
will require that I commit myself, from the start, to one or the other
and a discrete HBA. Transitioning from integrated SATA
controller(s) and mdadm seems rather impractical, if I understand
what's involved in doing so correctly. It may turn out that
anything other than mdadm is cost prohibitive.
I don't think you will have a problem with getting ZFS to run and if
that's your only goal then you don't need to be very picky with your
choice of hardware. I find ZFS pretty easy and handy to use. It has
really great functionality and I don't have many bad things to say
about it so far. ZFS is a filesystem (along with a couple of
software tools to administer it), just like EXT4 or NTFS, so
hardware support depends on the platform it runs on.
But the point with using ZFS is to get maximum protection against
data corruption and that's where the selection of hardware gets
limited and there are "best practices" set up to achieve that. I
have not tested ZFS on any other platform than on OpenSolaris and
OpenIndiana but I do know that it is well implemented on that
platform and more mature there than on any other (non-Solaris)
platform. Another advantage with the OSOL/OI platform is that the
CIFS functionality is implemented in the kernel space and not in the
userland, which will give performance advantages if you intend
to share files with other Windows computers. (I don't deny that
Samba is pretty good on Linux too. There are some benchmarks on the
Phoronix website comparing Samba with NFS, and they are in favor of
Samba in those benchmarks...) The second best implementation is
found with FreeBSD and it is probably fairly mature but I haven't
tested it myself and some people have run into problems with it in
the past. The Linux version is probably still at an infancy stage and
likely not yet mature enough for regular use. It is probably not as
"bad" as btrfs though. There is quite a bit of information about it
on the phoronix.com website (and probably also at lwn.net):
http://www.phoronix.com/scan.php?page=news_item&px=MTE4Nzc
A search there on ZFS will give more articles. The latest official
version of ZFS is 28 and is probably implemented in both Linux and
FreeBSD by now. Later versions have been released since Oracle
killed the OpenSolaris project and can be found with the commercial
closed-source Solaris platform that is supplied by Oracle. Things
have happened since Oracle pulled the plug on the OSOL project, and
leading developers behind the ZFS project such as Jeff Bonwick left
Sun (after the acquisition by Oracle) and joined up with the Illumos
team instead. So you cannot determine the stability of ZFS and ZPOOL
merely by looking at the version number, unfortunately, and I
wouldn't expect the FreeBSD implementation to be as stable as the
Solaris implementation. It just takes time for an implementation to
mature and for the bugs to be weeded out; ZFS just happens to have
been around on Solaris/OpenSolaris/Illumos for much longer than on the
other platforms, and the Solaris/Illumos version also happens to get
first dibs on the features. Among the Illumos people there is an
ambition to drop the version numbering altogether and instead talk
about available features.
The recommendation to use SAS hard drives is not so much about the
quality of the hard drives themselves as it is about the SAS
protocol. The SAS protocol simply handles SCSI transport commands in
a better and more reliable manner than SATA does. I believe any decent
SAS drive would do. As for HBAs, I wrote a list of LSI-based
hardware a while ago here:
https://www.illumos.org/boards/1/topics/572
The thing is that a lot of OEMs such as IBM, HP, Cisco,
Fujitsu-Siemens, Dell, ... supply their branded HBAs with LSI
circuitry on them. What hardware to choose depends on what you're
looking for. If you want an 8-port controller I would go for Intel
SASUC8I or LSI SAS3801E-R. If you want SAS/SATA3 with 6.0 Gb/s then
LSI's SAS 9200 series cards would be a better choice. I don't know
what OEMs have come up with in the SATA3 department since I wrote
that list but the chips to look for in that case are the LSI
MegaRAID 2004/2008/2016e depending on how many ports you want.
If you want to read a further discussion about the reliability of
different RAID setups, I made a post about this in the following
thread (last post):
http://communities.intel.com/thread/25945
I admire your persistence with pursuing
this undertaking and wish you the best of luck with it!
Robin.
Thanks. I've invested too much time in research to not at least
make the attempt. Besides, if all else fails, I can fall back to a
two-box solution. That is, if I can get my hypothetical
virtualization box to fit in my budget envelope...
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users