
Re: [Xen-users] Cheap IOMMU hardware and ECC support importance

  • To: xen-users@xxxxxxxxxxxxx
  • From: Gordan Bobic <gordan@xxxxxxxxxx>
  • Date: Sat, 28 Jun 2014 13:26:22 +0100
  • Delivery-date: Sat, 28 Jun 2014 12:26:30 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>

On 06/28/2014 12:25 PM, lee wrote:
> Kuba <kuba.0000@xxxxx> writes:
>
>> On 2014-06-28 09:45, lee wrote:
>>
>>> I don't know about ZFS, though, never used that.  How much CPU overhead
>>> is involved with that?  I don't need any more CPU overhead like comes
>>> with software raid.
>>
>> ZFS offers you two things a RAID controller AFAIK cannot do for you:
>> end-to-end data checksumming and SSD caching.
>
> There might be RAID controllers that can do SSD caching.

I never heard of one.

> SSD caching means two extra disks for the cache (or what happens when the cache disk

For ZIL (write caching), yes, you can use a mirrored device. For read caching it obviously doesn't matter.
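To make the distinction concrete, here is a sketch of how the two cache types are attached to a pool (the pool name `tank` and the device paths are examples only; substitute your own):

```shell
# Mirrored SLOG device for the ZIL: it holds write intent, so mirror it
# to survive a device failure (pool/device names are hypothetical).
zpool add tank log mirror /dev/disk/by-id/ssd-A /dev/disk/by-id/ssd-B

# L2ARC read cache: losing it only costs you cached reads,
# so a single, unmirrored device is fine.
zpool add tank cache /dev/disk/by-id/ssd-C
```

This matches the point above: the write path needs redundancy, the read cache does not.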

> and ZFS doesn't increase the number of SAS/SATA ports you have.

No, but it does make the RAID and caching parts of a controller redundant, so you might as well just use an HBA (cheaper). Covering the whole stack, ZFS can also make much better use of on-disk caches: my 4TB HGSTs have 64MB of cache each, so if you have 20 of them on a 4-port SATA card with a 5-port multiplier on each port, that's 1280MB of cache - more than any comparably priced caching controller. Being aware of file-system-level operations, ZFS can be much cleverer about exactly when to flush what data to which disk. A caching controller, in contrast, being unaware of what is actually going on at the file system level, cannot leverage the on-disk cache for write caching; it has to rely on its own on-board cache, effectively wasting those 1280MB of disk cache.

> How does it do the checksumming?

Every block is checksummed, and the checksum is stored and verified on every read of that block. In addition, every block (including its checksum) is encoded with whatever extra redundancy you specified (e.g. mirroring, or n+1, n+2 or n+3). So when you read a block, you also read its checksum, and if it checks out, you hand the data to the application with nothing else to be done. If the checksum doesn't match the data (silent corruption), or the read from one of the disks holding a piece of the block fails (non-silent corruption, e.g. a failed sector), ZFS will go and reconstruct the data from the remaining redundancy, return the good copy to the application, and rewrite the damaged block.
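The detection half of that mechanism can be illustrated with a toy example using `sha256sum` (this is only an illustration; ZFS does this internally per block, with fletcher4 or sha256, and stores the checksum in the parent block pointer rather than in a side file):

```shell
# Write a "block" and store its checksum separately.
printf 'important data' > block.dat
sha256sum block.dat > block.sum

# Clean read: the stored checksum matches, data is handed over as-is.
sha256sum -c block.sum

# Simulate silent corruption: the data changes but no I/O error is raised.
printf 'importent data' > block.dat

# Verification now fails - this is the case where ZFS would go to the
# redundant copy, return good data, and repair the bad block.
sha256sum -c block.sum || echo "silent corruption detected"
```

A plain read of `block.dat` would happily return the corrupted bytes; only the checksum comparison exposes the damage, which is the whole point of end-to-end checksumming.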

> Read everything after it's been written to verify?

No, data is just written with a checksum on the block and encoded for the extra redundancy. If you have Seagate disks that support the feature, you can enable Write-Read-Verify at the disk level. I wrote a patch for hdparm to toggle that feature.
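For reference, current hdparm releases expose this via the `-R` get/set flag (flag name and behaviour should be checked against your hdparm man page; the device node below is an example, and the drive must actually implement the ATA Write-Read-Verify feature set):

```shell
# Enable Write-Read-Verify on a supporting drive (device node is an example).
hdparm -R1 /dev/sdb

# Disable it again.
hdparm -R0 /dev/sdb
```

Note this trades write throughput for an immediate read-back check done by the drive itself, independently of anything the file system does.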

> I'll consider using it next time I need to create a file system.

ZFS is one of those things that, once you start using it, you soon have no idea how you ever managed without it. And when you have to make do without it, it feels like trying to read braille with hooks.

Xen-users mailing list


