June 23rd, 2009


jimbojones
08:53 pm - ZFS and RAIDZ performance
A comment on the Ars Technica Linux Kung Fu forum a couple of weeks ago got me curious - a user there said that, as far as he knew, RAIDZ was not meant to be a performance configuration, and that on average a RAIDZ array wouldn't perform much better than any single disk in it.

I just happened to have a RAID storage server in the shop that was due for a complete wipe anyway, so I decided to take the opportunity to do some benchmarking. Somewhat to my surprise, ZFS turned out to be quite a good performer - despite its advanced data-protection features, it was the fastest filesystem tested for single-process reads, with or without RAIDZ. RAIDZ did quite well too; on multiple concurrent reads it is significantly slower than RAID5/ext3, but still manages to nearly double single-drive performance across the board.




Hardware used:
AMD Athlon 64 3500+
2GB DDR2 SDRAM
1 WD 250GB HDD (operating system)
5 Seagate Barracuda 750GB SATA-II HDD (RAID array drives)

Operating systems used:
FreeBSD 7.2-RELEASE amd64 (UFS2 and ZFS testing)
Ubuntu Server 8.04-LTS amd64 (ext3fs testing)

No filesystem tuning was done for any test - all filesystems were left in their default configuration.
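
For anyone following along at home, creating a stock 5-disk RAIDZ pool and confirming the defaults looks roughly like this (pool and device names here are only examples, not necessarily what was on the test box):

zpool create tank raidz ad4 ad6 ad8 ad10 ad12   # 5-disk RAIDZ pool, all defaults
zfs get compression,checksum,recordsize tank    # confirm nothing was tuned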





 


Comments:



 
From: markm
Date: June 24th, 2009 - 10:00 am
What benchmark did you run?

I've got a feeling that zfs was bottlenecked by the limited ram available. I'm curious what the results would be with 4GB or 8GB instead.


 
From: jimbojones
Date: June 24th, 2009 - 12:42 pm
Home-grown. Last time I benchmarked RAID setups, I couldn't find a free benchmark app that would benchmark the actual array instead of my RAM - Bonnie++ is what everybody recommends, but it kept returning results like 800MB/sec.

So what I came up with is first writing 5GB of data to the disk/array being tested, in (chunksize) chunks, then reading those chunks back in random order with a given number of processes. Before each run, 8GB of data is read from another drive in an attempt to clear the filesystem cache as much as possible (because I couldn't, and still can't, find any programmatic way of dumping the cache).
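
In rough pseudo-shell, the idea looks something like this - this is not the actual script, and the target path, scratch device, chunk size, and reader count are all made-up placeholders:

#!/bin/sh
# Sketch of the benchmark described above - not the real script.
TARGET=/array/bench      # filesystem under test (placeholder)
SCRATCH=/dev/ad0         # unrelated drive, read to push data through the cache
CHUNK_MB=64              # chunk size (placeholder)
CHUNKS=80                # 80 x 64MB = 5GB of test data
READERS=4                # number of concurrent reader processes
# note: bs=1m is BSD dd syntax; GNU dd wants bs=1M

# 1. Write the test data out in fixed-size chunks.
#    (/dev/urandom is just a stand-in source - see the data-source discussion below.)
i=0
while [ $i -lt $CHUNKS ]; do
    dd if=/dev/urandom of="$TARGET/chunk.$i" bs=1m count=$CHUNK_MB 2>/dev/null
    i=$((i + 1))
done

# 2. Read 8GB from an unrelated drive to evict as much of the cache as possible.
dd if=$SCRATCH of=/dev/null bs=1m count=8192 2>/dev/null

# 3. Read the chunks back in random order with several concurrent readers.
start=$(date +%s)
r=0
while [ $r -lt $READERS ]; do
    ( ls "$TARGET" | awk -v seed=$r 'BEGIN { srand(seed) } { print rand(), $0 }' \
        | sort -n | cut -d' ' -f2- \
        | while read f; do
              dd if="$TARGET/$f" of=/dev/null bs=1m 2>/dev/null
          done ) &
    r=$((r + 1))
done
wait
end=$(date +%s)
echo "$READERS readers finished in $((end - start)) seconds"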

What I'm considering doing next is adding a mode that does a smallish amount of writes concurrently with the reads - something like 1MB/sec of random-access writes from /dev/zero while the read benchmark runs.
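
Something like this bolted onto the sketch above, maybe (the rate limiting is deliberately crude, and $RANDOM is a bash-ism):

# Crude mixed-workload add-on: rewrite a random 1MB block of a scratch file
# roughly once per second while the readers from the sketch above are running.
( while true; do
      dd if=/dev/zero of="$TARGET/writeload" bs=1m count=1 \
         seek=$((RANDOM % 1024)) conv=notrunc 2>/dev/null
      sleep 1
  done ) &
WRITER=$!

# ... run the read phase of the benchmark here ...

kill $WRITER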


 
From: markm
Date: June 24th, 2009 - 03:14 pm
If you do use /dev/zero for your data source, it would also be interesting to see the difference between compression=off and compression=on for the dataset that you're testing.
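
Toggling it is just a per-dataset property - something like this, with the pool/dataset name being whatever you're testing on:

zfs set compression=on tank/bench      # "on" means lzjb
zfs get compression tank/bench         # confirm the setting
# ... run the benchmark, then check how much actually got compressed ...
zfs get compressratio tank/bench
zfs set compression=off tank/bench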


 
From: jimbojones
Date: June 24th, 2009 - 03:21 pm
Hm. Yeah, I really hate using /dev/zero - the existing benchmark uses data originally pulled from /dev/urandom - but I'm a little at a loss for a FAST, non-CPU-intensive source of random-ish data. FreeBSD's urandom is pretty quick, but Linux's is abysmally slow.
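
One thing I might try (untested): running /dev/zero through a cheap block cipher instead of reading /dev/urandom - the output should be effectively incompressible without eating much CPU. Roughly:

# Untested idea: incompressible pseudo-random data without /dev/urandom's cost.
# Cipher and passphrase are arbitrary; dd just caps the amount generated.
openssl enc -aes-128-cbc -pass pass:bench -nosalt < /dev/zero 2>/dev/null \
    | dd of="$TARGET/chunk.$i" bs=1m count=64 2>/dev/null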


 
From: markm
Date: June 24th, 2009 - 03:28 pm

Yeah, with compression=on, I'd expect zfs performance to fly, as blocks of all nil are compressed away to nothing (zfs just stores the metadata for the block). You could try some incompressible data, like 'cat *.jpg > datafile', and pull random chunks out of there.
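
e.g. something along these lines to carve chunks out of it at random offsets ($RANDOM is a bash-ism, and the sizes/offsets are just examples):

# Pull a 64MB chunk out of the concatenated-JPEG file at a random 1MB-aligned offset.
dd if=datafile of=chunk.$i bs=1M count=64 skip=$((RANDOM % 1000))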

One thing I do want to point out is that, while testing the filesystem's ability to physically run the hard drives is a useful data point, a typical real-world workload /is/ going to get some cache hits now and then. One of zfs's features is the ARC, which uses a better caching algorithm than ext3.


 
From: jimbojones
Date: June 24th, 2009 - 04:00 pm
The problem with 'cat *.jpg > datafile' is that, unless you're DAMN sure whatever volume the JPEGs are mounted on reads faster than your array writes, you may end up benchmarking the read on the source instead of the write on the target.

Maybe piping /dev/zero through tar, and writing the result of /that/?


 
From: jimbojones
Date: June 24th, 2009 - 04:01 pm
Incidentally, I'm curious - how'd you come across this post so quickly?


 
From: markm
Date: June 24th, 2009 - 04:29 pm
I've got a Google Alert set up for "zfs".


 
From: jimbojones
Date: June 24th, 2009 - 04:42 pm
Involved professionally, or just interested in it in general?

(I'm an independent consultant mostly serving small business; my interests generally run along the lines of "What is this? What is it good for? Can I use it?" in that scope.)


 
From: markm
Date: June 24th, 2009 - 06:48 pm
> Involved professionally, or just interested in it in general?

Professionally - I work on zfs for a living.


 
From: jimbojones
Date: June 24th, 2009 - 06:54 pm
FWIW, I like ZFS a lot so far - I'm pretty thrilled with the wide array of options for handling data and how easy they are to access (I started here for the info I used in the actual implementation, as opposed to just reading about it). The performance is impressive too - for my applications it doesn't look quite as fast as the Linux filesystems, but it certainly doesn't look shabby, and the data-protection features are definitely worth running a little slower than "the fastest thing available".

I just really wish there were some chance of resolving the CDDL/GPL licensing incompatibility issues in order to make ZFS available on the Linux side. I'm a FreeBSD guy from way back, but I've been migrating more to Linux lately (Ubuntu specifically) in favor of better package management... and better disk performance.


 
From: jimbojones
Date: June 24th, 2009 - 07:21 pm
OK, here's a question for you - apparently, FreeBSD 7.2-RELEASE has ZFS v6 in it, whereas FreeBSD 8.0-CURRENT has ZFS v13.

Were there any changes between ZFS v6 and ZFS v13 that would make it worth re-benchmarking with 8.0-CURRENT before I tear down the bench?
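
(For anyone else wondering, zpool itself can list what each pool version added - e.g. on the 8.0-CURRENT box:)

zpool upgrade -v     # lists every supported pool version and the features it added
zpool upgrade        # shows whether any imported pools are below that version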


 
From: markm
Date: June 24th, 2009 - 11:19 pm
There have been quite a few performance improvements put into zfs between v6 and v13, so I'd say it would be worth checking those out.


 
From: jimbojones
Date: June 24th, 2009 - 04:45 pm
Incidentally, on the LKF forum somebody suggested that a large part of the performance gap between the Linux results and the BSD results is due to FreeBSD's lack of support for SATA native command queueing (NCQ) - something I hadn't even known about until that comment.

For my own purposes that's pretty much irrelevant - either you perform well or you don't - but it would be interesting to know how much of the rather large gap NCQ actually accounts for. So I'm going to take a quick stab later at re-running some of the Linux tests with NCQ support disabled to find out.


 
From: jimbojones
Date: June 24th, 2009 - 06:49 pm
FWIW, no, it wasn't NCQ... Linux doesn't support NCQ on this hardware either - the "depth 0/32" below means the drives advertise a 32-deep queue, but the kernel isn't actually queueing commands. Not sure if that's due to the controller or to the drives - both are a few years old.

root@ubuntu:~# dmesg | grep NCQ
[ 32.103186] ata1.00: 488397168 sectors, multi 1: LBA48 NCQ (depth 0/32)
[ 32.682482] ata2.00: 1465149168 sectors, multi 1: LBA48 NCQ (depth 0/32)
[ 33.311928] ata3.00: 1465149168 sectors, multi 1: LBA48 NCQ (depth 0/32)
[ 33.939770] ata4.00: 1465149168 sectors, multi 1: LBA48 NCQ (depth 0/32)
[ 34.583488] ata5.00: 1465149168 sectors, multi 1: LBA48 NCQ (depth 0/32)
[ 35.208808] ata6.00: 1465149168 sectors, multi 1: LBA48 NCQ (depth 0/32)


 
From: jimbojones
Date: June 27th, 2009 - 02:56 pm
If you're still interested, another set of results with a mixed read/write workload is available here.

