Building a Linux File Server - 2004
Configuration, installation, and performance
Hardware
We were running out of shared disk space, so we began a project to build a new
fileserver. The equipment was all purchased in late February and early March
of 2004, so you can "date" the pricing. Here's what we got:
- Intel SHG2 ServerWorks Grand Champion LE Dual Xeon MP Motherboard, $133(!)
- Intel Xeon 2.4GHz, 400MHz FSB, 512KB cache ($223).
- 2x 512MB Kingston PC2100 ECC RAM ($105 each).
- Supermicro SC833 3U rackmount case with 8-drive SATA backplane ($499).
- ASUS CD-ROM ($20)
- Intel Pro/1000MT dual-gigabit server ethernet card ($166).
- WD 10K RPM U160 9.1GB SCSI drive, for system disk ($20).
- 3WARE Escalade 8506-9 SATA RAID card ($499)
- 8x WD 250GB 7200RPM SATA drives (2500JD model), $205 each.
Total cost: $3410, plus shipping, etc., for 2TB of raw storage.
(Compare with approx. $4500, from Aberdeen Comp. or similar.)
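As a quick check on the arithmetic, the line items above can be tallied in a few lines of Python. Prices and quantities are copied from the list; the dictionary keys are just our own short labels for each part.

```python
# Prices and quantities copied from the parts list above (early-2004 USD).
# The keys are our own short labels for each line item.
parts = {
    "SHG2 motherboard": (133, 1),
    "Xeon 2.4GHz CPU": (223, 1),
    "512MB ECC DIMM": (105, 2),
    "SC833 case": (499, 1),
    "CD-ROM": (20, 1),
    "Pro/1000MT NIC": (166, 1),
    "9.1GB SCSI system disk": (20, 1),
    "3ware RAID card": (499, 1),
    "250GB SATA drive": (205, 8),
}

total = sum(price * qty for price, qty in parts.values())
raw_gb = 8 * 250  # eight 250GB data drives

print(f"total: ${total}")
print(f"raw storage: {raw_gb}GB")
print(f"cost per raw GB: ${total / raw_gb:.2f}")
```

The eight data drives alone account for nearly half the total.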
Installation
Putting it all together was straightforward. The SC833 case is really very
nice to work with: good cable routing, fan arrangements, etc. Documentation
on the SATA backplane is sketchy, but the only problem we found was
that the fan alarms are jumpered "active" by default, and we'd plugged the
case fans into the motherboard instead of the backplane. The SHG2 motherboard
fits in this case, but the power cable only barely makes it to the connector.
Getting all the lights and switches connected required finding
the pin diagram in the SHG2 documentation. Finally, with this backplane and
the 3ware card, we found no way to get the hard-drive light to register
activity on the SATA drives.
Red Hat 9 installed without a hitch, correctly detecting everything (sans RAID;
we set that up afterwards).
RAID configuration and testing
This machine will provide medium-term storage for the results of molecular simulations,
which tend to be large numbers of files somewhat smaller than 1GB. Many of these
will be written directly to the machine via NFS, so network and NFS performance
will be important to us. Some level of RAID redundancy is necessary, as this
server will not be backed up. The 3ware RAID card provides RAID levels
0, 1, 10, and 5, as well as JBOD. We have therefore benchmarked numerous configurations
using both hardware RAID and software RAID (the Linux "md" driver, as provided
by default with Red Hat 9). We used Bonnie++
for this testing, running on the machine itself (no NFS yet). The full "raw" results, including
file create/destroy information, are available here.
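The tables below are Bonnie++ 1.02c's plain-text summary output. Each result row gives the machine name, the test file size, and then six pairs of throughput and %CP: per-character output, block output, rewrite, per-character input, block input, and random seeks (/sec). As we understand Bonnie++'s convention, "+++++" means a test finished too quickly to give a meaningful figure. A minimal sketch of a row parser, assuming exactly this whitespace-separated layout; the field names are our own labels, not Bonnie++'s.

```python
# Field names are our own labels for Bonnie++ 1.02c's twelve result columns.
FIELDS = [
    "putc_kps", "putc_cpu",            # sequential output, per character
    "block_out_kps", "block_out_cpu",  # sequential output, block
    "rewrite_kps", "rewrite_cpu",      # sequential output, rewrite
    "getc_kps", "getc_cpu",            # sequential input, per character
    "block_in_kps", "block_in_cpu",    # sequential input, block
    "seeks_per_s", "seeks_cpu",        # random seeks
]


def parse_row(line):
    """Split one result row into (machine, size, {field: value})."""
    machine, size, *values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    parsed = {}
    for name, raw in zip(FIELDS, values):
        # "+++++" / "+++": test too fast to measure, so no usable number.
        parsed[name] = None if raw.startswith("+") else float(raw)
    return machine, size, parsed


row = "castle 8G 5752 96 21127 11 12386 3 5666 89 57845 8 319.5 0"
machine, size, r = parse_row(row)
print(size, r["block_out_kps"], r["block_in_kps"], r["seeks_per_s"])
```

Pulling out just the block-transfer and seek columns this way makes it easier to compare configurations side by side.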
Hardware RAID, using the 3ware card
- RAID level 5, ReiserFS filesystem:
Version 1.02c    --------Sequential Output--------   --Sequential Input---   --Random-
                 -Per Chr-   --Block--   -Rewrite-   -Per Chr-   --Block--   --Seeks--
Machine   Size   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP    /sec %CP
castle    256M    5984  99  259184  99  497269  10    5820  99   +++++ +++   +++++ +++
castle    512M    5954  98  252266  98  467629  99    5812  99   +++++ +++   +++++ +++
castle      1G    5878  97  105843  47   18131   4    5368  91  127510  13    1266   2
castle      2G    5787  96   36603  18   12539   3    5425  86   75505   7   531.2   1
castle      4G    5719  95   24290  12   12872   3    5657  89   63078   8   392.3   1
castle      8G    5752  96   21127  11   12386   3    5666  89   57845   8   319.5   0
- RAID level 5, EXT3 filesystem, default options:
Version 1.02c    --------Sequential Output--------   --Sequential Input---   --Random-
                 -Per Chr-   --Block--   -Rewrite-   -Per Chr-   --Block--   --Seeks--
Machine   Size   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP    /sec %CP
castle    256M    6053  99  354460  96  472571  99    5841  99   +++++ +++   +++++ +++
castle    512M    6078  99  193925  52  463586  99    5850  99   +++++ +++   +++++ +++
castle      1G    6081  99   68234  18   14029   3    5656  96  197022  13    3156   3
castle      2G    6205  99   27359   8   12948   3    5720  91   65156   6   499.1   0
castle      4G    6150  99   21101   6   13838   3    5773  91   58134   5   339.6   0
castle      8G    6163  99   20062   6   14239   3    5698  89   52153   6   296.7   0
The filesystem differences are relatively small, with ReiserFS appearing to win over
EXT3 under these conditions, especially for relatively small files. For the first
three tests (256M to 1G), Bonnie++ had to be told to assume only a fraction of the
machine's 1GB of RAM (by default it wants test files at least twice the size of RAM),
so the OS is clearly caching; only for 2G+ files does the actual RAID performance
become visible. Needless to say, we were quite disappointed with these results:
about 20MB/sec block output for large files is slower than we would expect from a
single drive.
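To put the K/sec columns in more familiar units, here is a small sketch, assuming Bonnie++'s "K" means 1024 bytes. `RAM_MB`, `size_mb` and `kps_to_mbps` are hypothetical helpers for illustration. It also flags which test sizes fit entirely in the machine's 1GB of RAM, i.e. the rows most likely to be measuring the page cache rather than the array.

```python
RAM_MB = 1024  # the machine had 1GB of RAM for these runs


def size_mb(size):
    """Convert a Bonnie++ size label such as "256M" or "2G" to megabytes."""
    value, unit = int(size[:-1]), size[-1]
    return value * 1024 if unit == "G" else value


def kps_to_mbps(kps):
    """Bonnie++ K/sec to MB/sec, assuming K = 1024 bytes."""
    return kps / 1024


for size in ("256M", "512M", "1G", "2G", "4G", "8G"):
    print(size, "fits in RAM" if size_mb(size) <= RAM_MB else "larger than RAM")

# Large-file (8G) block output from the two hardware RAID 5 tables above.
for fs, kps in (("ReiserFS", 21127), ("EXT3", 20062)):
    print(f"{fs}: {kps_to_mbps(kps):.1f} MB/sec")
```

Only the 2G, 4G and 8G rows exceed RAM, which matches where the numbers drop to the array's real speed.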
- RAID level 10, ReiserFS
Version 1.02c    --------Sequential Output--------   --Sequential Input---   --Random-
                 -Per Chr-   --Block--   -Rewrite-   -Per Chr-   --Block--   --Seeks--
Machine   Size   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP    /sec %CP
castle    256M    6003  99  242145 100  456771  99    5847  99   +++++ +++   +++++ +++
castle    512M    5987  99  237664  99  419935 100    5847  99   +++++ +++   +++++ +++
castle      1G    5974  99  153404  67   44920  12    5626  95  202321  13    2419   4
castle      2G    6071  99   96810  46   30223   9    5704  92   87888   9   580.5   1
castle      4G    5923  99   84649  42   28195   7    5849  92   74219   9   424.4   0
castle      8G    5876  98   83718  42   27240   7    5866  92   67335   9   343.0   1
Much better! The 3ware card does a good job in striped/mirrored mode; the only problem
here is that fully half of our 2TB raw disk storage is gone, which is not acceptable.
At this point, we decided to try out Linux's software RAID drivers, using the 3ware
in JBOD mode, which presents the 8 drives to the OS as individual SCSI disks.
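The capacity trade-off behind that decision is simple arithmetic. Here is a sketch, assuming eight equal 250GB drives and counting RAID 50 as two equal RAID 5 groups striped together (the layout tried under "Other tests"). Filesystem overhead is ignored, and `usable_gb` is a hypothetical helper.

```python
def usable_gb(level, n, d):
    """Usable capacity for n equal drives of d GB each (no filesystem overhead)."""
    if level == "0":
        return n * d             # plain striping, no redundancy
    if level == "10":
        return n // 2 * d        # every block is mirrored
    if level == "5":
        return (n - 1) * d       # one drive's worth of parity
    if level == "50":
        return (n - 2) * d       # two RAID 5 groups, one parity drive each
    raise ValueError(f"unknown RAID level {level!r}")


for level in ("0", "10", "5", "50"):
    print(f"RAID {level:>2}: {usable_gb(level, 8, 250)} GB")
```

RAID 5 keeps 1750GB of the 2000GB raw, where RAID 10 keeps only 1000GB.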
Software RAID performance
- RAID level 5, 64KB "chunk size", ReiserFS
Version 1.02c    --------Sequential Output--------   --Sequential Input---   --Random-
                 -Per Chr-   --Block--   -Rewrite-   -Per Chr-   --Block--   --Seeks--
Machine   Size   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP    /sec %CP
castle    256M    4801  81  219174  92  454535  98    4812  83   +++++ +++   +++++ +++
castle    512M    5001  83  177323  77  309866  82    4781  83   +++++ +++   +++++ +++
castle      1G    4852  82  108248  53   41367  13    4201  72  295542  28    1840   2
castle      2G    4814  82   57421  31   26558   8    3953  63  154145  26   528.7   1
castle      4G    4744  81   47526  26   26054   9    4105  65  144300  27   417.2   1
castle      8G    4746  81   42858  24   26356   9    4184  66  122137  26   343.0   1
- RAID level 5, 128KB "chunk size", ReiserFS
Version 1.02c    --------Sequential Output--------   --Sequential Input---   --Random-
                 -Per Chr-   --Block--   -Rewrite-   -Per Chr-   --Block--   --Seeks--
Machine   Size   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP    /sec %CP
castle    256M    4814  82  254884 100  496005 100    4992  86   +++++ +++   +++++ +++
castle    512M    4956  83  192262  83  327311  74    4933  84   +++++ +++   +++++ +++
castle      1G    4873  83  109966  53   40895  12    4338  74  271537  34    1623   2
castle      2G    4875  82   59728  31   28797   9    4122  66  190785  30   526.4   1
castle      4G    4818  82   49067  26   29307  10    4318  69  193799  38   364.5   1
castle      8G    4799  82   43793  24   29106  10    4394  70  169292  35   346.6   1
Ahh... much better. The large-file block output is roughly double the hardware
RAID 5 figures, and block input at 4G and 8G is closer to triple. You can see that
the per-character %CP figures have dropped to roughly 65-85%; that's because the
software RAID driver itself is taking up the rest. A dual-processor box ought to
perform better in this regard, though we doubt it will matter for NFS serving.
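To quantify "much better", the ratios can be computed directly from the large-file rows of the hardware RAID 5 (ReiserFS) and software RAID 5 (128KB chunk, ReiserFS) tables. The figures are copied from those tables; the dictionary layout is only for illustration.

```python
# Block output/input in K/sec, copied from the 4G and 8G rows of the
# hardware RAID 5 (ReiserFS) and software RAID 5 (128KB chunk, ReiserFS) tables.
hardware = {"4G": {"out": 24290, "in": 63078}, "8G": {"out": 21127, "in": 57845}}
software = {"4G": {"out": 49067, "in": 193799}, "8G": {"out": 43793, "in": 169292}}

# Software-over-hardware ratio for each file size and direction.
speedup = {
    size: {k: software[size][k] / hardware[size][k] for k in ("out", "in")}
    for size in hardware
}

for size, r in speedup.items():
    print(f"{size}: block output x{r['out']:.2f}, block input x{r['in']:.2f}")
```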
- RAID level 5, 128KB "chunk size", EXT3 (default options)
Version 1.02c    --------Sequential Output--------   --Sequential Input---   --Random-
                 -Per Chr-   --Block--   -Rewrite-   -Per Chr-   --Block--   --Seeks--
Machine   Size   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP    /sec %CP
castle    256M    5082  85   85934  26  259821  66    4961  85   +++++ +++   +++++ +++
castle    512M    5033  84  220605  69  307081  76    4887  84   +++++ +++   +++++ +++
castle      1G    5026  84  104240  32   47792  14    4426  76  277147  20    2787   2
castle      2G    5061  84   61395  19   40743  13    4184  67  212945  28   452.8   1
castle      4G    5048  84   58734  18   40743  13    4205  67  182419  29   367.2   1
castle      8G    5054  83   58077  19   40541  13    4167  66  181217  31   337.0   0
The EXT3 filesystem gives much better block output for large files than ReiserFS,
and comparable block input. Then, of course, I read the manual and found out
about the tunable EXT3 "stride" parameter:
- RAID level 5, 128KB "chunk size", EXT3: "mkfs -t ext3 -b 4096 -m 0 -R stride=16 /dev/md0"
Version 1.02c    --------Sequential Output--------   --Sequential Input---   --Random-
                 -Per Chr-   --Block--   -Rewrite-   -Per Chr-   --Block--   --Seeks--
Machine   Size   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP    /sec %CP
castle    256M    5008  83  347390  94   81200  18    4714  82   +++++ +++   +++++ +++
castle    512M    4985  83  111411  33   94203  21    4902  84   +++++ +++   +++++ +++
castle      1G    5035  84   84395  26   61275  17    4345  74  217801  19    3237   3
castle      2G    5096  84   69271  22   37681  12    6006  95  213366  27   560.9   1
castle      4G    6013  97   59336  19   42032  13    5959  94  195055  30   392.9   1
castle      8G    6045  97   59485  19   42121  14    5993  94  178706  31   339.3   1
Small improvements in some areas, most visibly per-character I/O on the 4G and 8G
files. Finally, we pulled another 1GB of RAM out of a
different box, and re-ran the benchmark with double the system memory:
- RAID level 5, 128KB "chunk size", EXT3: "mkfs -t ext3 -b 4096 -m 0 -R stride=16 /dev/md0"
Version 1.02c    --------Sequential Output--------   --Sequential Input---   --Random-
                 -Per Chr-   --Block--   -Rewrite-   -Per Chr-   --Block--   --Seeks--
Machine   Size   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP    /sec %CP
castle    256M    5952  97  282165  88  415818  99    5736  97   +++++ +++   +++++ +++
castle    512M    5960  97  269846  86  292862  75    5728  97   +++++ +++   +++++ +++
castle      1G    5950  97  197219  67   63362  17    5809  99 1545895 100   +++++ +++
castle      2G    5947  97   98936  33   71277  20    6038  98  450913  42    2663   2
castle      4G    5989  97   81899  28   50738  17    6154  97  265835  38   543.0   1
castle      8G    6012  97   76457  26   52119  17    6092  96  229029  37   396.6   1
Clearly, spending a few hundred dollars more on RAM is going to get us more performance
than anything else at this point!
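For reference, the mke2fs documentation describes stride as the number of filesystem blocks read or written to one disk before moving on to the next. That works out to the RAID chunk size divided by the filesystem block size. Here is a minimal sketch of that rule; `ext3_stride` is our own hypothetical name.

```python
def ext3_stride(chunk_kb, block_bytes=4096):
    """RAID chunk size expressed in filesystem blocks (the mke2fs stride)."""
    chunk_bytes = chunk_kb * 1024
    if chunk_bytes % block_bytes:
        raise ValueError("chunk size should be a multiple of the block size")
    return chunk_bytes // block_bytes


print(ext3_stride(128))  # 128KB chunk, 4KB blocks
print(ext3_stride(64))   # 64KB chunk, 4KB blocks
```

By this rule, the 128KB chunk and 4096-byte blocks used here give a stride of 32. The stride=16 in the two EXT3 runs above would instead match a 64KB chunk, so those runs may not have had the stride tuned as intended.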
Summary
We're going with software RAID 5 from here. We will get some NFS benchmarks together
once the system is configured, and we'll see if it is possible to get the network working
fast enough to saturate the RAID (it ought to be; the machine has three gigabit ethernet
interfaces, which I can link-aggregate). The big (unanswered) question is, of course,
whether the 3ware card is worth $500 just as an 8-port SATA interface. It may well be;
I've no idea how good Linux SATA performance really is, whereas the SCSI subsystem is
excellent. The 64-bit/66MHz PCI bus should saturate at about 500MB/sec of file transfer,
so we're clearly not near that limit yet. Unfortunately, we haven't got any other RAID
cards or SATA controllers available for further testing at this time.
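Here is a back-of-the-envelope comparison of the relevant ceilings, using nominal rates. A 64-bit bus moves 8 bytes per clock, and a gigabit link carries at most 125MB/sec before protocol overhead. The best large-file block input measured above was 229029 K/sec (8G file, 2GB RAM). This rough sketch ignores PCI arbitration and NFS/TCP overhead, and it assumes Bonnie++'s K is 1024 bytes.

```python
# Rough ceilings in bytes/sec, using nominal clock and link rates.
pci_64_66 = 8 * 66_000_000          # 64-bit (8-byte) bus at a nominal 66MHz
gige_x3 = 3 * 1_000_000_000 // 8    # three gigabit links, before protocol overhead
best_block_in = 229029 * 1024       # 8G block input with 2GB RAM, from K/sec

for name, bps in [("64-bit/66MHz PCI", pci_64_66),
                  ("3x gigabit ethernet", gige_x3),
                  ("measured RAID 5 block input", best_block_in)]:
    print(f"{name:28s} {bps / 1e6:6.0f} MB/sec")
```

On these numbers the disks, not the bus, are the current bottleneck, and even three aggregated gigabit links would have headroom over the measured array throughput.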
Other tests
Just for yuks, we tried a few more speculative benchmarks, none of which worked well
(using ReiserFS for these):
- Software RAID 50 (two striped software RAID-5 arrays):
Version 1.02c    --------Sequential Output--------   --Sequential Input---   --Random-
                 -Per Chr-   --Block--   -Rewrite-   -Per Chr-   --Block--   --Seeks--
Machine   Size   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP    /sec %CP
castle    256M    4901  83  235452  94  397748  83    4825  83   +++++ +++   +++++ +++
castle    512M    4786  81  148968  67  316798  76    4681  82   +++++ +++   +++++ +++
castle      1G    4754  81   96042  48   34655  11    4622  79  183327  22    1253   2
castle      2G    4703  80   55631  30   23220   7    4747  76   95241  15   541.3   1
Not bad, but not as good as a single RAID 5, and with less usable space!
- RAID 50: software striping across two 4-drive hardware RAID-5 arrays:
Version 1.02c    --------Sequential Output--------   --Sequential Input---   --Random-
                 -Per Chr-   --Block--   -Rewrite-   -Per Chr-   --Block--   --Seeks--
Machine   Size   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP    /sec %CP
castle    256M    5900  98  223742  99  456602 101    5835  99   +++++ +++   +++++ +++
castle    512M    5615  93  220022  99   81589  18    5711  97   +++++ +++   +++++ +++
castle      1G    5765  96  116467  54   15541   4    5212  89  226567  17    1724   1
castle      2G    5670  92   30382  16    9801   2    5841  93  108015  14   519.7   0
castle      4G    5656  93   19545  10   10787   3    5816  92   82008  12   375.3   0
castle      8G    5779  96   18956  10   10634   3    6039  95   73641  12   306.0   0
Sucks! Why? Let's look at a single 4-drive hardware RAID-5 array:
- Hardware RAID 5, 4 drives:
Version 1.02c    --------Sequential Output--------   --Sequential Input---   --Random-
                 -Per Chr-   --Block--   -Rewrite-   -Per Chr-   --Block--   --Seeks--
Machine   Size   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP    /sec %CP
castle    256M    5971  99  235095 100  490467  99    5882  99   +++++ +++   +++++ +++
castle    512M    5871  97  221081  95   62062  13    5831  99   +++++ +++   +++++ +++
castle      1G    5910  98   70075  32   12646   3    5355  91   93778   7    1048   2
castle      2G    6015  98   17549   9    8600   2    5547  88   52798   6   415.4   0
castle      4G    5926  98   14047   7    9099   2    5582  88   41071   5   293.9   0
castle      8G    5826  98   12863   6    9165   2    5618  88   38265   5   234.0   0
That's why: the 3ware card's RAID 5 performance depends strongly on the number of
drives attached. With only four drives, large-file block output drops to about
13MB/sec, versus about 21MB/sec with eight. This suggests that using two 4-drive
hardware RAID cards and striping them via software might be competitive with the
all-software solution above, but it would depend very much on the performance of
the RAID cards.
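One way to read the 4-drive versus 8-drive numbers is against a naive model in which RAID 5 streaming throughput scales with the number of data drives (n - 1). This is our own simplification, not a claim about how the 3ware firmware works. The measured figures are the 8G rows of the two hardware RAID 5 ReiserFS tables.

```python
# Naive model: RAID 5 streaming throughput proportional to data drives (n - 1).
# Measured values: 8G rows (K/sec) of the 8-drive and 4-drive hardware RAID 5 tables.
eight = {"out": 21127, "in": 57845}
four = {"out": 12863, "in": 38265}

expected_ratio = (8 - 1) / (4 - 1)
measured = {k: eight[k] / four[k] for k in eight}

print(f"naive model: x{expected_ratio:.2f}")
for k, r in measured.items():
    print(f"measured block {k}: x{r:.2f}")
```

The measured ratios come out below the naive prediction. Adding drives helps, but less than linearly on this card.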