Software RAID on Linux


 
# 1  
Old 01-30-2009

Hey,

I have worked with Linux for some time, but have not gotten into the specifics of hard drive tuning or software RAID. This is about to change. I have a Dell PowerEdge T105 at home and I am purchasing the following:

1GBx4 DDR2 ECC PC6400 RAM
Rosewill RSV-S5 eSATA 5-bay disk enclosure with a PCI-E x1 Silicon Image eSATA card

I will soon purchase:
1TBx4 Western Digital Caviar Green drives, whose firmware I plan to modify to enable TLER (time-limited error recovery)
The server already has an 80GB boot drive, which I plan to keep on its own for now, but I may mirror it in the future when I buy another drive.


I am running OpenSUSE 11.0 and may upgrade to 11.1 if necessary. I plan on running software RAID-5 (md) on the 4 drives and at a later point putting in a 5th drive as a spare for the set.
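
Roughly, what I have in mind for creating the array looks like this (just a sketch; the device names are guesses until the enclosure arrives):

Code:
# create the 4-drive RAID 5 (device names are placeholders)
mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
# later, add the 5th drive as a hot spare
mdadm --add /dev/md0 /dev/sdf1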

I have read that a significant amount of tuning should be done to the filesystem and to the drives themselves. For the bulk of my storage I plan on running JFS; even though it has limited support from OpenSUSE, it is still included and has some features I really like, such as dynamic inode allocation and resilience.

I have found advice to set the chunk size somewhere in the 64-256K range and benchmark, as well as to mount with noatime. What other information can you give me to help make this reliable and fast?
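
For reference, the chunk size would be the --chunk option (in KiB) passed to mdadm at creation time, and noatime is a mount option; a sketch of the fstab entry, with the mount point just an example:

Code:
# mount the JFS filesystem on the array with atime updates disabled
/dev/md0   /srv/storage   jfs   noatime   0 2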

Speed is not as big an issue, as I will be pulling files from a 100Mbit network, but it may be upgraded to gigabit in time.

The applications I plan to run on this machine are:

HypericHQ
OpenVPN
SAMBA
MondoRescue/Mindi
Maybe VMware (testing only)
Maybe DIMDIM
Perhaps a small local Apache or Lighttpd server
Maybe a test ORACLE instance, kept small
# 2  
Old 02-01-2009
It seems to me that putting all the drives on one bus/SATA adaptor is a sure way to degrade performance. I cannot answer about JFS, but another technique is to use the Logical Volume Manager (LVM), which can deal with dynamically growing partitions. You can use logical volumes for RAID 5 directly, or you can make single partitions and use the "md" software RAID driver to create a RAID 5 on top of them. Either way, you put JFS on top of the created RAID device.
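
As a rough sketch of the md-plus-LVM variant (names and sizes are only examples), assuming /dev/md0 already exists as the RAID 5 array:

Code:
# LVM on top of the md device, so volumes can grow later
pvcreate /dev/md0
vgcreate vg_raid /dev/md0
lvcreate -L 500G -n lv_storage vg_raid
# JFS goes on the logical volume
mkfs.jfs -q /dev/vg_raid/lv_storage
mount /dev/vg_raid/lv_storage /srv/storage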

Quote:
I have read that there should be a significant amount of tuning to the file system and drives themselves
For IDE and slow ATA drives, yes. But for SATA? I have not heard this.

As far as fs tuning, an optimal block size is good, and periodic defragmenting is good. For that reason, it makes sense to partition into three filesystems: one for infrequent writes, such as program images and libraries; one for write-mostly activity, such as logfiles and /var; and one for read-write-often data, such as configuration files and dynamic data. The first two you will rarely need to defragment; the third you should defragment fairly often. Also, you can disable atime on the first two but keep it on the third (IMHO).
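
Purely as an illustration of that split (the volume names and mount points are made up), the fstab entries might look like:

Code:
# three filesystems, atime kept only on the third
/dev/vg_raid/lv_static   /srv/static   jfs   noatime    0 2   # rarely written
/dev/vg_raid/lv_var      /var          jfs   noatime    0 2   # write-mostly
/dev/vg_raid/lv_active   /srv/active   jfs   defaults   0 2   # read-write-often
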
# 3  
Old 02-01-2009
I don't see how putting them on the same channel will degrade performance to a significant degree. The theoretical limit on SATA is 300MB/s, and I have not seen any drives that reliably push more than 90MB/s for more than a few minutes at a time. The drives I am going for are lower-power drives with a variable rotational speed of 5400-7200rpm.

As far as the drives themselves, I was going to use a tool WD provides to enable RAID-optimized TLER, which prevents the drive from going into deep recovery on a failed read/write, since that can cause a drop in RAID performance (it is the main difference in their RE series drives).
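
The WD utility itself is a DOS-based tool, but as far as I know newer versions of smartmontools can also report (and, on drives that expose SCT Error Recovery Control, set) the same recovery timeouts that TLER governs. A sketch, with the device name just an example:

Code:
# report the current SCT Error Recovery Control (TLER) timeouts
smartctl -l scterc /dev/sdb
# set read/write recovery limits to 7 seconds, if the drive allows it
smartctl -l scterc,70,70 /dev/sdb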


As for the file systems, I usually break into more than three, giving /boot, /, /usr, /srv, /home, /var and /tmp their own, and probably going with a swap partition on the boot drive plus a swap file on the array, at least until I get the boot drives mirrored. After that I will drop the swap file and use only the swap partition on the boot mirror.
I had always planned on LVM for the devices to aid in management. If not LVM, I thought about using EVMS, but its reduced support makes me hesitant to try it.
# 4  
Old 02-02-2009
That's a good point about SATA throughput versus drive throughput. What you must consider, however, is that RAID tries to write five blocks to five different drives, one right after another. The controller gets the data, sends a write request, waits for it to finish, sends the next, and so on. Each drive gets the SATA command, positions the head to the correct place, then writes the data. When done serially, this process can reduce your throughput from 90 MB/s to 18 MB/s (90/5).

Since the PCI bus is much faster than the SATA bus, the OS can send the data to multiple controllers by the time the first disk is about to write. Thus, the OS can take advantage of multiple SATA controllers. (It could be that some SATA PCI controllers have multiple buses and multiple drive controllers, which would be nice.) But if you are fine with roughly 20 MB/s, then a single controller is fine.
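
If you want to check the actual numbers once the hardware is in, a rough sequential read test is easy enough (device names here are just examples):

Code:
# rough sequential read timing of a single member drive
hdparm -t /dev/sdb
# and of the assembled md array
hdparm -t /dev/md0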

As far as partitions, leave /boot, /, and /usr on your 80 GB drive. I don't know why you need to separate /tmp from / unless you have a large need for /tmp space (which is very uncommon these days -- most programs use the current working directory for scratch space). If you do need /tmp, but it consists of lots of small data files, use a memory-based solution.
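
By "memory-based" I mean something like tmpfs; a sketch of the /etc/fstab entry, with the size just an example:

Code:
# keep /tmp in RAM; it is cheap with 4 GB of memory
tmpfs   /tmp   tmpfs   size=512M,mode=1777   0 0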

You will hardly need swap at all with 4+ GB of RAM. If you have any, keep it on the array rather than on the slower 80 GB drive.
# 5  
Old 02-02-2009
I plan on putting none of the system partitions on the RAID 5. Everything system-related will be on the 80GB drive, including /tmp and swap, though there won't be much swap, maybe 1GB. I don't like the idea of swap on RAID 5, so I plan on keeping it on the 80GB drive, which I will mirror once I get a second drive.

I keep /tmp separate because I don't want a full /tmp to interfere with the root volume. It is how I have always done it, and I see no need to change to something that may be less resilient, even if it is slightly more convenient.

Also, I believe the OS can cache the write requests until they can actually be written, and then sync them out to disk.
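
For what it's worth, the kernel knobs that govern how much dirty data gets cached before writeback are visible through sysctl (just reporting them, not suggesting values):

Code:
# show how much dirty page cache is held before writeback kicks in
sysctl vm.dirty_background_ratio vm.dirty_ratio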

If the host controller supports it, the single channel should not have a problem:

Quote:
FIS (Frame Information Structure)-based switching

FIS-based switching is similar to a USB hub. In this method of switching the host controller can issue commands to send and receive data from any drive at any time. A balancing algorithm ensures a fair allocation of available bandwidth to each drive. FIS-based switching allows the aggregated saturation of the host link and does not interfere with NCQ.

from: Port multiplier - Wikipedia, the free encyclopedia

It turns out the controller that ships with my enclosure, the Sil3132, supports FIS-based switching.
# 6  
Old 02-03-2009
The FIS stuff sounds decent, then. Additional cards would get you some benefit, but for the cost, I suppose that benefit would be rather small.

One reason you might have put /tmp on its own partition is that in the older days of UNIX, lots of programs used /tmp for critical things -- mail programs for temporary folders, editors' swap files, and such. These days it is hardly used, except for very small files created by ssh, X Windows, and the like. Even if /tmp does get full, it is unlikely to interfere with system operation, whereas before, when it got full, things like editing a system file became dangerous.

Pretty much the only things writing to / should be in /tmp, so putting /var and /tmp onto the same volume isn't a bad idea -- then / can be practically read-only (unless you need to update a config file). The only hitch is that mtab, the linker cache, and adjtime need to be soft links to somewhere in /var or /tmp.
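
A sketch of what I mean; the target paths are only illustrative, so check what your distribution actually expects:

Code:
# example only: relocate the frequently rewritten files off /
ln -sf /var/lib/misc/mtab      /etc/mtab
ln -sf /var/lib/misc/adjtime   /etc/adjtime
ln -sf /var/cache/ld.so.cache  /etc/ld.so.cache
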
# 7  
Old 02-03-2009
I got the enclosure. In case anybody was wondering, it is the Rosewill RSV-S5 and it is $199 till the end of the day today at the egg.

As for FIS with multiple cards, the only issue I can see is that they will be limited by the 250MB/s on my PCI-E 1x slot.

As far as what you said about /tmp, that makes sense. I came from an HP-UX world before I went to Linux (well, I played with Linux a little in college, but did not really use it as much as HP-UX and some AIX until we started using it at that company).
On HP-UX, if /tmp was full you could not even install a patch or software, since they were written out to /tmp before installation.

I still feel more comfortable with it on its own partition, but what you said about / being nearly read-only piqued my interest. I doubt I will give it a try except maybe in a virtual instance, but it is something to consider.
