Extremely slow file writing with many small files on mounted NAS
I am working on a CentOS release 6.4 server which has two mounted NAS devices: one with 20 x 3 TB HDDs running FreeBSD with ZFS (RAID-Z2), and one NAS which I don't know much about, but which has 7 HDDs in RAID-6.
I was running tar -zxvf on an 80 MB tarball with 50,000 small files inside. Even with no other programs running it was taking a very long time (more than 2 days before I cancelled, and it would probably have taken much, much longer than that). The problem only occurs if the working directory is on the FreeBSD NAS (the 60 TB ZFS RAID-Z2 pool, 20 x 3 TB drives).
If I run the same command on the tarball on the server itself (on the HDD installed in the server) it completes in 3 minutes. I also tried it on the other NAS and that took 4-5 hours: much faster, but still extremely slow.
I tried copying the directory containing the unpacked tarball contents over to the NAS, which was more or less the same speed as unpacking on the NAS itself (not surprising). As a test I unpacked the tarball on my desktop PC and tried to scp the result over, but the scp command failed with exit code 255. I also tried this from a Mac desktop with completely different commands, with the same result. The scp completed in about a minute if I copied the unpacked tarball from the HDD attached to the server down to the desktop PC. With FTP from my desktop I could successfully write files to the NAS, but the speed was the same as when copying from the CentOS machine.
During the operations I was watching iotop and saw a huge iowait percentage: 99.99% for the slower, larger NAS and about 30% for the 7-drive NAS. I also see that all the volumes show %usage oscillating around or above 100%, and this occurs on all the drives. I checked zpool status and didn't see any degraded volumes. For whatever reason I can't get smartctl to work on my FreeBSD NAS, and I can't ssh into the other NAS at all to do anything with it, so I have not run the comprehensive SMART test I would like. I could in theory use another tool.
My next step is to use Wireshark to see if maybe there is some problem with the TCP settings; potentially something is configured in a way that is not ideal. I also recognize that this CentOS version is quite old.
If there are any suggestions, that would be helpful.
With such a huge difference in timing, I would check whether the network is happy. There is a chance that the network card and the other end (a switch or another network card, depending on how you have it connected) do not agree on the speed or duplex.
Have a look at the manual page for ethtool.
You might see the problem in the output of:
.....adjusting for the appropriate ethernet card. There is quite a bit of output, but you really want to see very low discard values. Anything else suggests there is a conflict.
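The exact command was lost above, but it is presumably the NIC statistics output, ethtool -S. Here is a minimal sketch of filtering that output for nonzero discard/drop/error counters; it runs against a canned sample so the filter itself is visible, and on the real server you would pipe from ethtool -S eth0 (substituting your interface name):

```shell
# Hypothetical sample of "ethtool -S eth0" output; on the real machine,
# pipe the live command into the awk filter below instead.
sample='rx_packets: 104921
tx_packets: 99812
rx_discards: 0
tx_discards: 1342
rx_crc_errors: 0'

# Print only the counters that suggest trouble: discard/drop/error
# fields with a nonzero value.
echo "$sample" | awk -F': ' '/discard|drop|err/ && $2+0 > 0 {print $1, "=", $2}'
```

A nonzero and growing discard or error counter while the tar runs would support the mismatch theory; plain ethtool eth0 on both ends shows the negotiated speed and duplex.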
You may need to confirm with the network team that all the hops in between are behaving correctly and that you are not being throttled (quality/class of service) by a packet shaper or other load manager/balancer.
If these are normal hardware RAID controllers (or some software RAID controllers, for that matter) you might be able to set the cache policy to write-through, write-around or write-back.
If the RAID controller cache is set to write-through, the controller waits for each I/O to complete before proceeding. In the case of millions of small files that would be very painful. You wouldn't see the same degradation with large files because each I/O is bigger.
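The per-operation cost is easy to demonstrate locally. A rough sketch, using /tmp as a stand-in for the NAS mount: the loop pays the open/write/close (and, over NFS, the synchronous round-trip) price once per file, while dd pays it essentially once for the same number of bytes.

```shell
# Compare many tiny writes against one write of comparable total size.
# /tmp stands in for the NAS mount point; on NFS the gap is far larger.
workdir=$(mktemp -d)

# 1000 tiny files: one create/write/close cycle each
time sh -c "for i in \$(seq 1 1000); do echo hello > \"$workdir/f\$i\"; done"

# one ~1 MB file written in a single stream
time dd if=/dev/zero of="$workdir/big" bs=1M count=1 2>/dev/null

count=$(ls "$workdir" | wc -l)
echo "created $count files"
rm -rf "$workdir"
```

Repeating the comparison inside the NAS mount would show whether the slowdown scales with the number of operations rather than the number of bytes, which is exactly the signature a write-through cache (or fully synchronous NFS writes) would produce.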
Search Google (and of course THIS FORUM) for the pros and cons of these cache settings.
There are only so many components involved, so the debugging should be straightforward:
How is the NAS connected to your system? I suppose it is an NFS mount, no?
If so, the following has to be checked:
- network equipment: scp a larger file somewhere to test. Measure time.
- name resolution: put the NAS server into /etc/hosts and have local files (=/etc/hosts) take precedence over DNS, i.e. the "hosts:" line in /etc/nsswitch.conf should list "files" before "dns"
- mount options: look carefully at how the FS from the NAS is mounted. Typical problems include CIO (concurrent I/O), user authentication via outside sources (e.g. user authorization via Kerberos against a slow, unresponsive Kerberos server) and the like.
- If you use NFSv4 (you shouldn't - it is crap), check the NFS domain. It has to be set consistently on every client and server.
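To make the mount-option check concrete, here is a sketch that pulls the interesting options out of a /proc/mounts entry. The server name, export path and option values are invented for illustration; on the real client, grep nfs /proc/mounts (or nfsstat -m) gives you the live line to feed in:

```shell
# Hypothetical /proc/mounts line for the NAS share; replace it with the
# output of:  grep nfs /proc/mounts
line='nas01:/export/data /mnt/nas nfs rw,vers=3,rsize=32768,wsize=32768,hard,proto=tcp 0 0'

# Field 4 holds the comma-separated options; keep the ones that matter
# for small-file performance and protocol version.
echo "$line" | awk '{print $4}' | tr ',' '\n' | grep -E 'vers|rsize|wsize|sync|proto'
```

Small rsize/wsize values, a sync option, or an unexpected NFS version are all worth questioning. The name-resolution check is just grep for the NAS name in /etc/hosts plus confirming the nsswitch.conf lookup order.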