Extremely slow file writing with many small files on mounted NAS


 
# 1  
Old 06-08-2015

I am working on a CentOS 6.4 server that has two NAS devices mounted: one running FreeBSD with a 20 x 3 TB ZFS (RAID-Z2) pool, and one I don't know much about, with 7 HDDs in RAID-6.

I was running tar -zxvf on an 80 MB tarball containing about 50,000 small files. Even with no other programs running it was taking a very long time (more than 2 days before I cancelled, and it would probably have taken far longer). The problem only occurs when the working directory is on the FreeBSD NAS (the 60 TB ZFS pool, 20 x 3 TB drives).

If I run the same command with the tarball on the server's own HDD, it completes in 3 minutes. I also tried it on the other NAS and that took 4-5 hours, so much faster, but still extremely slow.
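As a rough way to isolate the per-file cost from raw throughput, a timed loop like the sketch below could be run against each target (local disk and each NAS mount) and the times compared; the scratch directory here is just a throwaway placeholder:

```shell
# Time creating 1000 tiny files in a scratch directory; run once with
# TESTDIR on the local disk and once with it on each NAS mount.
TESTDIR=$(mktemp -d)
time (
    i=0
    while [ $i -lt 1000 ]; do
        echo hello > "$TESTDIR/f$i"
        i=$((i+1))
    done
)
ls "$TESTDIR" | wc -l    # 1000 files created
rm -rf "$TESTDIR"
```

If the loop is orders of magnitude slower on the NAS mount than locally, the bottleneck is per-file round-trip latency rather than bandwidth.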

I tried copying the directory containing the already-unpacked tarball contents over to the NAS, which was more or less the same speed as unpacking on the NAS itself (not surprising). As a test I unpacked the tarball on my desktop PC and tried to scp the directory across, but scp failed with exit code 255; I tried the same thing from a Mac desktop with completely different commands and got the same result. The scp completed in about a minute when I copied the unpacked tarball from the server's local HDD to the desktop PC. FTP from my desktop could successfully write files to the NAS, but at the same slow speed as copying from the CentOS machine.

During these operations I watched iotop and saw very high iowait: 99.99% for the slower, larger NAS and around 30% for the 7-drive NAS. All of the volumes also show utilization oscillating around or above 100%, and this happens on every drive. I checked zpool status and saw no degraded volumes. For whatever reason I can't get smartctl to work on the FreeBSD NAS, and I can't ssh into the other NAS at all, so I have not run the comprehensive SMART tests I would like; I could in theory use another tool.

My next step is to use Wireshark to see whether there is a problem with the TCP settings; perhaps something is tuned in a way that is not ideal. I also recognize that this CentOS version is quite old.

If there are any suggestions, that would be helpful.
# 2  
Old 06-09-2015
With such a huge difference in timing, I would check whether the network is happy. There is a chance that the network card and the device at the other end (a switch or another network card, depending on how you have it connected) do not agree on the speed or duplex.

Have a look at the manual page for ethtool.

You might see the problem in the output of:
Code:
ethtool -S eth0

.....adjusting for the appropriate ethernet card. There is quite a bit of output, but you really want to see very low discard values; anything else suggests that there is a conflict.
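A speed/duplex mismatch is the classic cause of this kind of slowdown. A minimal sketch of what to look at, assuming a Linux box with ethtool installed; `eth0` is a placeholder, and `lo` stands in for the real interface in the counter loop:

```shell
# Negotiated link speed and duplex (substitute your actual interface):
#   ethtool eth0 | egrep 'Speed|Duplex'
# A gigabit card stuck at "10Mb/s" or "Half" duplex explains a lot.

# The kernel also exposes per-interface error/drop counters under /sys;
# non-zero, growing values here point at a link-level problem.
for f in rx_errors tx_errors rx_dropped tx_dropped; do
    printf '%s: %s\n' "$f" "$(cat /sys/class/net/lo/statistics/$f)"
done
```

Running the counter loop a few minutes apart shows whether the numbers are actually growing during a transfer, which matters more than their absolute values.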

You may need to confirm with the network team that all the hops in between are behaving correctly and that you are not getting throttled (quality/class of service) with a packet shaper or other load manager/balancer.


I hope that this helps,
Robin
# 3  
Old 06-09-2015
Hmm. This is the output. I don't see a discard counter as such; the error counters are all zero, though a couple of tx queues show restarts.

Code:
NIC statistics:
     rx_packets: 9967685
     tx_packets: 108944
     rx_bytes: 1310706679
     tx_bytes: 75177246
     rx_broadcast: 9741613
     tx_broadcast: 105
     rx_multicast: 135906
     tx_multicast: 0
     multicast: 135906
     collisions: 0
     rx_crc_errors: 0
     rx_no_buffer_count: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 0
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 2051
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 0
     tx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_long_byte_count: 1310706679
     tx_dma_out_of_sync: 0
     tx_smbus: 7135
     rx_smbus: 7482
     dropped_smbus: 0
     os2bmc_rx_by_bmc: 0
     os2bmc_tx_by_bmc: 0
     os2bmc_tx_by_host: 0
     os2bmc_rx_by_host: 0
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_frame_errors: 0
     rx_fifo_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_queue_0_packets: 54
     tx_queue_0_bytes: 6701
     tx_queue_0_restart: 0
     tx_queue_1_packets: 1454
     tx_queue_1_bytes: 871343
     tx_queue_1_restart: 0
     tx_queue_2_packets: 1324
     tx_queue_2_bytes: 368760
     tx_queue_2_restart: 0
     tx_queue_3_packets: 51323
     tx_queue_3_bytes: 54837394
     tx_queue_3_restart: 122
     tx_queue_4_packets: 30319
     tx_queue_4_bytes: 9017297
     tx_queue_4_restart: 4
     tx_queue_5_packets: 3536
     tx_queue_5_bytes: 756463
     tx_queue_5_restart: 0
     tx_queue_6_packets: 4443
     tx_queue_6_bytes: 854941
     tx_queue_6_restart: 0
     tx_queue_7_packets: 9356
     tx_queue_7_bytes: 7374377
     tx_queue_7_restart: 0
     rx_queue_0_packets: 8651647
     rx_queue_0_bytes: 1101786394
     rx_queue_0_drops: 0
     rx_queue_0_csum_err: 0
     rx_queue_0_alloc_failed: 0
     rx_queue_1_packets: 154827
     rx_queue_1_bytes: 46526877
     rx_queue_1_drops: 0
     rx_queue_1_csum_err: 0
     rx_queue_1_alloc_failed: 0
     rx_queue_2_packets: 186676
     rx_queue_2_bytes: 20129412
     rx_queue_2_drops: 0
     rx_queue_2_csum_err: 0
     rx_queue_2_alloc_failed: 0
     rx_queue_3_packets: 119480
     rx_queue_3_bytes: 13647399
     rx_queue_3_drops: 0
     rx_queue_3_csum_err: 0
     rx_queue_3_alloc_failed: 0
     rx_queue_4_packets: 409409
     rx_queue_4_bytes: 41368708
     rx_queue_4_drops: 0
     rx_queue_4_csum_err: 0
     rx_queue_4_alloc_failed: 0
     rx_queue_5_packets: 128512
     rx_queue_5_bytes: 13611456
     rx_queue_5_drops: 0
     rx_queue_5_csum_err: 0
     rx_queue_5_alloc_failed: 0
     rx_queue_6_packets: 182499
     rx_queue_6_bytes: 20083394
     rx_queue_6_drops: 0
     rx_queue_6_csum_err: 0
     rx_queue_6_alloc_failed: 0
     rx_queue_7_packets: 127112
     rx_queue_7_bytes: 12855042
     rx_queue_7_drops: 0
     rx_queue_7_csum_err: 0
     rx_queue_7_alloc_failed: 0

# 4  
Old 06-11-2015
If these are normal hardware RAID controllers (or some software RAID stacks, for that matter) you might be able to set the cache policy to write-through, write-around, and/or write-back.

If the RAID controller cache is set to write-through, the controller waits for each I/O to complete before proceeding. In the case of tens of thousands of small files that would be very painful. You wouldn't see the same degradation with large files because each I/O is bigger.

Search Google (and of course THIS FORUM) for the pros and cons of these cache settings.
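The cost of waiting on every I/O is easy to demonstrate with a rough dd comparison (GNU dd assumed; run it in a directory on the affected mount and compare the two timings):

```shell
# Buffered writes: the kernel/controller can batch and acknowledge early.
dd if=/dev/zero of=dd_test_buffered bs=4k count=1000 2>&1 | tail -1

# oflag=sync forces each 4 KB block to be committed before the next one
# starts, which is roughly what a write-through cache does to every
# small-file write.
dd if=/dev/zero of=dd_test_sync bs=4k count=1000 oflag=sync 2>&1 | tail -1

rm -f dd_test_buffered dd_test_sync
```

On a healthy local disk the two runs differ by a modest factor; over a high-latency mount with synchronous semantics the second one can be slower by orders of magnitude.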
# 5  
Old 06-12-2015
There are only so many components involved, so the debugging should be straightforward:

How is the NAS connected to your system? I suppose it is an NFS-mount, no?

If so, the following has to be checked:

- network equipment: scp a larger file somewhere to test. Measure time.
- name resolution: put the NAS server into /etc/hosts and make sure local files (= /etc/hosts) take precedence over DNS on the hosts: line of /etc/nsswitch.conf
- mount options: look carefully at how the FS from the NAS is mounted. Typical problems include CIO (concurrent I/O), user authentication via outside sources (e.g. user authorization via Kerberos with a slow, unresponsive Kerberos server) and the like.

- If you use NFSv4 (you shouldn't - it is crap) check the NFS-domain. It has to be set on every client and server.
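For the first point, a single large file takes per-file metadata overhead out of the picture: if 100 MB moves at full wire speed, the network itself is probably fine and the problem is per-file latency on the NAS side. A minimal sketch, where the host name and remote path are placeholders:

```shell
# Build a 100 MB file of zeros for a raw-throughput test.
dd if=/dev/zero of=/tmp/nas_probe bs=1M count=100 2>/dev/null
stat -c %s /tmp/nas_probe    # 104857600 bytes

# On the real system, time pushing it over the same path the NAS uses, e.g.:
#   time scp /tmp/nas_probe user@nas-host:/volume/tmp/
# At gigabit line rate this should take on the order of a second or two.
rm -f /tmp/nas_probe
```

If the large file transfers quickly while the 50,000-file tree crawls, that points at mount options or cache policy rather than the network gear.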

I hope this helps.

bakunin
 