Fastest way calculating directory


 
Top Forums Shell Programming and Scripting Fastest way calculating directory
# 1  
Old 10-25-2013

Hi experts,

Is there a faster way to calculate the size of a directory tree recursively? I have 600 directories holding about 100,000 files in total, plus roughly 10 directories with approximately 9,000,000 to 10,000,000 files each. I am currently using du -k --max-depth=0 to get the sizes, but it is very slow: it has been running for 24 hours and is still not done. Is there any workaround, such as taking a snapshot of a folder's size and then adding the size of each file anyone uploads to that folder afterwards? My goal is simply the fastest way to calculate this.


Thanks more power
# 2  
Old 10-25-2013
Code:
df -h

will run much faster.

When you have gigantic directories, many kinds of filesystems perform very poorly. du gathers information on a per-file basis; in your case, that means millions of file reads (calls to stat()). df reports information the kernel already keeps about whole filesystems: one read per filesystem.

At some point you should attempt to reorganize your directories so that you don't have what appears to me to be an unmanageable mess.
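A minimal illustration of the difference, using / and /tmp as stand-in paths:

```shell
# Per-filesystem usage: one query of counters the kernel already
# maintains, so it returns immediately no matter how many files exist.
df -h /

# Per-directory usage: walks the tree and stat()s every file, which
# is what takes hours on directories with millions of entries.
du -sh /tmp
```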
# 3  
Old 10-25-2013
Any access to such a filesystem will tend to be slow because of the many necessary calls to stat(), not only du's reading of the (basically) inode structure. To minimize not only your problem at hand but all similar problems with this filesystem, I suggest you move the vital FS information (that is, inodes and the like: all the metadata) to some high-bandwidth storage, such as an SSD.

I saw a similar problem (backup/restore of a huge GPFS with ~500TB of data) solved by introducing a 150GB SSD holding just the metadata. It reduced the necessary time from ~6 hours to ~90 minutes using the same hardware.

I hope this helps.

bakunin
# 4  
Old 10-26-2013
I don't think the df command can get the total size per folder? Is there a way to run df on a folder?

thanks
# 5  
Old 10-26-2013
Short answer for df: no.
A less-than-great workaround using du:

You can run du in parallel. It is still going to take a very long time.
Assume all of your directories live on two mount points (directories): /dira and /dirb
Code:
cnt=0
> /tmp/summary_sizes.txt                           # truncate the file to zero length
find /dira /dirb -type d | while read -r dirname
do
     du -s "$dirname" >> /tmp/summary_sizes.txt &  # run du in the background
     cnt=$(( cnt + 1 ))                            # count background processes
     [ $(( cnt % 15 )) -eq 0 ] && wait             # every 15 jobs, wait for the batch
done
wait                                               # wait for any leftover processes

15 is arbitrary. There may be so much I/O on your filesystem(s) that you need to lower that number. If there is little impact (see the output of iostat -zxnm 1 10), you may want to bump it up. Also, since you did not post the directory hierarchy, I am guessing; the result of find may cause du to read the same directories multiple times, which impedes performance, e.g.:

Code:
/dira
     foo
          dir1
             subdir1
             subdir2
          dir2
     foo1

So, if you know the correct full names of all of the directories you want to monitor, put them in a file (call it dir.txt) like this:
Code:
/dira/foo/big1
/dira/foo/big2
/dira/foo2/big1
/dirb/foo/big2

so that du does each "endpoint" directory just one time. This may or may not be feasible.
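If you do not know the endpoint directories up front, one way to build dir.txt is to list only leaf directories (those with no subdirectories). This is only a sketch, reusing /dira and /dirb from above, and note the caveat: files sitting directly inside intermediate directories would then not be counted.

```shell
# Emit only "leaf" directories: in sorted order, a directory is a
# leaf if the next path does not extend it with a "/" component.
find /dira /dirb -type d | sort |
awk 'NR > 1 && index($0, prev "/") != 1 { print prev }
     { prev = $0 }
     END { if (prev != "") print prev }' > dir.txt
```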

Change the above code to this:

Code:
cnt=0
> /tmp/summary_sizes.txt                           # truncate the file to zero length
while read -r dirname
do
     [ "$dirname" = "/dira" ] && continue          # skip high-level dirs
     [ "$dirname" = "/dirb" ] && continue
     du -s "$dirname" >> /tmp/summary_sizes.txt &  # run du in the background
     cnt=$(( cnt + 1 ))                            # count background processes
     [ $(( cnt % 15 )) -eq 0 ] && wait             # every 15 jobs, wait for the batch
done < /path/to/dir.txt
wait                                               # wait for any leftover processes

# 6  
Old 10-28-2013
Thanks for the response, but it is not working here; it is still slow.
# 7  
Old 10-28-2013
Define "not working". Your directory structure is beyond awful performance-wise, so you will never get an answer from du in a reasonable time using standard UNIX tools. du reads directories, as we explained earlier.

You would have to develop a fairly complex daemon to constantly monitor each of the huge directories and then store the output on a file system separate from the big directories. Or simply wait a very long time to get an answer using UNIX tools.
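As a rough sketch of that idea (the script name and paths are hypothetical): a cron job could rebuild a per-directory size cache during off-hours, so that lookups read the cache instead of walking millions of inodes.

```shell
#!/bin/sh
# refresh_sizes.sh -- hypothetical nightly cache refresher, e.g. run
# from cron as:  0 2 * * * /usr/local/bin/refresh_sizes.sh
CACHE=/var/tmp/dir_sizes.cache
TMP=$CACHE.$$

while read -r d
do
    du -sk "$d"                     # one "<KB><tab><directory>" line each
done < /path/to/dir.txt > "$TMP"

mv "$TMP" "$CACHE"                  # swap in atomically so readers never see a partial cache
```

Looking up a directory's size then becomes a grep of the cache file, and a separate job could add the sizes of newly uploaded files between refreshes.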

If you do something about where the directory data lives, as bakunin suggested, things would get better. Not perfect.

What OS are you on? Maybe you can tune ufs or whatever filesystem you have.