Fine Tune - Huge files/directory - Purging


 
# 1  
Old 06-09-2011

Hi Experts,
I need your assistance in tuning a script. I have a mount point that holds about 4848008 files and 864739 directories. The script searches for files that match specific name patterns and are older than specific ages, then deletes them to free up space. It is designed to run daily, but a single run currently takes around 3 full days, so the task of tuning it came to me.
Initially the script had 43 find commands to delete files and 5 find commands to delete empty directories.
Code:
find ${PD} -type f -name '*(WEEK)*' -mtime +14 -exec rm {} \;
find ${PD} -type d -name '[0-9][0-9][0-9][0-9]*' -exec $rmdir {} \; > /dev/null 2>&1
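
For the directory cleanup, a small sketch of how `-depth` lets one pass remove nested empty directories. It assumes `$rmdir` above simply expands to `rmdir`; the scratch tree and its directory names are made up for illustration.

```shell
#!/bin/sh
# Scratch tree (hypothetical names): one empty branch, one branch holding a file.
PD=$(mktemp -d)
mkdir -p "$PD/2011/201106" "$PD/2010/201012"
touch "$PD/2010/201012/data.csv"

# -depth visits children before their parents, so 201106 is already gone
# when find reaches 2011. rmdir refuses non-empty directories; those
# errors are discarded.
find "$PD" -depth -type d -name '[0-9][0-9][0-9][0-9]*' -exec rmdir {} \; 2>/dev/null

[ -d "$PD/2011" ] && empty_branch=present || empty_branch=removed
[ -d "$PD/2010/201012" ] && full_branch=present || full_branch=removed
echo "2011: $empty_branch, 2010/201012: $full_branch"
rm -rf "$PD"
```

Without `-depth`, find looks at a parent before its children, so a parent that only becomes empty during the run is left behind until the next run.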

I have come up with two ways to tune this script.
1) Combine all the search patterns that share the same age limit, to reduce the number of find commands, like below:
Code:
find ${PD} -type f \( -name '*(WEEK)*' -o -name '*(MON)*' -o -name '*(TUE)*' \
-o -name '*(WED)*' -o -name '*(THU)*' -o -name '*(FRI)*' -o -name '*(SAT)*' \
-o -name '*(SUN)*' -o -name '*(WEEKLY)*' \) -mtime +14 -exec rm {} \;

That leaves only 7 find commands for files and one for directories. I think (I have not tested this approach yet) this will reduce the searching and therefore the run time.
2) Since the first approach uses -exec within the find command, I think it will take more time. So the second approach is to find the files I need to delete first, and then delete them with the loop below.
Code:
find ${PD} -type f \( -name '*(WEEK)*' -o -name '*(MON)*' -o -name '*(TUE)*' \
-o -name '*(WED)*' -o -name '*(THU)*' -o -name '*(FRI)*' -o -name '*(SAT)*' \
-o -name '*(SUN)*' -o -name '*(WEEKLY)*' \) -mtime +14 -print > remove.log
# Quoting "$ENTRY" keeps names with spaces or glob characters intact.
while IFS= read -r ENTRY
do
    if [ -f "$ENTRY" ]; then
        rm -f "$ENTRY"
    elif [ -d "$ENTRY" ]; then
        rmdir "$ENTRY"
    fi
done < remove.log

So, my request is: please let me know the pros and cons of approaches 1 and 2. Also, please let me know whether find -exec will take more time or not.
Thanks
Senthil
# 2  
Old 06-09-2011
Combining the search patterns into one find command is a good idea.
Storing the filenames in a file and then looping through its contents is slower than using -exec, so unless you want to keep a log of what was deleted, it's redundant.

Faster than doing -exec would be piping the output of find to xargs(1) like this:
Code:
find $PD <all options you need> | xargs rm

which would call rm only once for many files, as opposed to -exec ... \; which invokes rm once for every file.
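
A minimal sketch of the three forms side by side, run against a scratch directory so nothing real is touched (the file names are made up). `-exec ... {} +` is the POSIX form that batches arguments much as xargs does; `-print0` and `xargs -0` are assumed to be available (GNU and BSD tools have them) and protect names containing spaces or newlines.

```shell
#!/bin/sh
# Scratch directory so nothing real is deleted (all names are made up).
PD=$(mktemp -d)
touch "$PD/2011_(MON).CSV" "$PD/2011_(TUE).CSV" "$PD/2011_(WED).CSV" "$PD/keep.txt"

# -exec ... \; starts one rm process per matching file.
find "$PD" -type f -name '*(MON)*' -exec rm -f {} \;

# -exec ... {} + passes many files to each rm, with no pipe involved.
find "$PD" -type f -name '*(TUE)*' -exec rm -f {} +

# find | xargs batches the same way; -print0 / -0 keep odd names intact.
find "$PD" -type f -name '*(WED)*' -print0 | xargs -0 rm -f

left=$(find "$PD" -type f | wc -l | tr -d ' ')
echo "files left: $left"
rm -rf "$PD"
```

All three delete the same files; with millions of matches the difference is how many rm processes get started.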

Calling find on a remotely mounted filesystem is not ideal -- if at all possible, I'd recommend running the same find command on the machine that physically contains the filesystem.
# 3  
Old 06-09-2011
I would still go with option 1. However, if the list of files to be removed is very large, you might get an error, because the list can exceed the maximum argument length that a single rm command line can take.

-exec won't cause any problems. It's as good as running the rm command directly.
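
The length limit bites when one huge list is placed on a single rm command line; xargs sidesteps it by splitting its input into batches that fit. A small sketch, with an artificially low batch size (`-n 2`, chosen only to make the splitting visible):

```shell
#!/bin/sh
# Five names, at most two per invocation: echo runs three times,
# printing one line per run.
batches=$(printf '%s\n' a b c d e | xargs -n 2 echo | wc -l | tr -d ' ')
echo "invocations: $batches"

# System limit on the combined size of arguments and environment, in bytes.
getconf ARG_MAX
```

In normal use no `-n` is needed; xargs picks a batch size that stays under the system limit.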
# 4  
Old 06-09-2011
@mirni/vidyadhar85
Since I'm combining the find commands by mtime (it now comes to only 7 distinct mtime values), the removal list won't get big, and I am removing it each and every time. Also, is it fine if I use xargs at the end?
Code:
 
find ${PD} -type f \( -name '*(WEEK)*' -o -name '*(MON)*' -o -name '*(TUE)*' \
    -o -name '*(WED)*' -o -name '*(THU)*' -o -name '*(FRI)*' -o -name '*(SAT)*' \
    -o -name '*(SUN)*' -o -name '*(WEEKLY)*' \) -mtime +14 -print | xargs rm

Also, I'm running the script on the machine where the mount point is physically mounted.
# 5  
Old 06-09-2011
That looks good. No need for the -print switch, but it shouldn't influence performance.
Quote:
Also, I'm running the script on the machine where the mount point is physically mounted.
You misunderstood. If the directory tree is on machine A's hard drive, and it's mounted on machine B's /mnt, running
Code:
find /mnt

on machine B is much slower than running
Code:
find /dirThatsExprted

directly on machine A (e.g. through ssh).
# 6  
Old 06-09-2011
Can you please replace this line: -o -name '*(MON)*' with the code below?
-o -name '*(MON|TUE|WED|THU|FRI|SAT|SUN)*'

Hope this works for you.

---------- Post updated at 05:29 AM ---------- Previous update was at 05:25 AM ----------

I think having xargs in the command will add overhead, since it first collects the file names in a buffer and only then removes them, whereas the direct command removes each file/dir as soon as it finds it.
# 7  
Old 06-09-2011
@mirni,
According to mann2719's statement, xargs will delay the process, so shall I use -exec instead?
@mann2719,
For -mtime +14 I have 9 search patterns. Shall I combine them into one, like
Code:
-name '*(MON|TUE|WED|THU|FRI|SAT|SUN|WEEK|WEEKLY)*'

? The files have names like
Code:
2011_(MON)(RERUN).CSV

Also, if a find command has more than one search pattern, will it loop over the tree once per pattern, or loop once and check all the patterns?
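
On both questions, a sketch that can be tried on a scratch directory (file names are illustrative). `-name` takes a shell glob, in which `|` is an ordinary character, so a pattern like `'*(MON|WEEKLY)*'` only matches a name containing the literal text `(MON|WEEKLY)`. Joining tests with `-o` inside one find walks the tree once and checks each file against the patterns in turn.

```shell
#!/bin/sh
PD=$(mktemp -d)
touch "$PD/2011_(MON)(RERUN).CSV" "$PD/2011_(WEEKLY).CSV" "$PD/other.CSV"

# Glob containing '|': no alternation happens, so nothing matches.
alt=$(find "$PD" -type f -name '*(MON|WEEKLY)*' | wc -l | tr -d ' ')

# Same intent with -o: one pass over the tree, two patterns tested per file.
ored=$(find "$PD" -type f \( -name '*(MON)*' -o -name '*(WEEKLY)*' \) | wc -l | tr -d ' ')

echo "glob with |: $alt, with -o: $ored"
rm -rf "$PD"
```

Note that `'*(WEEK)*'` does not cover `(WEEKLY)` either, because the closing parenthesis is part of the pattern, so both need to stay in the list.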
