Need advice on approach for script


 
# 1  
Old 03-01-2012

Greetings all. I have a repository server which receives, without exaggeration, several million files a week. The majority of these files are in .csv format, which means they're highly compressible. They are spread throughout numerous directories, where monitoring utilities are configured to parse them as they arrive. The parsers will only parse files up to 7 days old, so at day 8 I would like the files to be tarred/gzipped.

I am trying to figure out the best approach. I could use find, which would go through and recursively tar/gzip everything it finds. However, this doesn't move the files to a new location. Granted, I could use a second find command and move everything to a mirror of the repository directory structure.

I was also considering just using tar, creating an archive using the directory structure of the source data and setting it to remove the files, but I can't find a way in tar to remove only files older than N days.

I was hoping I could get some suggestions on the best approach to take.

To illustrate:
Right now I have these directories:
/splunk/MSP/
/splunk/CEMP/
/splunk/CEMP/PBTS/
/splunk/CEMP/BTS/
/splunk/CEMP/cedar/
/splunk/CEMP/sbc/

and in each one there are several thousand files. Imagine there are files in each dir called a and b, where a is 8 days old, b is 1 day old.

In a perfect world I would have the script generate a single tar/gz file in a specified directory, which had the complete directory structure contained inside with only file a from each dir. After running the script, inside each directory only file b should remain in the actual repo.

Your thoughts and suggestions are appreciated.
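
For illustration, here is a minimal sketch of how find's -mtime test would pick out file a but not file b in the scenario above. It runs in a scratch directory created with mktemp, so the paths are stand-ins rather than the real repository, and it assumes GNU touch for the relative date syntax. Note that -mtime +7 (not +8) is what corresponds to "8 days old or more", because find counts age in whole 24-hour periods.

```shell
#!/bin/sh
# Demo in a scratch directory; the layout mimics two of the repo directories.
demo=$(mktemp -d)
mkdir -p "$demo/splunk/MSP" "$demo/splunk/CEMP/BTS"

for d in "$demo/splunk/MSP" "$demo/splunk/CEMP/BTS"; do
    touch -d '200 hours ago' "$d/a"   # a little over 8 days old (GNU touch syntax)
    touch -d '24 hours ago'  "$d/b"   # 1 day old
done

# -mtime +7 matches files whose age, counted in whole 24-hour periods,
# is greater than 7, i.e. files that are at least 8 full days old.
old_files=$(cd "$demo" && find splunk -type f -mtime +7 | sort)
printf '%s\n' "$old_files"
```

Running find from the parent of splunk/ keeps the names relative, which is what lets a later tar step store the directory structure without leading slashes.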
# 2  
Old 03-01-2012
Hi,

A simple approach would be to use find, like this:
Code:
# -T - makes tar read the list of files to archive from standard input
find /splunk/MSP -type f -mtime +8 | tar -cf /afiles.tar -T -

Then run:
Code:
find /splunk/MSP -type f -mtime +8 -exec rm {} \;

Or you could run find with xargs.

Or, to gzip the archive:
Code:
find /splunk/MSP -type f -mtime +8 | tar -cf - -T - | gzip > /afiles.tar.gz

Regards

Dave

Moderator's Comments:
Please use code tags for your code and data next time.

Last edited by vbe; 03-01-2012 at 12:41 PM.. Reason: code tags...
# 3  
Old 03-01-2012
You can actually zip the whole directory recursively, and zip will update the archive with only those .csv files that are 8 days old, then delete them afterwards.
That way you get a single archive that keeps being updated with all those files.
tip78
# 4  
Old 03-01-2012
Quote:
Originally Posted by gull04
Hi,

A simple approach would be to use find, like this:
Code:
# -T - makes tar read the list of files to archive from standard input
find /splunk/MSP -type f -mtime +8 | tar -cf /afiles.tar -T -

Then run:
Code:
find /splunk/MSP -type f -mtime +8 -exec rm {} \;

If the data is important, that is a terrible approach. There's a race: a file on the cusp of -mtime +8 may be excluded from the tar list yet included in the rm list. The result would be the silent deletion of a file that was never archived.

Regards,
Alister
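
One way to sidestep the race described above might be to let a single find pass feed tar, and have tar itself delete each file after archiving it. This is only a sketch: it assumes GNU find and GNU tar (for -print0, --null, -T - and --remove-files), uses -mtime +7 to match the "day 8" requirement from the first post, and works on scratch directories from mktemp rather than the real /splunk tree. The archive name is made up for the example.

```shell
#!/bin/sh
# Assumes GNU find and GNU tar. Paths are hypothetical scratch locations.
repo=$(mktemp -d)          # stands in for the directory that contains splunk/
dest=$(mktemp -d)          # where the archive is written
mkdir -p "$repo/splunk/MSP"
touch -d '200 hours ago' "$repo/splunk/MSP/a"   # a little over 8 days old
touch -d '24 hours ago'  "$repo/splunk/MSP/b"   # 1 day old

archive="$dest/splunk-old.tar.gz"

# One find pass produces the list; tar archives exactly those names and
# removes only the files it has added to the archive.
(
    cd "$repo" &&
    find splunk -type f -mtime +7 -print0 |
        tar --null -T - --remove-files -czf "$archive"
)

tar -tzf "$archive"        # list what went into the archive
ls "$repo/splunk/MSP"      # show what is left in the repo directory
```

Because the delete list and the archive list are the same list, a file that crosses the age threshold mid-run cannot be removed without having been archived.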
# 5  
Old 03-02-2012
With millions of files, the find command will take some time to complete, so invoking it twice sounds like a bad idea. I'd redirect the output of the find command to a file and use that list to identify which files need to be processed.
This also solves the problem that alister points out.
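
A rough sketch of that idea, under some assumptions: GNU find, tar and xargs are available, the list is stored NUL-separated so unusual file names survive, and the files are deleted only when tar exits successfully. The directories come from mktemp and the list and archive names are invented for the example, so adjust them to the real repository before trying anything like this.

```shell
#!/bin/sh
# Assumes GNU find, tar and xargs. All paths are hypothetical scratch locations.
repo=$(mktemp -d)                    # stands in for the parent of splunk/
work=$(mktemp -d)                    # holds the file list and the archive
mkdir -p "$repo/splunk/CEMP/sbc"
touch -d '200 hours ago' "$repo/splunk/CEMP/sbc/a"   # a little over 8 days old
touch -d '24 hours ago'  "$repo/splunk/CEMP/sbc/b"   # 1 day old

list="$work/old-files.list"          # NUL-separated, safe for odd file names
archive="$work/splunk-old.tar.gz"

cd "$repo" || exit 1

# 1. Walk the tree once and save the result.
find splunk -type f -mtime +7 -print0 > "$list"

# 2. Archive exactly the listed files.
# 3. Delete them only if tar reported success.
if tar --null -T "$list" -czf "$archive"; then
    xargs -0 rm -f -- < "$list"
else
    echo "tar failed; nothing deleted" >&2
fi
```

The saved list also doubles as a record of what was archived and removed in each run.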