Help to reduce time of archiving


 
# 1  
Old 12-21-2012

Hi all,

I have written the following script, which does this work:
1. Copy large log files from one server to another.
2. Unzip these files and extract from these large HTTPS logs only the fields that are necessary.
3. Archive the extracted logs into new files.

BUT the problem is that this whole process is TOO SLOW and takes a long time to complete.
Could you please take a look at my script and suggest any improvements?

Thank you in advance!!

Code:
#!/bin/bash
set -x

FINALDIR=/data/media/LOGS/archive/_cdrc2

DATE=$(date +%Y%m%d)
#echo "$(date +'%Y/%m/%d %H:%M:%S') Starting getting CDR's"

mkdir -p /data/media/LOGS/archive/logs2/"$DATE"
/bin/chmod 750 /data/media/LOGS/archive/logs2/"$DATE"
/usr/bin/setfacl -m g:ITdep:rx /data/media/LOGS/archive/logs2/"$DATE"
scp httplogs@192.168.70.123:/data/backup/cdr/hotcdr/MEGAFLOW*"$DATE"* /data/media/LOGS/current/logs2/

cd /data/media/LOGS/current/logs2 || exit 1

for i in *.gz                      # glob instead of parsing ls output
do
        gunzip "$i"
        a=${i%.gz}                 # strip the .gz suffix to get the unpacked name
        # extract the selected fields from the CDR and append them to the new CDR file
        awk -F '\t' '{ print $2"\t"$4"\t"$6"\t"$16"\t"$17"\t"$18"\t"$22"\t"$37"\t"$43"\t"$44 }' "$a" >> LOGS_Extracted_CDR_"$DATE"
        7za a -bd -mfb=255 "$a".7z "$a"
        rm -f "$a"
        mv "$a".7z /data/media/LOGS/archive/logs2/"$DATE"/
done

# compress the new CDR file with 7za
7za a -bd -mfb=150 LOGS_Extracted_CDR_"$DATE".7z LOGS_Extracted_CDR_"$DATE"
rm -f LOGS_Extracted_CDR_"$DATE"
mv LOGS_Extracted_CDR_"$DATE".7z "$FINALDIR"
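For reference, the extraction step can also be done by streaming each file through gzip -dc instead of unpacking it to disk first. This is only a sketch, demonstrated on a tiny generated sample so it runs anywhere; the real paths differ, and whether it is actually faster depends on your disks. The field numbers are copied from the awk call above:

```shell
#!/bin/bash
# Sketch: extract fields directly from the gzipped logs, skipping the
# intermediate uncompressed file. A tiny 44-field sample "CDR" stands
# in for the real MEGAFLOW logs (paths here are illustrative).
DATE=$(date +%Y%m%d)
WORK=/tmp/cdr_demo
mkdir -p "$WORK" && cd "$WORK" || exit 1

# build a tab-separated sample record with 44 fields and gzip it
awk 'BEGIN { for (f = 1; f <= 44; f++) printf "f%d%s", f, (f < 44 ? "\t" : "\n") }' > sample.log
gzip -f sample.log

rm -f "LOGS_Extracted_CDR_$DATE"   # start fresh so re-runs do not append

for i in *.gz
do
    # decompress to stdout and extract the wanted columns in one pass
    gzip -dc "$i" |
        awk -F '\t' '{ print $2"\t"$4"\t"$6"\t"$16"\t"$17"\t"$18"\t"$22"\t"$37"\t"$43"\t"$44 }' \
        >> "LOGS_Extracted_CDR_$DATE"
done

cat "LOGS_Extracted_CDR_$DATE"
```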


# 2  
Old 12-21-2012
Please use code tags as advised! You can prefix commands with the time command to find out how long each step takes, then post the results.
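For example, wrapping one step in time prints its real/user/sys figures; the real script would wrap the scp, the gunzip/awk loop, and the 7za calls the same way (the file names below are just for illustration):

```shell
#!/bin/bash
# Demo of the time prefix on a single step.
printf 'some sample data\n' > /tmp/step_input

# prints real/user/sys for just this command on stderr
time gzip -9 -c /tmp/step_input > /tmp/step_input.gz
```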
# 3  
Old 12-21-2012
OK, I will post the results with the time prefix, but the script takes a long time to run, approximately 2 hours.
Maybe any suggestions in the meantime? I can see that the 7z archiving takes most of the time...

# 4  
Old 12-21-2012
Thanks for the code tags.
If you know it's the 7z archiving taking most of the time, then there's little chance to reduce the overall time except by playing around with 7z's options/parameters... or replacing 7z with another compression tool.
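As a rough illustration of that trade-off, here is the same idea with gzip levels, which is universally available; the same principle applies to 7za's -mx=1..9 and -mfb settings, though exact ratios and timings depend on your data. The repetitive sample data is a stand-in for the real log files:

```shell
#!/bin/bash
# Compare fastest vs best compression level on the same input.
yes 'GET /index.html HTTP/1.1' | head -n 100000 > /tmp/sample.log

gzip -1 -c /tmp/sample.log > /tmp/sample.fast.gz   # fastest, worse ratio
gzip -9 -c /tmp/sample.log > /tmp/sample.best.gz   # slowest, best ratio

wc -c /tmp/sample.fast.gz /tmp/sample.best.gz
```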
# 5  
Old 12-21-2012
Yes, I know that 7z takes a long time to compress, but I also want very good compression.
Could you please give me some ideas for 7z settings and how to use them?
Or maybe an idea of how to organize my script better... maybe divide it into two scripts and run them on different servers? Just an idea... I don't know.

Thank you very much!
# 6  
Old 12-21-2012
I am not sure how "7za" works internally, but it is probably similar to "gzip"/"gunzip" in that it is single-threaded: regardless of how many CPUs you have in your system, only one of them is used to execute it. You can speed things up greatly by introducing parallel threads and processing files in parallel.

Instead of working on the files serially:

Code:
process_file firstfile
process_file secondfile
process_file thirdfile
... etc.

you work on them in parallel (pseudocode):

Code:
num_CPUs=<enter number of cores here>

for FILE in $(list-files-to-process) ; do
     # block here until fewer than num_CPUs jobs are running
     while [ "$(jobs -r | wc -l)" -ge "$num_CPUs" ] ; do
          sleep 5
     done
     process_file "$FILE" &
done
wait      # let the last batch of jobs finish

Basically this does: loop over the list of files to process. Before starting each background job, the loop checks the number of jobs already running ("jobs | wc -l", or "jobs -r | wc -l" to count only running ones) and waits until it drops below the number of available CPU cores; whenever a job finishes, a slot frees up and the next job is started. Depending on the output of your "jobs" command you may have to adjust the expression by some corrective factor.
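A concrete, runnable version of that pattern, where process_file is a hypothetical stand-in that just gzips its argument; in the real script it would be the 7za step:

```shell
#!/bin/bash
# Runnable demo of throttling background jobs to a fixed number of slots.
num_CPUs=2

process_file() {
    gzip -9 "$1"                     # stand-in for the real 7za call
}

# set up some sample files to work on
demo=/tmp/parallel_demo
rm -rf "$demo" && mkdir -p "$demo" && cd "$demo" || exit 1
for n in 1 2 3 4; do printf 'data %s\n' "$n" > "file$n"; done

for f in file1 file2 file3 file4; do
    # block until fewer than num_CPUs jobs are running
    while [ "$(jobs -r | wc -l)" -ge "$num_CPUs" ]; do
        sleep 1
    done
    process_file "$f" &
done
wait                                 # let the last jobs finish

ls
```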

I hope this helps.

bakunin
# 7  
Old 12-21-2012
Thank you bakunin for your suggestions.
I was reading about LZMA... does anybody know how to use this option and how it will affect my script's run time?

About the time command prefix, these are the results:

Code:
 
real 116m57.289s
user 198m15.878s
sys 5m46.687s

Thank you for your time!
