gzip parallelized


 
# 1  
Old 08-25-2009
gzip parallelized

Hello everyone,

I've got a question regarding the gzip command.
I regularly use gzip to pack huge amounts of files.

Is it OK to start 'gzip *' several times in the same directory to parallelize the packing process, or can this result in problems, e.g. broken or unpacked files?

My tests so far showed no problems, but perhaps I missed something.

Thanks for your hints,

Basch
# 2  
Old 08-25-2009
In the same directory, but of course not on the same files... How do you do that: with subdirectories, or using find?
The answer depends on how you do things. The only serious issue I see is the impact on performance (it may not be very friendly to other users on a multi-user host), since it will consume quite a lot of resources.
# 3  
Old 08-25-2009
I am pretty new to UNIX, so my approach was/is to issue the command 'gzip * &' several times in a directory in which all files should be zipped.
As I understand you, this is not the best way to do it.
So I will have a look at subdir and find to improve my knowledge.

Thanks for your reply, it was really helpful.
# 4  
Old 08-25-2009
By subdir I meant that you would have separate subdirectories to archive... it is not a command...
# 5  
Old 08-26-2009
Detailed problem description

Ah, OK. I was already a little confused because I found no information about a subdir command :) find also does not seem to be the right tool for the job.

Let me explain in different words what I want to do:
I have one directory without subdirectories. In this directory I have roughly 100 files.
Each file is bigger than 1 GB, and the biggest file is about 100 GB. The whole directory contains
roughly 500 GB of data.
All these files shall be zipped. If I use 'gzip *' once, it will take some time to finish zipping all the files.
To speed up the process I want to use more gzip processes, maybe 2 or 3, to finish the job.

Now I am looking for an elegant and safe way to do this.
My first approach was to send the command 'gzip * &' several times.
This seems to work, but I don't feel comfortable with this solution,
because I know it is not elegant and because I don't know if it is safe.
Safe in the sense of avoiding data loss because two gzip processes try to
zip the same file at the same time, or because something else goes wrong.

So there are two final questions:
1. Can the problem be solved using the approach I described or are there any dangers?
2. Does anyone know a more sophisticated solution to the problem?

Thanks in advance,

Basch


***Edited on 31.08.09 to correct formatting ***

Last edited by Basch; 08-31-2009 at 10:41 AM..
# 6  
Old 08-27-2009
Which OS? How many CPUs, and how much memory and swap?
Is the swap on the same disk as your data?
I doubt you will get corruption (though it is not impossible), because in the most obvious situation, trying to zip a file that is already being processed, gzip would ask you what to do (and so that process would be stopped). What can happen is a performance issue, but only you can judge that (is it a multi-user box, a Linux PC, or a monolithic server?).
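
The "already exists" case described above can be tried out on a throwaway file. The following is only a minimal sketch, assuming a gzip that refuses to overwrite an existing .gz file unless -f is given (GNU gzip behaves this way); the scratch directory and the file names are hypothetical. With standard input redirected from /dev/null, gzip cannot ask its question, so it should simply skip the file.

```shell
#!/bin/sh
# Sketch: what gzip does when the target .gz file already exists.
# demo_dir and the sample file names are made up for this demonstration.
demo_dir=$(mktemp -d)
printf 'original data\n' > "$demo_dir/sample.txt"
# Stand-in for the output file that another gzip process already created.
printf 'placeholder\n' > "$demo_dir/sample.txt.gz"

# stdin is not a terminal here, so gzip cannot prompt for an answer.
gzip "$demo_dir/sample.txt" < /dev/null 2> "$demo_dir/gzip.err"
status=$?

echo "gzip exit status: $status"
cat "$demo_dir/gzip.err"     # the message gzip printed, if any
ls "$demo_dir"               # sample.txt should still be listed
```

This only covers the case where the .gz file exists before the second gzip looks at it. It does not prove that every possible timing of two concurrent 'gzip *' runs is harmless.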
# 7  
Old 08-28-2009
You are right that the gzips will try to do each other's job with multiple "gzip *"'s. Instead you should carve up the work and give each gzip a limited scope.

You could try:
Code:
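#!/usr/bin/ksh
# Compress every file in the given directory, at most MAX_PARALLEL at a time.
DIR=$1
MAX_PARALLEL=8
parallel=1
for f in "$DIR"/*; do
  gzip "$f" &                # quoted so that spaces in a name do not split it
  (( parallel++ ))
  if (( parallel > MAX_PARALLEL )); then
    wait                     # let the whole batch finish before starting the next
    parallel=1
  fi
done
wait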

or perhaps use xargs to do the job

Code:
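#!/usr/bin/ksh
# Hand the files to gzip in worksets of $setsize names each.
# Note: this relies on word splitting, so file names must not contain whitespace.
DIR=$1
setsize=10
ls -1 "$DIR"/* | xargs -n $setsize | while read workset; do
  gzip $workset &            # unquoted on purpose: one argument per file name
done
# ksh runs the loop in the current shell, so this wait sees the background jobs.
wait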

Better yet, you could determine the size of the workset on the basis of the number of files available.

Code:
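#!/usr/bin/ksh
DIR=$1
MAX_PARALLEL=8
nroffiles=$(ls "$DIR" | wc -w)   # counts words, so names must not contain whitespace
# Round up, so that the remainder does not spill into an extra workset
# and setsize is never 0 when there are fewer files than MAX_PARALLEL.
(( setsize = (nroffiles + MAX_PARALLEL - 1) / MAX_PARALLEL ))
ls -1 "$DIR"/* | xargs -n $setsize | while read workset; do
  gzip $workset &
done
wait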

On the command line you could determine the setsize manually. E.g. for 1000 files you would have 125 files per gzip, using 8 gzips so it becomes:

Code:
cd dir
ls -1 * | xargs -n 125 | while read workset; do
  gzip $workset&
done

or

Code:
cd dir
ls -1 * |xargs -P 8 -n 125 gzip

Note: I used ls -1 (the digit one), not ls -l (the lowercase letter L).
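
The xargs approach above can also be written so that it does not depend on word splitting. This is only a sketch under some assumptions: -maxdepth, -print0, -0 and -P are GNU/BSD extensions rather than POSIX, and the demo directory, the file names and the value of max_parallel are made up for illustration. It gives each gzip a single file (-n 1), so a free slot picks up the next file as soon as one finishes, which may suit files of very different sizes better than fixed worksets.

```shell
#!/bin/sh
# Sketch: compress all regular files in one directory with a limited
# number of gzip processes, coping with spaces in file names.
work_dir=$(mktemp -d)                     # hypothetical demo directory
printf 'aaa\n' > "$work_dir/plain.dat"
printf 'bbb\n' > "$work_dir/with space.dat"
printf 'ccc\n' > "$work_dir/third.dat"

max_parallel=4                            # assumed value; adjust to the machine

# find emits NUL-terminated names, skipping files that are already compressed;
# xargs -0 reads them safely and keeps up to $max_parallel gzips running.
find "$work_dir" -maxdepth 1 -type f ! -name '*.gz' -print0 |
  xargs -0 -n 1 -P "$max_parallel" gzip

ls "$work_dir"
```

For a real directory, replace the mktemp and printf lines with work_dir set to the directory that holds the files.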

Last edited by Scrutinizer; 08-28-2009 at 04:21 AM..
 