The UNIX and Linux Forums  
#1, posted 10-13-2008 by eisenhorn
Gzip files as they are created

Hello. I am stumped on a scripting problem that I hope you can help with.

Basically, I have a ksh script that calls a process to create n binary files, each with a maximum size of 1Gb. The process can write n files at once (parallel operation), based on the parallelisation parameter fed into the script at the start. Normally we would wait for this process to complete and then gzip all the files individually (gzip *.dmp, for example). However, on some systems we don't have enough disk space to wait until all the 1Gb files have been produced.

I have previously written some code to gzip the files in parallel (see below). However, I now need to gzip them in parallel while the first process is still running. I need to be careful not to gzip any file that is currently being written (up to n at a time from the parallel command), so some sort of looping will be required. I also want to keep the option of parallel gzip if possible.

Code:
...
gzip_func() {
started=0
threads=4
for filename in `ls -1 ${EXP_DIR}/*.dmp`
do
 if [[ ${started} -lt ${threads} ]]; then
  let started=started+1
  echo "gzip ${filename}"
  ( $GZIPCMD ${filename} ) &
  list_of_pids="${list_of_pids} $!"
 else
  print "wait ${list_of_pids}"
  wait ${list_of_pids}
  list_of_pids=""
  started=0
 fi
done
}
...
my_binary_file_creation_process
...
while [ `find ${EXP_DIR} -name \*.dmp|wc -l` -gt "0" ]; do
 gzip_func
 print "wait ${list_of_pids}"
 wait ${list_of_pids}
 list_of_pids=""
done

Can anyone help me write some code for this using standard Solaris 8/9/10 tools and the Korn shell? Perl commands should also be possible (version 5.6.1 is installed).

Many thanks and Best Regards,
Stephen.
#2, posted 10-13-2008 by cfajohnson
Quote:
Originally Posted by eisenhorn View Post
Code:
...
gzip_func() {
started=0
threads=4
for filename in `ls -1 ${EXP_DIR}/*.dmp`

Not only is -1 unnecessary, but so is ls itself. Also, ls will break your script if there are any spaces in the filenames.

Code:
for filename in ${EXP_DIR}/*.dmp
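
To illustrate the point about dropping ls, here is a minimal sketch. The scratch directory and the file names are made up for the demonstration (mktemp is used only to keep it self-contained); only the loop form comes from the reply above. One caveat: when nothing matches, the shell leaves the pattern unexpanded, so the loop guards against that case.

```shell
# Scratch directory standing in for the real EXP_DIR (hypothetical path and names).
EXP_DIR=$(mktemp -d)
touch "$EXP_DIR/full 01.dmp" "$EXP_DIR/full02.dmp"

count=0
for filename in "$EXP_DIR"/*.dmp
do
    # With no matches the pattern itself is passed through, so skip non-files.
    [ -f "$filename" ] || continue
    count=$((count + 1))
    printf 'found: %s\n' "$filename"
done
echo "count=$count"
rm -rf "$EXP_DIR"
```

The file name containing a space reaches the loop body as a single word, which is what breaks when the list comes from ls inside backticks.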
Quote:
Code:
do
 if [[ ${started} -lt ${threads} ]]; then
  let started=started+1

Use standard syntax:

Code:
 if [ ${started} -lt ${threads} ]; then
 started=$(( $started + 1 ))
Quote:
Code:
  echo "gzip ${filename}"
  ( $GZIPCMD ${filename} ) &

Quote the variable, or your script will break if there are spaces in the filename (and there's no need for the parentheses):

Code:
  $GZIPCMD "$filename" &
Quote:
Code:
  list_of_pids="${list_of_pids} $!"
 else
  print "wait ${list_of_pids}"
  wait ${list_of_pids}
  list_of_pids=""
  started=0
 fi
done
}
...
my_binary_file_creation_process
...
while [ `find ${EXP_DIR} -name \*.dmp|wc -l` -gt "0" ]; do

What's wrong with:

Code:
for
Quote:
Code:
 gzip_func
 print "wait ${list_of_pids}"
 wait ${list_of_pids}
 list_of_pids=""
done

Can anyone help me write some code for this using standard Solaris 8/9/10 tools and the Korn shell? Perl commands should also be possible (version 5.6.1 is installed).

Your code looks far more complicated than it needs to be.

It's not clear from your code how you tell whether a file is finished being written to so that you can compress it.

Do you have any control over the process that is writing the binary files?
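
As a rough illustration of how the suggestions above might fit together (glob instead of ls, quoted variables, standard test and arithmetic syntax), here is one possible simplified batch loop. It is a sketch, not the poster's or the advisor's actual code: GZIPCMD, EXP_DIR, threads and list_of_pids mirror names used in the thread, while the scratch directory and sample files are invented so that the example runs on its own. It does not yet deal with files that are still being written.

```shell
# Assumptions: gzip is on PATH; the scratch directory and demo files are made up.
GZIPCMD=${GZIPCMD:-gzip}
EXP_DIR=$(mktemp -d)
for i in 1 2 3 4 5
do
    echo "sample data $i" > "$EXP_DIR/demo0$i.dmp"
done

threads=4

gzip_func() {
    started=0
    list_of_pids=""
    for filename in "$EXP_DIR"/*.dmp
    do
        [ -f "$filename" ] || continue      # pattern matched nothing
        $GZIPCMD "$filename" &
        list_of_pids="$list_of_pids $!"
        started=$((started + 1))
        # Batch is full: wait for these jobs before starting more.
        if [ "$started" -ge "$threads" ]; then
            wait $list_of_pids
            list_of_pids=""
            started=0
        fi
    done
    # Collect the final, possibly partial, batch (never call a bare "wait").
    if [ -n "$list_of_pids" ]; then
        wait $list_of_pids
    fi
}

gzip_func
set -- "$EXP_DIR"/*.gz
gz_count=$#
echo "compressed files: $gz_count"
rm -rf "$EXP_DIR"
```

Starting the job first and checking the batch size afterwards means no file is passed over while the loop waits, so a single pass handles every file that matched the pattern.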
#3, posted 10-14-2008 by eisenhorn
Thanks cfajohnson. I will try to incorporate your recommendations.

However, for my real problem: the Oracle export utility (the process that creates the binary files) creates a file and populates it with data until it reaches 1Gb in size, then creates a new file. If we use parallelisation, it creates n files (one for each parallel process) and fills them. The final binary files created could, and probably would, be smaller than 1Gb.

My thought was to call the gzip function before the export utility and have it wait for files to gzip, i.e. only gzip files once there are more than the parallel number n. So if parallel was set to 4, nothing would be gzipped until a 5th file exists.

Thinking it through, I find it hard to identify which file the gzip program should compress: we can't just gzip files that are 1Gb in size, as the utility could still be finishing writing to the file. Could I use something like fuser to identify whether the export tool has finished with the file? Perhaps some form of gzip loop that waits for fuser to return no PID for an export file and then gzips it? I have watched an export and can see that once the utility has finished writing a file it no longer locks it, so this could be feasible.

I would welcome your ideas.

Best Regards.
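
The fuser idea raised in this post could be sketched roughly as follows. The function names is_idle and wait_until_idle are hypothetical, and the sketch relies on fuser writing PIDs to standard output and everything else to standard error, so an empty standard output is treated as "nobody has the file open". Note that fuser may need sufficient privileges to see processes owned by other users.

```shell
# is_idle FILE: succeed when fuser reports no process holding FILE open.
# Returns 2 if fuser is not installed, so a missing tool is never
# mistaken for an idle file.
is_idle() {
    type fuser >/dev/null 2>&1 || return 2
    holders=$(fuser "$1" 2>/dev/null)
    # Strip whitespace; anything left over is at least one PID.
    [ -z "$(printf '%s' "$holders" | tr -d ' \t\n')" ]
}

# wait_until_idle FILE [SECONDS]: poll until FILE is idle, then return 0.
wait_until_idle() {
    while :
    do
        is_idle "$1"
        case $? in
            0) return 0 ;;
            2) echo "fuser not available" >&2; return 2 ;;
        esac
        sleep "${2:-5}"
    done
}

# Demo on a scratch file that no process has open.
demo_file=$(mktemp)
wait_until_idle "$demo_file" 1
rc=$?
echo "wait_until_idle returned $rc"
rm -f "$demo_file"
```

In a real run the function would be called once per candidate dump file before handing it to gzip. An open-file test alone cannot tell a finished file from one the writer has created but temporarily released, so it probably needs to be combined with some other check.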
#4, posted 10-14-2008 by cfajohnson

As soon as a new file is created, you can gzip the previous one.
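
One way to read that suggestion, as a sketch only: treat every dump file except the newest as finished. This assumes a single sequential writer and file names that sort in creation order (as a two-digit numeric suffix would); with several parallel writers the assumption does not hold. The function name finished_files and the sample file names are hypothetical.

```shell
# Scratch directory and file names are made up for the demonstration.
EXP_DIR=$(mktemp -d)
touch "$EXP_DIR/full01.dmp" "$EXP_DIR/full02.dmp" "$EXP_DIR/full03.dmp"

# Print every .dmp file in the given directory except the last one
# in glob (name) order, which is assumed to be the one still being written.
finished_files() {
    set -- "$1"/*.dmp
    while [ "$#" -gt 1 ]
    do
        printf '%s\n' "$1"
        shift
    done
}

ready=$(finished_files "$EXP_DIR")
printf '%s\n' "$ready"
rm -rf "$EXP_DIR"
```

Each name printed could then be handed to gzip, and the check repeated whenever a new file appears.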
#5, posted 10-15-2008 by eisenhorn
Quote:
Originally Posted by cfajohnson View Post
As soon as a new file is created, you can gzip the previous one.
Not the most enlightening statement, but I understand what you mean.

I did some experimenting and found that the export program initially creates a 4k file, then stops using it while it builds up a list of objects to export. It then locks the file again and fills it up to the 1Gb size.

I modified my code to use a du and fuser test to check that the file is bigger than 4k and is not being used by any process. I find that if I use parallelism on the export, the program creates n files and then populates them. It may start with files 1, 2, 3 and 4, but when 1, 3 and 4 reach 1Gb it creates 5, 6 and 7 to continue the parallel tasks, while file 2 is still not full (for whatever reason).

Files 1, 3 and 4 are now unused, but gzip_func does not seem to want to gzip them until file 2 is also unused, which often might not be until the end of the export. Can you please have a look at the code below and see if you can spot an obvious error? I want the code to start gzipping as soon as the two tests are passed, whether it can gzip only 1 file or up to n threads. Any ideas?

Code:
gzip_func() {
started=0
threads=4
for filename in ${EXP_DIR}/*.dmp
do
 # Check if file is bigger than 8K and is not being used
 if [ `du -sk "${filename}"|awk '{print $1}'` -gt "8" ] && [ `fuser "${filename}" 2>/dev/null | wc -m` -eq "0" ]; then
   # Loop through files until 4 are started (to match threads)
   if [ ${started} -lt ${threads} ]; then
    started=$(( $started + 1 ))
    echo "gzip ${filename}"
    $GZIPCMD "${filename}" &
    list_of_pids="${list_of_pids} $!"
   else
    print "wait ${list_of_pids}"
    wait ${list_of_pids}
    list_of_pids=""
    started=0
   fi
 else
   echo "${filename} is still being written to, trying next file..."
 fi
done
}
 
# Export creation - note done in background to allow gzip loop to run
expdp '"/ as sysdba"' directory=DPUMP_DIR_ADHOC dumpfile=ram6_full%U.dmp logfile=ram6_full.log filesize=1024m full=y parallel=4 &
 
while [ `find ${EXP_DIR} -name \*.dmp|wc -l` -gt "0" ]; do
 gzip_func
 print "out of loop wait ${list_of_pids}"
 wait ${list_of_pids}
 list_of_pids=""
 sleep 5
done
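
One possible contributor to the behaviour described, offered as a guess and not a confirmed diagnosis: when a pass of gzip_func starts no gzip jobs, list_of_pids is empty, so the unquoted "wait ${list_of_pids}" in the outer loop becomes a bare "wait". A bare wait blocks until every background child of the shell has exited, and that includes the backgrounded export. The sketch below shows a guarded wait. "sleep 30" is a hypothetical stand-in for the expdp command, and export_pid and still_running are illustrative names.

```shell
# "sleep 30" is a stand-in for the backgrounded export (hypothetical).
sleep 30 &
export_pid=$!

list_of_pids=""     # no gzip jobs were started on this pass

# Guarded wait: only call wait when there are gzip PIDs to wait for.
# Without the guard, an empty list turns this into a bare "wait",
# which would block here until the stand-in export finishes.
if [ -n "${list_of_pids}" ]; then
    wait ${list_of_pids}
fi

# The stand-in export is still running, so the loop is free to poll again.
if kill -0 "$export_pid" 2>/dev/null; then
    still_running=yes
else
    still_running=no
fi
echo "export still running: $still_running"

kill "$export_pid" 2>/dev/null
```

A related tweak, also only a suggestion: the outer loop could keep polling for as long as "kill -0" on the export's PID succeeds, instead of relying on the count of remaining .dmp files alone.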