Decompress (with gunzip) recursively, but do not delete original gz file


 
# 1  
Old 01-12-2011

Hi all,

I have a folder hierarchy with many gz files in them. I would like to recursively decompress them, but keep the original files. I would also like to move all the decompressed files (these are very large HDF5 files with .CP12 extension) to another data folder.

Currently I am using four steps to achieve this:

1. Make a copy of the source directory hierarchy:
Code:
cp -R old_archive/ new_archive/

2. Inside new_archive, gunzip recursively:
Code:
find . -name "*.gz" -exec gunzip {} \;

3. Move all decompressed files from new_archive hierarchy to a data folder:
Code:
find . -name "*.CP12" | xargs -I {} mv -iv {} ~/data/

4. Remove new_archive (empty hierarchy)
Code:
rm -rf new_archive/

This works: old_archive still contains the gz files and data contains the decompressed versions. But it is time-consuming.

My question is: how can I perform this recursive extraction efficiently? I would like to avoid step 1, since it takes a very long time (terabyte size datasets).

I need to prevent gunzip's default behavior of removing the original gz file. Since gzip -d (i.e. gunzip) has a "-c" option to extract to stdout, how can I recursively extract the files straight into data?

If I write a simple shell script for this, would running gunzip -c hdf-file.gz > hdf-file recursively be more efficient than doing cp and then gunzip? Note that the decompressed files can be very large (gigabytes each), so I also want to prevent size-related errors when redirecting. Could someone comment on this? Thanks in advance!
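For what it's worth, here is a rough sketch of collapsing all four steps into a single pass, assuming the output can go into a flat ~/data directory (both paths are placeholders for the real locations):

```shell
# One pass, no tree copy: gunzip -c writes the decompressed data to
# stdout and leaves the original .gz untouched in old_archive.
find old_archive -name '*.gz' -print | while IFS= read -r f
do
  gunzip -c "$f" > ~/data/"$(basename "$f" .gz)"
done
```

Since gunzip -c only reads the archive, old_archive is never modified, so the initial cp -R and the final rm -rf both disappear.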

---------- Post updated at 06:21 PM ---------- Previous update was at 06:03 PM ----------

Here is another script version for the same:

Code:
#!/bin/bash

shopt -s nullglob   # make *.gz expand to nothing when no files match

for dir in *
do
  if [ -d "$dir" ]
  then
    echo "--- Entering directory $dir ---"
    for file in "$dir"/*.gz
    do
      fname=$(basename "$file" .gz)
      echo "Now processing $fname ..."
      gunzip -cv "$file" > "$fname"   # -c keeps the original .gz
      mv -iv "$fname" ~/data
    done
  fi
done

Is there a better way?
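One possible refinement (a sketch only, not tested on the real archive): let find handle the recursion so the script works at any directory depth, and batch files into each shell invocation with -exec ... {} +. The ~/data destination is carried over from the script above.

```shell
#!/bin/bash
# find recurses to any depth; {} + passes many files per sh call.
find . -name '*.gz' -exec sh -c '
  for f do
    base=$(basename "$f" .gz)
    gunzip -c "$f" > ~/data/"$base" && echo "Extracted $base"
  done
' sh {} +
```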

Last edited by gansvv; 01-12-2011 at 07:26 PM.. Reason: added another version
# 2  
Old 01-12-2011
What Operating System and version are you running?
What Shell do you use?
Can you post a sample directory listing of a representative directory?
Are old_archive and new_archive on the same filesystem? This question is very important because of the way "mv" works.
Scheduling. How often do you run this job? There would appear to be opportunity to carry out the online backup of the original files in advance.

Quote:
But is time consuming.
How much time? Seconds, Minutes, Hours, Days, Weeks ?


Quote:
sized related errors during the pipe process
Not clear what this means. Please post what you typed, what you expected to happen, what actually happened. Don't forget to include any error messages and a "ls -la" directory listing of any files involved.
# 3  
Old 01-13-2011

@Methyl: I am running Ubuntu Server 10.10 and using bash.

The directory structure is like this:
/old_archive/
--/year2009/
----/001/
----/002/
.
.
----/300/
--/year2008/
----/001/
----/002/
.
.
(each of the 001/ to 300/ folders has 10+ large gz files).

Yes, currently both archive locations are on the same filesystem. BTW, the disks are set up as RAID 0.

And the entire process takes on the order of days. It's not run often (perhaps once a month), but I like your "online" backup idea; I will try that. That brings up another interesting question: is there a way to parallelize the cp or mv operations? Can I break the work into execution threads running simultaneously?
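Something like this xargs sketch is what I have in mind (assuming GNU xargs, which has -P; the worker count of 4 is an arbitrary guess to tune against the disks):

```shell
# Run up to 4 gunzip -c processes at once; -print0/-0 keeps
# filenames with spaces safe. Paths are placeholders.
find old_archive -name '*.gz' -print0 |
  xargs -0 -n 1 -P 4 sh -c '
    gunzip -c "$1" > ~/data/"$(basename "$1" .gz)"
  ' sh
```

Note that decompression here is CPU- and disk-bound, so more workers than spindles/cores may not help.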

About the piping error I mentioned: I have not actually seen any such errors. But I was wondering whether sending a gigabyte-sized decompressed file to stdout (and redirecting it to a file) has a chance of generating errors. Has anyone seen such problems?

Thanks for your reply!
# 4  
Old 01-13-2011
Quote:
#!/bin/bash

shopt -s extglob

for dir in *
do
  if [ -d $dir ]
  then
    echo "--- Entering directory $dir ---"
    for file in "${dir}"/*.gz
    do
      fname=`basename "$file" .gz`
      echo "Now processing $fname ..."
      gunzip -cv "$fname.gz" > "$fname"
      mv -iv "$fname" ~/data
    done
  fi
done
Assuming that I have understood this correctly, I think that the script contains fundamental design errors which make it slow. Writing gigabytes through a shell redirect (">") is not a good idea.

It would be considerably faster to copy the zipped files directly to the target directory then unzip in the target directory using "gunzip" (not "gunzip -c") on the file copy. Maybe you had an issue copying the directory tree?

The original process describes copying the original tree, decompressing each file, then copying the decompressed files to the target tree. It is much easier to copy the whole tree of compressed files to the target directory using "find ." piped to "cpio -pdum /target_directory" then decompress in the target directory. This technique for copying files is described in the man pages for "find" and "cpio" - do read both manuals and try on a test system first. Not clear whether there is anything present in the target directories already.

My idea only makes sense if you are copying all files. If ".CP12" files are a selection then we need a different technique. It also matters if the various directories are on different filesystems (because "mv" becomes a copy rather than a rename if they are).

Last edited by methyl; 01-13-2011 at 09:09 AM.. Reason: typos