Decompress (with gunzip) recursively, but do not delete original gz file
Hi all,
I have a folder hierarchy with many gz files in them. I would like to recursively decompress them, but keep the original files. I would also like to move all the decompressed files (these are very large HDF5 files with .CP12 extension) to another data folder.
Currently I am using four steps to achieve this:
1. Make a copy of the source directory hierarchy:
2. Inside new_archive, gunzip recursively:
3. Move all decompressed files from new_archive hierarchy to a data folder:
4. Remove new_archive (empty hierarchy)
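For concreteness, the four steps above might look like this as shell commands (the directory names old_archive, new_archive and data are assumptions based on the description):

```shell
# Step 1: copy the whole source hierarchy (slow for terabyte-size data)
cp -r old_archive new_archive
# Step 2: decompress recursively inside the copy; -r walks subdirectories
gunzip -r new_archive
# Step 3: move every decompressed file into the flat data folder
find new_archive -type f -name '*.CP12' -exec mv {} data/ \;
# Step 4: remove the now-empty hierarchy
rm -r new_archive
```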
This works: old_archive keeps the gz files and data holds the decompressed versions. But it is time-consuming.
My question is: how can I perform this recursive extraction efficiently? I would like to avoid step 1, since it takes a very long time (terabyte-size datasets).
I need to prevent gunzip's default behavior of removing the original gz file. Since gunzip (i.e. gzip -d) has a "-c" option to extract to stdout, how can I recursively extract and put the results in data?
If I write a simple shell script for this, would running gunzip -c hdf-file.gz > hdf-file recursively be more efficient than doing cp and then gunzip? Note that the decompressed files can be very big (gigabytes each), so I also want to avoid size-related errors when redirecting to a file. Could someone comment on this? Thanks in advance!
---------- Post updated at 06:21 PM ---------- Previous update was at 06:03 PM ----------
Here is another script version for the same:
Is there a better way?
What Operating System and version are you running?
What Shell do you use?
Can you post a sample directory listing of a representative directory?
Are old_archive and new_archive on the same filesystem? This question is very important because of the way "mv" works.
Scheduling. How often do you run this job? There would appear to be opportunity to carry out the online backup of the original files in advance.
Quote:
But it is time-consuming.
How much time? Seconds, Minutes, Hours, Days, Weeks ?
Quote:
size-related errors during the pipe process
Not clear what this means. Please post what you typed, what you expected to happen, what actually happened. Don't forget to include any error messages and a "ls -la" directory listing of any files involved.
@Methyl: I am running Ubuntu Server 10.10 and using bash.
The directory structure is like this: /old_archive/
--/year2009/
----/001/
----/002/
.
.
----/300/
--/year2008/
----/001/
----/002/
.
.
(each of the 001/ to 300/ folders has 10+ large gz files).
Yes, currently both archive locations are on the same filesystem. BTW, the disks are set up as RAID 0.
And the entire process takes on the order of days. It's not run often (perhaps once a month), but I like your "online" backup idea. I will try that. That brings up another interesting idea: is there a way to parallelize the cp or mv operations? Can I break the work into execution threads running simultaneously?
About the piping error I mentioned: I did not actually see any such errors, but I was wondering whether sending a gigabyte-sized decompressed file to stdout (and redirecting it to a file) has a chance of generating errors. Has anyone seen such problems?
for dir in *
do
    if [ -d "$dir" ]
    then
        echo "--- Entering directory $dir ---"
        for file in "${dir}"/*.gz
        do
            [ -e "$file" ] || continue          # skip directories with no .gz files
            fname=$(basename "$file" .gz)
            echo "Now processing $fname ..."
            gunzip -cv "$file" > "$fname"       # -c keeps the original .gz
            mv -iv "$fname" ~/data
        done
    fi
done
Assuming that I have understood this correctly, I think that the script contains fundamental design errors which make it slow. Writing gigabytes through a shell redirect (">") is not a good idea.
It would be considerably faster to copy the zipped files directly to the target directory then unzip in the target directory using "gunzip" (not "gunzip -c") on the file copy. Maybe you had an issue copying the directory tree?
The original process describes copying the original tree, decompressing each file, then copying the decompressed files to the target tree. It is much easier to copy the whole tree of compressed files to the target directory using "find ." piped to "cpio -pdum /target_directory" then decompress in the target directory. This technique for copying files is described in the man pages for "find" and "cpio" - do read both manuals and try on a test system first. Not clear whether there is anything present in the target directories already.
My idea only makes sense if you are copying all files. If ".CP12" files are a selection then we need a different technique. It also matters if the various directories are on different filesystems (because "mv" becomes a copy rather than a rename if they are).