I have a 13G .gz archive. The problem is that when I expand it, it grows to 300G, and I don't have that much HDD space. It is one huge file: rrc.tar.gz. What I want to do is extract the archive, but gzip each resulting file as it is extracted.
So, if extracting the archive gives me an uncompressed directory, I want each of the files to be gzipped as and when it is extracted. For example, if the resulting directory is something like
2007/fileA.txt
2007/fileB.txt
I want fileA.txt to be gzipped into fileA.txt.gz before it goes on to extract fileB.txt. Is there a way to do this?
I was successful with this:
gunzip -c rrc.tar.gz | tar -tf - > contents
while read -r f; do gunzip -c rrc.tar.gz | tar -xf - "$f"; gzip "$f"; done < contents
but it has the huge drawback of decompressing the whole archive once for every single file it extracts.
It'd be much better to do "gunzip -c" once, and then parse the output
Google two interesting tar options: --to-stdout (-O) and --to-command=
I need to go now, but I'll be glad if you share the solution with us. It's interesting.
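For the record, GNU tar's --to-command option pipes each extracted member's contents to a program instead of writing the file to disk, and it exports the member's path in the TAR_FILENAME environment variable. A single-pass sketch (GNU tar only; the helper name gzip-member.sh is made up here, and it assumes the archive members are regular files) could be a small helper script:

```shell
#!/bin/sh
# gzip-member.sh -- invoked by GNU tar once per archive member;
# the member's bytes arrive on stdin, its path in $TAR_FILENAME.
mkdir -p "$(dirname "$TAR_FILENAME")"   # recreate the directory tree
exec gzip -c > "$TAR_FILENAME.gz"       # compress straight from the stream
```

driven by `tar -xzf rrc.tar.gz --to-command=./gzip-member.sh`: one decompression pass, and no uncompressed copy of any file ever touches the disk.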
Thank you for the advice. There is a heavy resource constraint, so I will try to explore more. I could think of one solution, and I would appreciate it if someone could provide a better one...
I would start a plain extraction in the background, and then set up a crontab to look for new files with a certain extension in the current directory. If one is found, I would gzip it. This is the simplest approach I could think of. Please let me know whether it is the best, though.
Come to think of it, I am now facing another problem. If a file is in the middle of extraction, there is a chance that the cron job will pick it up and run gzip on it, which could be a problem. Is there a way to tell the find command to find only those files which are not being accessed by any other process?
I never imagined I would face so many problems with a directory archived the wrong way. In any case, I was able to convert a complete directory archive into a directory of archives. Here's a solution for those interested:
Problem:
The whole directory was gzipped as a single archive, so it is almost impossible to extract particular files from it efficiently.
Constraint:
The archive is 13G and expands to 250G, but the disk capacity is only 50G.
Conventional Answer:
13G Directory Archive --> Expands to 250G --> Converted into 13G Directory of Archives
Answer:
Convert the directory archive into a directory of archives.
Solution:
Step 1:
Prepare a shell script, checkAndGzip.sh, and place it in the directory where the archive is to be extracted. Note: observe the use of lsof, a nice utility that tells you whether a file is in use.
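The script itself did not survive in the post; a minimal sketch of what checkAndGzip.sh could have looked like (the exact find filters are assumptions, chosen so the script skips already-compressed files and itself) is:

```shell
#!/bin/sh
# checkAndGzip.sh -- gzip every fully-extracted file in the tree.
# lsof succeeds for files some process still holds open (i.e. files
# tar is still writing); those are skipped until the next cron run.
find . -type f ! -name '*.gz' ! -name '*.sh' | while read -r f; do
    if lsof "$f" > /dev/null 2>&1; then
        continue    # still being extracted; try again next run
    fi
    gzip "$f"
done
```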
Step 2:
Set up a cron job for the script.
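The crontab line was lost from the post; given the note below, it presumably looked something like this (the path is hypothetical):

```shell
# crontab entry: run checkAndGzip.sh every two minutes
*/2 * * * * cd /path/to/extraction/dir && sh checkAndGzip.sh
```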
Note: Cron runs every two minutes
Step 3:
Start the extraction in the same directory.
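The extraction command itself was lost from the post; given the earlier steps it was presumably the plain streaming extraction, run in the background so the cron job can compress files while extraction continues:

```shell
# stream-decompress and extract; backgrounded so the cron job can
# gzip completed files as they appear
nohup sh -c 'gunzip -c rrc.tar.gz | tar -xf -' > extract.log 2>&1 &
```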
Logic: The logic is pretty simple. On one side, the extraction takes place; on the other, the cron job runs a shell script that checks whether a new file has been generated and, if so, gzips it. The reason we use lsof is to verify whether the file is still being extracted (gzip doesn't seem to care about partial files); if a file is in use, it is skipped during that run.
If anyone has a better solution, or an improvement to the one above, kindly suggest it.
and a few suggestions:
- You can consolidate all the commands into one script,
- and you can use the sleep command within the script instead of setting up a cron process that runs every 2 minutes.
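That suggestion could be sketched roughly as follows (a single hypothetical script; POLL_SECS is an assumed knob defaulting to the 2-minute cron interval, and the lsof guard is carried over from checkAndGzip.sh, so if lsof is unavailable the partial-file protection silently disappears):

```shell
#!/bin/sh
# One script instead of extraction + cron: start the extraction in
# the background, then poll and gzip files tar has finished writing.
gunzip -c rrc.tar.gz | tar -xf - &
tarpid=$!

while kill -0 "$tarpid" 2>/dev/null; do
    find . -type f ! -name '*.gz' ! -name '*.sh' | while read -r f; do
        # lsof succeeding means some process (tar) still has it open
        lsof "$f" > /dev/null 2>&1 || gzip "$f"
    done
    sleep "${POLL_SECS:-120}"
done
wait "$tarpid" 2>/dev/null

# final sweep: everything left is fully extracted
find . -type f ! -name '*.gz' ! -name '*.sh' -exec gzip {} +
```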
Thanks for the improvement. Actually, on my system, for some reason, the find command doesn't work: the extraction takes place, but the gzipping part doesn't seem to happen.
The first time find runs, it doesn't find any files (or finds only a few files that are still in use), so it exits out of the loop... could that be the cause, by any chance?