I have an issue processing a large number of files. I have around 5 million files (some of them are actually directories) on a server.
I am unable to find out the exact number of files since just counting them is taking "forever" to finish (see this thread for more on the issue).
Anyway, now I want to move these ~5 million files to a different location, so my first thought was to tar/gzip the files and scp them somewhere else as a single file. However, the tar process is also taking a very long time to finish (in fact it never finished and I cancelled the job after 10 hours).
Basically I just want to build a single package containing the ~5 million files (zip, tar, cpio, raw data, whatever) so that I can easily move and transfer the files to a different location.
Here is your problem - you are reading millions of directory entries and writing them to a tarfile.
Then you copy the tarfile somewhere, then extract it. Tons of I/O: writing the tarball, I/O copying it, I/O extracting it.
Eliminate the "middleman I/O".
FWIW:
Plus, assuming you actually want the data, you are perpetuating the problem - way too many file entries per directory. You really should reorganize the directory structure. It is unlikely that users are reading those files very often, or you would have lots of user complaints: 'It takes forever to read a file...'
That said:
moving a directory tree from node to node, eliminate the middleman processing:
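The command that originally followed here did not survive the page; a minimal sketch of the idea, assuming `user@remotehost` and both paths are placeholders you substitute for your own:

```shell
# tar writes the tree to stdout, ssh carries the stream, and tar on the
# far end unpacks it -- no intermediate tarball ever touches a disk.
# "user@remotehost", /source/dir and /dest/dir are placeholders:
#   tar cf - -C /source/dir . | ssh user@remotehost 'tar xf - -C /dest/dir'
# The same pipe demonstrated locally so it can be run end-to-end:
src=$(mktemp -d); dst=$(mktemp -d)
mkdir "$src/sub"
echo hello > "$src/a.txt"
echo world > "$src/sub/b.txt"
tar cf - -C "$src" . | tar xf - -C "$dst"
ls "$dst"
```

The only disk writes are the final files on the destination; the tar stream itself lives entirely in the pipe.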
using tar to relocate on the same box, eliminate the middleman
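Again the original example is missing; a sketch of the classic same-box relocation pipe (the two filesystem paths are placeholders):

```shell
# Same-box relocation: one tar reads, the other writes, the data flows
# only through a pipe -- no tarball is ever written to disk.
# /old/fs/dir and /new/fs/dir are placeholders:
#   (cd /old/fs/dir && tar cf - .) | (cd /new/fs/dir && tar xpf -)
# Demonstrated with temporary directories:
old=$(mktemp -d); new=$(mktemp -d)
echo data > "$old/file.txt"
(cd "$old" && tar cf - .) | (cd "$new" && tar xpf -)
ls -l "$new/file.txt"
```

The `p` on extract preserves permissions, which matters when relocating a whole tree.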
BTW don't kill this job off until it is done - it could take a long time, but you will never make any progress if you keep killing these jobs off.
When you say somewhere else, I guess you mean a different server?
I imagine the gzip compression and the encryption in scp are both adding quite an overhead.
There is a tradeoff between network speed, CPU speed and disk speed. You may find it more efficient to leave out the gzip if you have a fast network. Also, if you are not worried about security, using "rsh" instead of "ssh/scp" will be quicker. Probably the quickest would be:
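The command that followed here was lost; presumably it was a tar-over-rsh pipe along these lines (the hostname and paths are placeholders, and remember rsh sends everything in the clear):

```shell
# Uncompressed, unencrypted stream -- fastest on a quick network when
# security is not a concern. "remotehost" and the paths are placeholders:
#   tar cf - -C /source/dir . | rsh remotehost 'tar xf - -C /dest/dir'
# Locally, "sh -c" stands in for the remote shell to show the pipe shape:
src=$(mktemp -d); dst=$(mktemp -d)
echo fast > "$src/f.txt"
tar cf - -C "$src" . | sh -c "tar xf - -C '$dst'"
cat "$dst/f.txt"
```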
However, if the network is slower and you need security, something like "rsync" with the "-z" option may be better for you.
Please state what operating system you have and describe your hardware configuration, including memory and discs and anything relevant to performance.
Is it safe to assume that the filesystem will be quiescent?
Do you have spare discs equivalent to, say, twice the existing space? Personally I would copy the entire filesystem first to produce a defragmented filesystem which runs at a reasonable speed. This is also intended to prove that the original disc can be read from end to end.
Different filesystems on the same server will be orders of magnitude quicker.
Probably one of the fastest is the one mentioned previously: cd into the old filesystem and pipe "tar cf - ." into a "tar xpf -" running in the new one.
And this has the added benefit of making the files contiguous on the new filesystem.
This also doesn't compress or encrypt the files, both of which would hit the CPU.
Since the original request was creating more issues than expected, I have opted for a whole-disk backup (the disk is not that big)... faster and less problematic.