08-12-2008
Best way to diff two huge directory trees
Hi
I have a job that will run nightly incremental backups of a large directory tree.
I did the initial backup, and now I want to write a script to verify that all the files were transferred correctly. I tried something like this, which works in principle on small trees:
diff -r -q "$src_dir" "$dst_dir" >& diffreport.txt
The problem with this is that it is very slow. The directory I am backing up is about 2 TB.
I also tried using find and sum to dump the checksums to two files, one for the source directory and one for the destination, and then comparing them. These are the commands I used:
find "$src_dir" -type f -print0 | xargs -0 sum > src_dir_checksums.txt
find "$dst_dir" -type f -print0 | xargs -0 sum > dst_dir_checksums.txt
diff src_dir_checksums.txt dst_dir_checksums.txt
But for some reason this lists the files in a different order for the two directories, which are on different machines.
Any help would be greatly appreciated.
Thanks in advance,
Sam
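One possible way to get the same file order on both machines, as a minimal sketch and not something tested on a 2 TB tree: change into each directory first so the recorded paths are relative, sort the file list in the C locale before checksumming, and then compare the two reports. The function name checksum_tree is made up for illustration, cksum is swapped in for sum because it always prints the file name, and sort -z and xargs -r are assumed to be available (GNU tools).

```shell
#!/bin/sh
# Sketch only. checksum_tree is a hypothetical helper, not an existing tool.
# It prints "checksum size path" for every regular file under the directory
# given as $1, in an order that should not depend on the machine.
checksum_tree() {
    # cd first so paths are recorded as ./relative/path and therefore
    # match on both machines; sort the NUL-separated list in the C locale
    # so the order does not depend on locale settings; -r skips cksum
    # entirely when the tree contains no files.
    ( cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r cksum )
}

# On the source machine:
#   checksum_tree "$src_dir" > src_dir_checksums.txt
# On the destination machine:
#   checksum_tree "$dst_dir" > dst_dir_checksums.txt
# Then copy one report to the other machine and compare:
#   diff src_dir_checksums.txt dst_dir_checksums.txt
```

This does not make the checksumming itself any faster; it only makes the two reports comparable line by line.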
slack(8) System Manager's Manual slack(8)
NAME
slack - Sysadmin's lazy autoconfiguration kit
SYNOPSIS
slack [option ...] [role ...]
DESCRIPTION
slack is a master command which coordinates the activities of its backends, which variously:
o determine the list of roles to be installed on this server
o create a local cached copy of the role files from the central repository
o merge file trees from subroles into a single, unified tree
o install files onto the local filesystem
o run scripts before and after installation
Options you give to slack will be generally passed along to the backends where relevant.
OPTIONS
-h, --help
Print a usage statement.
--version
Print the version and exit.
-v, --verbose
Increase verbosity. Can be specified multiple times.
--quiet
Don't be verbose (Overrides previous uses of --verbose).
-C, --config FILE
Use the specified FILE for configuration instead of the default, /etc/slack.conf.
-s, --source DIR
Source directory for slack files
-e, --rsh COMMAND
Remote shell for rsync
-c, --cache DIR
Local cache directory for slack files
-t, --stage DIR
Local staging directory for slack files
-r, --root DIR
Root destination for slack files
--no-sync
Skip the slack-sync step (useful if you're pushing stuff into the CACHE outside slack).
--no-files
Don't install any files in ROOT, but tell rsync to print what it would do.
--no-scripts
Don't run scripts
-n, --dry-run
Same as --no-files --no-scripts (CACHE, STAGE will still be updated)
--role-list
Role list for slack-getroles(8).
-b, --backup
Make backups of existing files in ROOT that are overwritten. This option defaults to on if it is not set to 0 in a config file or
disabled with --nobackup on the command line.
--backup-dir
Put backups from the --backup option into this directory.
-H, --hostname HOST
Pretend to be running on HOST, instead of the name given by gethostname(2).
--preview MODE
Do a diff of scripts and files before running them. MODE can be one of 'simple' or 'prompt' (See PREVIEW MODES, below).
--diff PROG
Use this diff program for previews.
--sleep TIME
Randomly sleep between 1 and TIME seconds before starting operations. Useful in crontabs.
PREVIEW MODES
Preview functionality is new in slack 0.14.0. I haven't quite worked out how things will work, so this usage is somewhat subject to change
in future versions. I thought I would try it this way and see how people like it.
In 'simple' mode, after syncing and staging the files directory, slack will present a diff of the files and scripts. In this mode, slack
will not run the preinstall or fixfiles scripts, and because of this, it may provide some false output about permissions changes to files.
In 'prompt' mode, after syncing and staging the files directory, slack will diff the script directory. If there are differences, slack
will present them to you and ask you if you want to continue. If you say no, it will exit. If you say yes, it will stage the scripts
directory, run the preinstall and fixfiles scripts, and then diff the files in the stage with those in the root. If there are differences,
slack will present them to you and ask you if you want to continue. If you say no, it will exit. If you say yes, it will install the
files and run the postinstall script.
So, the 'simple' mode is easy to use, and will be accurate if you don't use fixfiles. The 'prompt' mode will be accurate if you use fixfiles,
but requires some interaction.
Why can't we just have one mode that works with fixfiles and requires no interaction? Well, that would require slack to understand what
your free-form fixfiles executable was going to do, which would either require some kind of universe simulator or would require you to
write your fixfiles in a less free-form way, which would make slack less like slack.
EXAMPLES
To install all the roles configured in the role list for a server:
slack
To install a specific role:
slack rolename
To test a new role before checking in the changes:
slack --source user@workstation:/home/user/.../slack rolename
To avoid killing your master server when calling from cron:
slack --sleep 3600
FILES
/etc/slack.conf
SEE ALSO
slack.conf(5), rsync(1)
Administrative commands 2004-10-22 slack(8)