Simple directory tree diff script


 
# 15  
Old 03-19-2013
One of the reasons why cmp may not be taking so much time in this case is that most of these files are quite small (<4k). Cygwin is not like a native Linux install, since it runs on top of Windows, and I find that many tasks take much longer to complete under Cygwin than in a native Linux environment.

One other difference for me is that I did not evaluate the output of cmp in the script as I did for the two stat operations; I just redirected the cmp output to a file. As far as I understand it, cmp does not produce output when the files are the same. For stat, you are making two calls to stat, assigning the results of both calls to shell variables, evaluating the variables in a conditional, and then printing output if the conditional evaluates true. For cmp, you are just passing the two file names to cmp. All of the heavy lifting for cmp is done in the compiled binary, whereas much of what is done for stat is in the script. It may be that for small files there isn't much difference. I didn't test this extensively.

My script has run for more than 24 hours so far. I didn't think to put in a counter that prints to the terminal, so I don't have any way to know how close it is to being finished. There is no output in the file yet, so nothing has been found to be different so far. At a rate of 10000 per minute, it should have finished long ago, but there are some very large files here that could take quite a while.

I don't know how long I will let this run. I am fairly well convinced that if there were problems, I would have had something report as different by now.
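For illustration, here is a minimal sketch of the two approaches described above, plus the kind of progress counter that was missing. The function names (check_pair_stat, check_pair_cmp, progress) are made up for this example and are not taken from the actual script; it assumes GNU stat, as on the Cygwin install discussed in this thread.

```shell
# Sketch only: hypothetical helper names, GNU stat assumed.

# stat approach: two stat processes, two shell variables, one conditional.
check_pair_stat() {
    local s1 s2
    s1=$(stat -c %s "$1")    # size of the first file in bytes
    s2=$(stat -c %s "$2")    # size of the second file in bytes
    if [ "$s1" != "$s2" ]; then
        printf 'size differs: %s\n' "$1"
    fi
}

# cmp approach: one process; -s suppresses output, so only the
# exit status tells us whether the files differ.
check_pair_cmp() {
    cmp -s "$1" "$2" || printf 'content differs: %s\n' "$1"
}

# Progress counter: print to stderr every 1000 files so the
# results file stays clean.
progress() {
    if [ $(( $1 % 1000 )) -eq 0 ]; then
        printf 'checked %d files\n' "$1" >&2
    fi
}
```

Inside the main loop one would increment a counter and call something like `progress "$count"` after each pair. Note that two files of the same size but different content pass the stat check and are only caught by cmp.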

LMHmedchem
# 16  
Old 03-19-2013
Hi.

I found a copy of Cygwin installed in a Windows 7 guest virtual machine. The hardware is a Xeon CPU, the host OS is Debian, and virtualization is VMware, but the memory allocation was decreased to 300 MB because of other VMs running.

I changed my timing script slightly to allow the basic tasks to be run. The results:
Code:
./s1
OS, ker|rel, machine: CYGWIN_NT-6.1, 1.7.16(0.262/5/3), i686
bash GNU bash 4.1.10
stat (GNU coreutils) 8.15
cmp (GNU diffutils) 3.2

-----
 Input data file f1 f2:
==> f1 <==
Preliminary Matter.

This text of Melville's Moby-Dick is based on the Hendricks House edition.

==> f2 <==
Preliminary Matter.

This text of Melville's Moby-Dick is based on the Hendricks House edition.

-----
 Results, time for 20 stat calls:

real    0m12.750s
user    0m0.015s
sys     0m5.947s

-----
 Results, time for 20 cmp calls:

real    0m6.281s
user    0m0.152s
sys     0m2.950s

-----
 Results of internal perl stat calls:
 Length of f1, f2: 1205404, 1205404
 Called stat 101 (-1) times on each file, compared sizes, expected 1205404.

real    0m0.938s
user    0m0.046s
sys     0m0.420s

This agrees with LMHmedchem's comparison of stat and cmp. Note that perl does 5 times as much as the stat portion in less than 1/10 the time. I remain amazed that the stat command is so slow.

So if the comparison needs to be re-run, I'd suggest using perl code.

Best wishes ... cheers, drl
# 17  
Old 03-20-2013
I would like to give the perl code a try, but I'm not very good with perl. The code you posted runs the same test many times on random values. What I need to do is read each path from a file. Here is a sample of the sorted find file:
.
./_copy.sh
./_database_project
./_database_project/12-11-10
./_database_project/12-11-10/_database_notes_12-11-10.txt
./_database_project/12-11-10/test.db.sqlite
./_database_project/12-11-10/test.db.sqlite$
./_database_project/12-11-10/test_input1.txt
./_database_project/12-11-10/test_input1.xlsx
./_database_project/12-11-10/test_input2.txt

I need to strip the leading "." and then prepend two different root paths to create a path for each file in a matching pair on the two drives.

/cygdrive/e/_Data_Level/_copy.sh
/cygdrive/i/_Data_Level/_copy.sh

/cygdrive/e/_Data_Level/_database_project (this is a directory)
/cygdrive/i/_Data_Level/_database_project (this is a directory)

/cygdrive/e/_Data_Level/_database_project/12-11-10 (this is a directory)
/cygdrive/i/_Data_Level/_database_project/12-11-10 (this is a directory)

/cygdrive/e/_Data_Level/_database_project/12-11-10/_database_notes_12-11-10.txt
/cygdrive/i/_Data_Level/_database_project/12-11-10/_database_notes_12-11-10.txt

Each pair needs to be compared for length.

$s1 = ( stat("/cygdrive/e/_Data_Level/_copy.sh") )[7];
$s2 = ( stat("/cygdrive/i/_Data_Level/_copy.sh") )[7];
if ( $s1 != $s2 ) {
    print $f3 " Found mismatch at iteration $i\n";
    $j++;
}

I'm not sure what perl will do with the entries in the find file that are directories and not files.
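As a rough sketch of the path handling described above, written in bash rather than perl: the function name compare_sizes and its arguments are hypothetical, and GNU stat is assumed. It reads the find file line by line, strips the leading ".", builds the pair of full paths, and skips anything that is not a regular file, which sidesteps the directory question.

```shell
# Sketch only: compare_sizes LIST ROOT_A ROOT_B (hypothetical name).
# LIST has one path per line as printed by "find .", for example
# "./_copy.sh". Mismatches are printed to stdout.
compare_sizes() {
    local list=$1 root_a=$2 root_b=$3 rel s1 s2
    while IFS= read -r rel; do
        rel=${rel#.}                        # "./_copy.sh" becomes "/_copy.sh"
        [ -f "$root_a$rel" ] || continue    # skip directories (and the "." entry)
        if [ ! -f "$root_b$rel" ]; then
            printf 'missing: %s\n' "$root_b$rel"
            continue
        fi
        s1=$(stat -c %s "$root_a$rel")      # size in bytes on the first drive
        s2=$(stat -c %s "$root_b$rel")      # size in bytes on the second drive
        [ "$s1" = "$s2" ] || printf 'size mismatch: %s (%s vs %s)\n' "$rel" "$s1" "$s2"
    done < "$list"
}
```

With the roots from this post, the call would look something like `compare_sizes find_list.txt /cygdrive/e/_Data_Level /cygdrive/i/_Data_Level > mismatches.txt`, where find_list.txt stands in for whatever the sorted find file is actually called. This still starts two stat processes per file, so it would be expected to show the same slowness measured earlier in the thread.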

LMHmedchem
# 18  
Old 03-20-2013
Code:
# time find / -iname \* -exec stat -c"%n  %s" {} + > file

real    0m4.829s
user    0m1.432s
sys    0m3.112s
# wc -l file
310854 file

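This was only about 300000 files, but the timing can probably be extrapolated. Do this on both file systems and diff the resulting files. The output may need to be sorted first, as find doesn't guarantee any particular ordering. Make sure to end the find command with + rather than \; so that stat is invoked on batches of files instead of once per file.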
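The procedure above could be sketched roughly as follows. The function names list_sizes and diff_trees are invented for this example, and GNU find, stat and diff are assumed. Two details differ from the one-liner above and are my own assumptions: the find is run from inside each root so the printed names are relative and line up between the two drives, and it is restricted to regular files, since directory sizes can differ between file systems even when the contents match.

```shell
# Sketch only: hypothetical helper names, GNU tools assumed.

# Print "name size" for every regular file under ROOT, sorted.
list_sizes() {
    # cd in a subshell so names come out relative ("./a/b.txt")
    ( cd "$1" && find . -type f -exec stat -c '%n %s' {} + | LC_ALL=C sort )
}

# Diff the two listings; exit status is that of diff
# (0 = identical, 1 = differences found).
diff_trees() {
    local a b status=0
    a=$(mktemp) && b=$(mktemp) || return 2
    list_sizes "$1" > "$a"
    list_sizes "$2" > "$b"
    diff "$a" "$b" || status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

For the drives in this thread that would be something like `diff_trees /cygdrive/e/_Data_Level /cygdrive/i/_Data_Level`. Lines starting with < come from the first tree and lines starting with > from the second, so a file missing on one side shows up as a line on one side only.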