Script to compare files recursively using sdiff


 
# 1  
Old 04-05-2013

Hi All,

I have been searching for a way to compare identically named files under two different paths.

One path holds the oldfiles directory and the other holds the newfiles directory. Each main directory contains sub-directories, and each sub-directory in turn contains *.txt files (plain text files with several lines each).

Note: one advantage is that:

All the sub-directory names are the same in both the oldfiles and newfiles main directories.

All the text file names (*.txt) in the sub-directories are the same in both main folders as well.

Now, the script has to:
a. Accept two paths, i.e. the oldfiles and newfiles directory paths, as arguments.

b. Read the first sub-folder name from oldfiles and search for the same sub-folder name under the newfiles path. If the same sub-folder is found, then

c. Check whether text files are present; if so, check whether the file names match on both sides, and if they do, then

d. Sort the files first, then run sdiff on the two files and store the result in a separate file.

As an example, the folder structure looks like this:

Main folders:

oldfiles path : /tmp/oldfiles/

newfiles path : /tmp/newfiles/

Each main folder has sub-folders:

oldfiles :
subdirA
subA.txt

subdirB
subB.txt

subdirC
subC.txt

newfiles :
subdirA
subA.txt

subdirB
subB.txt


Each sub-directory holds *.txt files with the same file names on both sides.

For the example above, the script should generate sdiff results in the output folder as:

subdirA_subA_result.txt
subdirB_subB_result.txt

I hope I have explained clearly what I am trying to achieve.

The script below, which I wrote, does not check for matching sub-folders or files and does not generate separate result files. Instead, it reads all the *.txt files and produces one single result file.
Code:
#!/bin/bash 
 
  # cmp_dir - program to compare two directories 
 
  # Check for required arguments 
  if [ $# -ne 2 ]; then 
      echo "usage: $0 directory_1 directory_2" 1>&2 
      exit 1 
  fi 
 
  # Make sure both arguments are directories 
  if [ ! -d $1 ]; then 
      echo "$1 is not a directory!" 1>&2 
      exit 1 
  fi 
 
  if [ ! -d $2 ]; then 
      echo "$2 is not a directory!" 1>&2 
      exit 1 
  fi 
 
  # Process each file in directory_1, comparing it to directory_2 
  find $1/ -name '*.txt' -print | while read src 
  do 
  #for filename in $1/*.txt; do 
  #echo $filename 
      fn=$(basename "$filename") 
      if [ -f "$filename" ]; then 
          #if [ ! -f "$2/$fn" ]; then 
              #echo "$fn is missing from $2" 
              #missing=$((missing + 1)) 
          #fi 
                  sort $filename 
                  #echo $filename 
                  sort $2/$fn 
                  #echo $2/$fn 
                 sdiff $filename $2/$fn | egrep '>|<|\|' > resultfile.txt 
     fi 
 #done 
 done 
 echo "File comparision done, please see resultfile"

# 2  
Old 04-06-2013
You stated it pretty clearly. It's just rather complex and tedious.

Here's a working version that does basically what you want, as best I can understand. It does not write a result file per directory; everything goes into one file. Separate result files would complicate the script, and a single results file may be easier to use. You can modify the script to add other details as needed. It uses only simple commands, so it should be easy to follow, just a bit tedious. Let me know if you have any questions.
Code:
$ cd /tmp
$ ls -R oldfiles newfiles
newfiles:
subdirA  subdirB

newfiles/subdirA:
subA.txt

newfiles/subdirB:
subB.txt

oldfiles:
subdirA  subdirB  subdirC

oldfiles/subdirA:
subA.txt

oldfiles/subdirB:
subB.txt

oldfiles/subdirC:
subC.txt

Code:
$ cat sdiff.sh
# cmp_dir - program to compare two directories

if [ $# -ne 2 ]; then
  echo "usage: $0 old_dir new_dir" 1>&2; exit 1
fi

if [ ! -d "$1" ]; then
  echo "$1 is not a directory!" 1>&2; exit 1
fi

if [ ! -d "$2" ]; then
  echo "$2 is not a directory!" 1>&2; exit 1
fi

rm -f resultfile.txt
missing=0
old_dir="$1" new_dir="$2"
find "$old_dir" -name '*.txt' -print > /tmp/old_paths.x;
while read old_path; do
  echo "old_path = $old_path" >> resultfile.txt
  filename=`basename "$old_path"`
  find "$new_dir" -name "$filename" -print > /tmp/new_path.x
  count=`cat /tmp/new_path.x | wc -l`
  if [ $count -eq 1 ]; then
    new_path=`cat /tmp/new_path.x`
    echo "Comparison with $new_path:" >> resultfile.txt
    sdiff "$old_path" "$new_path" >> resultfile.txt
  elif [ $count -gt 1 ]; then
    echo "ERROR: $count found under $new_dir" >> resultfile.txt
  else
    echo "ERROR: does not exist under $new_dir" >> resultfile.txt
    missing=`expr $missing + 1`
  fi
  echo "------------------" >> resultfile.txt
done < /tmp/old_paths.x
echo "Missing: $missing" >> resultfile.txt
echo "File comparision done, please see resultfile.txt"
# echo -n "Press ENTER: "; read; vi resultfile.txt

Code:
$ ./sdiff.sh oldfiles newfiles
File comparision done, please see resultfile.txt

Code:
$ cat resultfile.txt
old_path = oldfiles/subdirB/subB.txt
Comparison with newfiles/subdirB/subB.txt:
Sat Apr  6 01:45:32 PDT 2013                                  | Sat Apr  6 01:44:56 PDT 2013
------------------
old_path = oldfiles/subdirC/subC.txt
ERROR: does not exist under newfiles
------------------
old_path = oldfiles/subdirA/subA.txt
Comparison with newfiles/subdirA/subA.txt:
Sat Apr  6 01:44:36 PDT 2013                                  | Sat Apr  6 01:44:49 PDT 2013
------------------
Missing: 1
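
For the separate result files asked for in post #1 (subdirA_subA_result.txt and so on), a minimal sketch could look like the one below. It assumes the *.txt files sit exactly one sub-directory below each main directory and that names match on both sides, as described in the question. The function name compare_trees and its third argument (an output directory) are made up for illustration and are not part of the script above. It sorts into temporary copies, so the input files are left untouched.

```shell
#!/bin/bash
# Sketch only: compare_trees and its out_dir argument are illustrative names.
# Usage: compare_trees old_dir new_dir out_dir
compare_trees() {
  old_dir=$1 new_dir=$2 out_dir=$3
  mkdir -p "$out_dir" || return 1
  tmp_old=$(mktemp) && tmp_new=$(mktemp) || return 1
  # Walk every *.txt exactly one sub-directory below old_dir
  for old_path in "$old_dir"/*/*.txt; do
    [ -f "$old_path" ] || continue            # glob matched nothing
    rel=${old_path#"$old_dir"/}               # e.g. subdirA/subA.txt
    new_path=$new_dir/$rel
    if [ ! -f "$new_path" ]; then
      echo "missing under $new_dir: $rel" >&2
      continue
    fi
    sub=${rel%%/*}                            # e.g. subdirA
    base=$(basename "$rel" .txt)              # e.g. subA
    # Compare sorted copies so the originals are not rewritten
    sort "$old_path" > "$tmp_old"
    sort "$new_path" > "$tmp_new"
    # sdiff exits non-zero when the files differ, so its status is ignored
    sdiff "$tmp_old" "$tmp_new" > "$out_dir/${sub}_${base}_result.txt" || true
  done
  rm -f "$tmp_old" "$tmp_new"
}
```

Called as compare_trees /tmp/oldfiles /tmp/newfiles output, this would write output/subdirA_subA_result.txt and output/subdirB_subB_result.txt for the example tree, and report subdirC/subC.txt as missing on stderr.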

# 3  
Old 04-06-2013
To sort a file in place, please use
Code:
sort -o "$filename" "$filename"

@hanson44: please avoid useless use of cat
Code:
count=`wc -l < /tmp/new_path.x`

In this case one can even use the efficient but cryptic
Code:
count=`
find "$new_dir" -name "$filename" -print |
 tee /tmp/new_path.x |
 wc -l`

# 4  
Old 04-08-2013
Thanks MadeInGermany and Hanson44.
The script worked as expected. Thank you.
I am trying to improve the script with a menu-driven approach, i.e. one script should do both:
a. Line-by-line file comparison
b. Word-by-word file comparison
Can you help me with the word-by-word file comparison?
Also, when I run the script below, it gives the result for only one sub-directory's file comparison even though there are multiple sub-directories.
Code:
# cmp_dir - program to compare two directories
BASE_DIR=/usr/config_check
ARCHIVE_DIR=$BASE_DIR/archive
OUTPUT_DIR=$BASE_DIR/output
 
function Helps ()
{
  printf "\n"
  echo "Usage: ./filecomp.sh [-lL] [-wW] oldfiles_dir newfile_dir"
  echo "-l|-L: Line by Line file comparision check"
  echo "-w|-W: Word by Word file comparision check"
  echo
  echo "-Example-"
  echo "./filecomp.sh -l oldfile_dir newfile_dir or ./filecomp.sh -L oldfile_dir newfile_dir"
  echo "./filecomp.sh -w oldfile_dir newfile_dir  or ./filecomp.sh -W oldfile_dir newfile_dir"
  echo ""
  printf "\n"
  return 0
}
function filecomp () {
#if [ $# -ne 3 ]; then
#  echo "usage:./filecomp.sh -l old_dir new_dir" 1>&2; exit 1
#  echo "usage:./filecomp.sh -w old_dir new_dir" 1>&2; exit 1
#fi
if [ $1 == "L" ]; then
echo  "Starting Line by Line file comparision execution with opition -L"
#       if [ ! -d "$2" ]; then
#               echo "$2 is not a directory!" 1>&2; exit 1
#       fi
#       if [ ! -d "$3" ]; then
#               echo "$3 is not a directory!"  1>&2;  exit 1
#       fi
        rm -f resultfile_*.txt
        missing=0
        old_dir="$2" new_dir="$3"
 
        find "$old_dir" -name '*.txt' -print > /tmp/old_paths.x;
                while read old_path; do
                        echo "old_path = $old_path" >> resultfile_$(date +%Y%m%d%H%M%S).txt
                        filename=`basename "$old_path"`
                        find "$new_dir" -name "$filename" -print > /tmp/new_path.x
                        count=`cat /tmp/new_path.x | wc -l`
                                if [ $count -eq 1 ]; then
                                        new_path=`cat /tmp/new_path.x`
                                        echo "Comparison with $new_path:" >> resultfile_$(date +%Y%m%d%H%M%S).txt
                                        sort -o $old_path $old_path
                                        sort -o $new_path $new_path
                                        sdiff $old_path $new_path >> resultfile_$(date +%Y%m%d%H%M%S).txt
                                elif [ $count -gt 1 ]; then
                                        echo "ERROR: $count found under $new_dir" >> resultfile_$(date +%Y%m%d%H%M%S).txt
                                else
                                        echo "ERROR: does not exist under $new_dir" >> resultfile_$(date +%Y%m%d%H%M%S).txt
                                         missing=`expr $missing + 1`
                                fi
                        echo "------------------" >> resultfile_$(date +%Y%m%d%H%M%S).txt
                done < /tmp/old_paths.x
                        echo "Missing: $missing" >> resultfile_$(date +%Y%m%d%H%M%S).txt
                        echo "File comparision done, please see resultfile_$(date +%Y%m%d%H%M%S).txt"
                        RES_FILE=resultfile_$(date +%Y%m%d%H%M%S).txt
                        if [ -s $RES_FILE ]; then
                                echo -n "Press ENTER: "; read; vi $RES_FILE
                        else
                        echo "$RES_FILE is empty.."
                        fi
#elif [ $1 == "W" ]; then
#       echo "Word by Word comparsion"
#fi
fi
}
 
##########
## MAIN ##
##########
option=$@
arg=($option)
case ${arg[0]} in
-l|-L)
       #echo  "Starting Line by Line file comparision execution with opition -L"
        filecomp L $2 $3
        ;;
-w|-W)
        echo "Starting word by word file comparision execution with opition -W"
        filecomp W $2 $3
        ;;
-h|?|*)
        Helps
        exit
        ;;
esac

Input :
ls -R oldfiles newfile

Code:
newfile:
dirACP  dirA    dirDCP
newfile/dirACP:
acp.txt
newfile/dirA:
newfile/dirDCP:
dirDCP.txt

Code:
oldfiles:
dirACP   dirB    dirDCP
oldfiles/dirACP:
acp.txt
oldfiles/DirB:
oldfiles/dirDCP:
dirDCP.txt


resultfile :
Code:
cat resultfile_20130408103202.txt
old_path = oldfiles/dirACP/acp.txt
Comparison with newfile/dirACP/acp.txt:
                                                                >
aa                                                                 aa
ccccc                                                           |  cc
ddsdd                                                              ddsdd
weweww                                                          |  weww
xx                                                              |  yyxx

For the above input, the result file is missing the information for dirDCP/dirDCP.txt.
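
One likely cause, offered as a guess from reading the script rather than something confirmed in this thread: every `$(date +%Y%m%d%H%M%S)` is evaluated again at each redirection, so as soon as the clock moves to the next second the remaining output lands in a different resultfile_*.txt. Building the name once and reusing it avoids that. In the sketch below, RES_FILE is the variable the script already uses near its end; the write_report helper is a made-up name for illustration.

```shell
#!/bin/bash
# Sketch only: write_report is an illustrative helper name.
# Build the result file name once, then reuse it for every write.
RES_FILE=resultfile_$(date +%Y%m%d%H%M%S).txt

write_report() {
  # Append all arguments as one line to the single result file
  echo "$*" >> "$RES_FILE"
}

write_report "old_path = oldfiles/dirACP/acp.txt"
sleep 1   # the clock moves on, but RES_FILE keeps the same name
write_report "old_path = oldfiles/dirDCP/dirDCP.txt"
```

An alternative with the same effect would be to redirect the whole loop once, i.e. end it with `done < /tmp/old_paths.x > "$RES_FILE"`, and drop the per-line redirections.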

---------- Post updated at 07:06 AM ---------- Previous update was at 06:41 AM ----------

I finally found a solution for the word-by-word comparison that does the trick. Now I need to incorporate this simple script into the script above, so that one script does both the word-by-word and the line-by-line comparison.

Code:
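#!/bin/bash
# Compare a.txt and b.txt word by word, one pair of lines at a time.
exec 3<a.txt
exec 4<b.txt
# Stop as soon as either file runs out of lines
while IFS= read -r line1 <&3 && IFS= read -r line2 <&4
do
        # Split each line into words on whitespace (no globbing, unlike `echo $line`)
        read -ra array1 <<< "$line1"
        read -ra array2 <<< "$line2"
        # Go up to the longer word count so extra words are reported too
        n=${#array1[@]}
        (( ${#array2[@]} > n )) && n=${#array2[@]}
        for ((X=0; X<n; X++)); do
                if [ "${array1[$X]}" != "${array2[$X]}" ]; then
                        echo "mismatch! file 1: ${array1[$X]}    file 2: ${array2[$X]}"
                fi
        done
done
exec 3<&- 4<&-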



I created two dummy files, a.txt and b.txt.
The input files look like this:
Code:
cat a.txt
this is my first line
 
cat b.txt
this is my fust line

result :

Code:
mismatch! file 1: first    file 2: fust


Any help, please, on how to add the file name and line number to the result file, so that it becomes easy to find out which file had the issue?
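
One way to get the file name and line number into the output is to keep a line counter next to the two reads. This is a minimal sketch, assuming bash (it uses arrays); the function name word_diff and the exact output layout are made up for illustration and do not come from the thread.

```shell
#!/bin/bash
# Sketch only: word_diff is an illustrative name, not from the thread.
# Usage: word_diff file1 file2
# Prints one line per differing word, prefixed with file name and line number.
word_diff() {
  local file1=$1 file2=$2 lineno=0 line1 line2 n i
  local -a words1 words2
  # Read both files in lockstep on descriptors 3 and 4
  while IFS= read -r line1 <&3 && IFS= read -r line2 <&4; do
    lineno=$((lineno + 1))
    read -ra words1 <<< "$line1"
    read -ra words2 <<< "$line2"
    # Go up to the longer word count so extra words are reported too
    n=${#words1[@]}
    (( ${#words2[@]} > n )) && n=${#words2[@]}
    for ((i = 0; i < n; i++)); do
      if [ "${words1[i]}" != "${words2[i]}" ]; then
        echo "$file1:$lineno: word $((i + 1)): file 1: ${words1[i]}    file 2: ${words2[i]}"
      fi
    done
  done 3<"$file1" 4<"$file2"
}
```

With the a.txt and b.txt shown above, word_diff a.txt b.txt prints:

a.txt:1: word 4: file 1: first    file 2: fust

To fold this into the line-by-line script, the -w branch could call word_diff "$old_path" "$new_path" at the point where the -l branch calls sdiff.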