How to search & compare paragraphs between two files

# 1  
08-07-2009
How to search & compare paragraphs between two files

Hello guys, greetings to all.

I am stuck at work today while trying to compare paragraphs between two files, and I urgently need your help; without your input I cannot proceed. Kindly find some time to answer my question, and I'll be grateful to you forever. My detailed issue is as follows:

I have extracted the DDLs of some tables from the production server (saved in PROD.log) and the DDLs of the same tables from the DEV server (saved in DEV.log).

Snippets of the two files look like this:

Code:
$ cat PROD.log
CREATE TABLE "HELLO"."TABLE1"  (
          "SALARY" DECIMAL(18,0) NOT NULL ,
          "JOB" DECIMAL(18,0) NOT NULL )
           IN "DAT1" INDEX IN "IDX1" ;

CREATE TABLE "HELLO"."TABLE2"  (
          "NAME" VARCHAR(18) NOT NULL ,
          "AGE" DECIMAL(18,0) NOT NULL )
           IN "DAT1" INDEX IN "IDX1" ;

Code:
$ cat DEV.log
CREATE TABLE "HELLO"."TABLE1"  (
          "SALARY" DECIMAL(18,0) NOT NULL ,
          "JOB" DECIMAL(18,0) NOT NULL )
         DISTRIBUTE BY HASH("SALARY")
           IN "DAT1" INDEX IN "IDX1" ;

CREATE TABLE "HELLO"."TABLE2"  (
          "NAME" VARCHAR(18) NOT NULL ,
          "AGE" DECIMAL(18,0) NOT NULL )
           IN "DAT1" INDEX IN "IDX1" ;

As you can see, both files contain the DDLs of two tables, TABLE1 and TABLE2. The TABLE1 DDL differs between the two files, but the TABLE2 DDL is identical in both.
My requirement: I need to write a shell script that compares PROD.log and DEV.log and prints every whole DDL paragraph that does not match (or does not exist) in the other file. The output should look like this:

Code:
CREATE TABLE "HELLO"."TABLE1"  (
          "SALARY" DECIMAL(18,0) NOT NULL ,
          "JOB" DECIMAL(18,0) NOT NULL )
           IN "DAT1" INDEX IN "IDX1" ;

Guys, please reply soon; I'll be eagerly waiting for your replies.

Thank you very much.
Naresh

Last edited by Yogesh Sawant; 08-07-2009 at 05:26 AM.. Reason: added code tags
# 2  
08-07-2009
Try this:

Code:
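awk '
  # 1st file (DEV.log): store each ";"-terminated statement under its table name ($3)
  NR==FNR { a[$3]=$0; next }
  # 2nd file (PROD.log): print statements whose table is in DEV.log but differs
  a[$3] && a[$3]!=$0 { print $0 RS }
' RS=";" DEV.log PROD.log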

Regards
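A minimal way to try this reply and see why `$3` works as the key (a sketch; the scratch directory and the `show_keys`/`changed_ddl` function names are made up for illustration). With `RS=";"` each `CREATE TABLE ... ;` statement becomes one awk record, and the default field splitting treats newlines like blanks, so `$3` is the quoted table name. Note that this command only reports tables present in both files: a table that exists only in PROD.log has an empty `a[$3]` and is not printed.

```shell
#!/bin/sh
# Work in a scratch directory so no real PROD.log/DEV.log is overwritten.
cd "$(mktemp -d)" || exit 1

# Sample data copied from the first post.
cat > PROD.log <<'EOF'
CREATE TABLE "HELLO"."TABLE1"  (
          "SALARY" DECIMAL(18,0) NOT NULL ,
          "JOB" DECIMAL(18,0) NOT NULL )
           IN "DAT1" INDEX IN "IDX1" ;

CREATE TABLE "HELLO"."TABLE2"  (
          "NAME" VARCHAR(18) NOT NULL ,
          "AGE" DECIMAL(18,0) NOT NULL )
           IN "DAT1" INDEX IN "IDX1" ;
EOF

cat > DEV.log <<'EOF'
CREATE TABLE "HELLO"."TABLE1"  (
          "SALARY" DECIMAL(18,0) NOT NULL ,
          "JOB" DECIMAL(18,0) NOT NULL )
         DISTRIBUTE BY HASH("SALARY")
           IN "DAT1" INDEX IN "IDX1" ;

CREATE TABLE "HELLO"."TABLE2"  (
          "NAME" VARCHAR(18) NOT NULL ,
          "AGE" DECIMAL(18,0) NOT NULL )
           IN "DAT1" INDEX IN "IDX1" ;
EOF

# Hypothetical helper: print the record number and key ($3) of each statement.
# NF skips the empty record left after the final ";".
show_keys() { awk 'NF { print NR ": " $3 }' RS=";" "$1"; }

# The command from this reply, wrapped in a function:
# 1st file -> remember statement by table name; 2nd file -> print it if it differs.
changed_ddl() {
    awk 'NR == FNR { a[$3] = $0; next }
         a[$3] && a[$3] != $0 { print $0 RS }' RS=";" "$1" "$2"
}

show_keys PROD.log             # one line per statement: number and table name
changed_ddl DEV.log PROD.log   # prints only the TABLE1 statement from PROD.log
```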
# 3  
08-07-2009
Thanks

Wow!! The script is working fine. Thank you very much, Franklin, for your timely help. Many, many thanks... :-)

Just one small concern: some paragraphs have the same text content, but in DEV.log the same paragraph has a few extra blank spaces at the start of lines and between words, so I don't get the desired output. The situation is like this:

Code:
$ cat PROD.log
CREATE TABLE "HELLO"."TABLE2" (
"NAME" VARCHAR(18) NOT NULL ,
"AGE" DECIMAL(18,0) NOT NULL )
IN "DAT1" INDEX IN "IDX1" ;

Code:
$ cat DEV.log
CREATE TABLE "HELLO"."TABLE2" (
"NAME" VARCHAR(18) NOT NULL ,
"AGE" DECIMAL(18,0) NOT NULL )
IN "DAT1" INDEX IN "IDX1" ;

Because of these blank spaces, these paragraphs also get listed in the output, which they should not. Is there any workaround for this? I tried modifying the code, but all in vain. :-(

Thanks,
Naresh

Last edited by Yogesh Sawant; 08-07-2009 at 05:27 AM.. Reason: added code tags
# 4  
08-07-2009
Maybe you want to remove the leading whitespace from each line first ([[:blank:]] matches a space or a tab in any POSIX sed; [ \t] only does so in GNU sed):
Code:
sed 's/^[[:blank:]]*//' PROD.log > PROD.tmp
sed 's/^[[:blank:]]*//' DEV.log > DEV.tmp

and then compare these new files:
Code:
awk 'BEGIN{RS=";"} NR==FNR{a[$0]; next} !($0 in a) {print $0 RS}' DEV.tmp PROD.tmp


Last edited by thanhdat; 08-07-2009 at 08:09 AM..
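Unlike reply #2, this version compares whole statements instead of keying on the table name, so it also reports a PROD.log table that has no counterpart in DEV.log. Below is a sketch of both steps end to end, in a scratch directory with made-up sample files (TABLE3 exists only in PROD.log, and the `compare_tmp` function name is hypothetical). The `NF` guard is an addition to the reply: it keeps an empty trailing record from printing a lone `;` when the two files end differently.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1   # scratch directory for the sample files

# Hypothetical samples: TABLE2 differs only in indentation,
# TABLE3 exists only in PROD.log.
cat > PROD.log <<'EOF'
CREATE TABLE "HELLO"."TABLE2" (
  "NAME" VARCHAR(18) NOT NULL )
  IN "DAT1" ;

CREATE TABLE "HELLO"."TABLE3" (
  "ID" DECIMAL(18,0) NOT NULL )
  IN "DAT1" ;
EOF

cat > DEV.log <<'EOF'
CREATE TABLE "HELLO"."TABLE2" (
        "NAME" VARCHAR(18) NOT NULL )
        IN "DAT1" ;
EOF

# Step 1: strip leading blanks ([[:blank:]] = space or tab, portable).
sed 's/^[[:blank:]]*//' PROD.log > PROD.tmp
sed 's/^[[:blank:]]*//' DEV.log > DEV.tmp

# Step 2: remember every statement of the 1st file,
# then print statements of the 2nd file that were not seen.
compare_tmp() {
    awk 'BEGIN { RS = ";" }
         NR == FNR { a[$0]; next }
         NF && !($0 in a) { print $0 RS }' "$1" "$2"
}

compare_tmp DEV.tmp PROD.tmp   # prints only the TABLE3 statement
```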
# 5  
08-07-2009
Hope this will be helpful for you.
Code:
$ cat PROD.log
CREATE TABLE "HELLO"."TABLE1"  (
          "SALARY" DECIMAL(18,0) NOT NULL ,
          "JOB" DECIMAL(18,0) NOT NULL )
           IN "DAT1" INDEX IN "IDX1" ;

CREATE TABLE "HELLO"."TABLE2"  (
          "NAME" VARCHAR(18) NOT NULL ,
          "AGE" DECIMAL(18,0) NOT NULL )
               IN "DAT1" INDEX IN "IDX1" ;
$ cat DEV.log
CREATE TABLE "HELLO"."TABLE1"  (
          "SALARY" DECIMAL(18,0) NOT NULL ,
          "JOB" DECIMAL(18,0) NOT NULL )
           DISTRIBUTE BY HASH("SALARY")
           IN "DAT1" INDEX IN "IDX1" ;

CREATE TABLE "HELLO"."TABLE2"  (
          "NAME" VARCHAR(18) NOT NULL ,
          "AGE" DECIMAL(18,0) NOT NULL )
           IN "DAT1" INDEX IN "IDX1" ;
$ awk '{$1=$1} NR==FNR {a[$3]=$0;next} a[$3] && a[$3]!=$0 {print $0 RS} ' RS=";" DEV.log PROD.log
CREATE TABLE "HELLO"."TABLE1" ( "SALARY" DECIMAL(18,0) NOT NULL , "JOB" DECIMAL(18,0) NOT NULL ) IN "DAT1" INDEX IN "IDX1";
$
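The key part of this reply is `{$1=$1}`: assigning to a field makes awk rebuild `$0` from its fields joined by `OFS` (a single space by default), so runs of blanks, tabs and newlines inside a statement collapse to one space before the comparison. That is also why the output above is on one line. A small sketch of just that step (the `squeeze_stmts` name and the sample statement are hypothetical); the `NF` guard skips the empty record after the final `;`, which would otherwise print a lone `;`.

```shell
#!/bin/sh
# Hypothetical helper: print each ";"-terminated statement on one line.
# Assigning $1 = $1 makes awk rebuild $0 from its fields joined by OFS
# (one space by default), so runs of blanks, tabs and newlines collapse.
# NF skips the empty record after the final ";". Reads stdin if no file is given.
squeeze_stmts() {
    awk -v RS=';' 'NF { $1 = $1; print $0 RS }' "$@"
}

# Sample statement with irregular spacing, in the style of this reply's PROD.log.
printf '%s\n' 'CREATE TABLE "HELLO"."TABLE2"  (
          "NAME" VARCHAR(18) NOT NULL ,
          "AGE" DECIMAL(18,0) NOT NULL )
               IN "DAT1" INDEX IN "IDX1" ;' | squeeze_stmts
```

This prints the whole statement on one line, ending in `INDEX IN "IDX1";`.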

# 6  
08-07-2009
Thanks All

Thanks, all, for your replies.

Thanhdat, I don't only want to remove the leading tabs on each line; I also want to squeeze the blank spaces between words, so that a run of several blanks is treated as a single blank. For example (here each <b> stands for one blank):

Code:
$ cat file
CREATE TABLE "HELLO"."TABLE1" <b><b><b> ("SALARY" DECIMAL(18,0) NOT NULL )

The output should look like this:
Code:
CREATE TABLE "HELLO"."TABLE1"<b>("SALARY" DECIMAL(18,0) NOT NULL)

Please suggest.
# 7  
08-07-2009
So you can change the sed command to:
Code:
# strip leading blanks, then squeeze every run of blanks to a single space
sed 's/^[[:blank:]]*//;s/[[:blank:]]\{1,\}/ /g' yourfile > newfile
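The sed pre-pass works line by line, so with reply #4's whole-statement comparison a different number of blank lines between two statements would still make them look different. A one-pass alternative is sketched below, assuming the goal from the first post (print PROD.log statements that differ from, or are missing in, DEV.log): it normalizes each statement with the `$1=$1` trick from reply #5 and keys on the table name as in reply #2. The `ddl_diff` name, the `-- ...` label lines and TABLE3 in the sample are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper. Usage: ddl_diff DEV.log PROD.log
# Prints the statements of the 2nd file that differ from, or are missing in,
# the 1st file, ignoring differences in blanks, tabs and line breaks.
ddl_diff() {
    awk -v RS=';' '
        NF == 0 { next }                   # skip the empty record after the last ";"
        { $1 = $1 }                        # squeeze whitespace runs to single spaces
        NR == FNR { dev[$3] = $0; next }   # 1st file: table name -> statement
        !($3 in dev) { print "-- only in " FILENAME ":"; print $0 RS; next }
        dev[$3] != $0 { print "-- differs:"; print $0 RS }
    ' "$1" "$2"
}

cd "$(mktemp -d)" || exit 1   # scratch directory for the sample files

# Samples from reply #5, plus a hypothetical TABLE3 that exists only in PROD.log.
cat > DEV.log <<'EOF'
CREATE TABLE "HELLO"."TABLE1"  (
          "SALARY" DECIMAL(18,0) NOT NULL ,
          "JOB" DECIMAL(18,0) NOT NULL )
           DISTRIBUTE BY HASH("SALARY")
           IN "DAT1" INDEX IN "IDX1" ;

CREATE TABLE "HELLO"."TABLE2"  (
          "NAME" VARCHAR(18) NOT NULL ,
          "AGE" DECIMAL(18,0) NOT NULL )
           IN "DAT1" INDEX IN "IDX1" ;
EOF

cat > PROD.log <<'EOF'
CREATE TABLE "HELLO"."TABLE1"  (
          "SALARY" DECIMAL(18,0) NOT NULL ,
          "JOB" DECIMAL(18,0) NOT NULL )
           IN "DAT1" INDEX IN "IDX1" ;

CREATE TABLE "HELLO"."TABLE2"  (
          "NAME" VARCHAR(18) NOT NULL ,
          "AGE" DECIMAL(18,0) NOT NULL )
               IN "DAT1" INDEX IN "IDX1" ;

CREATE TABLE "HELLO"."TABLE3"  (
          "ID" DECIMAL(18,0) NOT NULL )
           IN "DAT1" INDEX IN "IDX1" ;
EOF

ddl_diff DEV.log PROD.log
```

Here TABLE1 is reported as differing, TABLE3 as only in PROD.log, and TABLE2 not at all, despite its extra indentation. If the mirror-image report is also needed (statements only in DEV.log), running it again with the arguments swapped (`ddl_diff PROD.log DEV.log`) gives that.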
