Comparing similar columns in two different files


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Comparing similar columns in two different files
# 8  
Old 04-14-2008
MySQL

I got the code to do that.

I modified a part of the code.

"printf "Columns %s is additional\n", $2" to

"printf "Columns %s in the table %s is additional\n", $2, $1.

Thanks so much for your help.
# 9  
Old 04-14-2008
Question Code working

Hi,

Can you explain how this code works?

Thanks.
# 10  
Old 04-14-2008
Quote:
Originally Posted by ragavhere
Can you explain how this code works?
Yes.

Code:
NR == FNR { 
f1[$1,$2] = $0
next
}

While reading the first input file (NR == FNR) populate the associative array f1 with keys based on the first two fields and values/elements based on the entire record. The next statement prevents subsequent actions (without explicit constraints regarding the first file) from being performed.

Code:
($1 SUBSEP $2) in f1 && $0 != f1[$1,$2] {
  printf "Column %s in the table %s mismatches\n%s\n%s\n", 
  $2, $1, f1[$1,$2], $0
}

While reading the rest of the input:
if the first two fileds match any key in the f1 array AND the entire record is different from the corresponding f1 value , print "column in table mismatches and the desired fields ($2, $1 and then the value of f1[$1,$2] and the entire record).

Code:
!(($1 SUBSEP $2) in f1) {
  printf "Columns %s is additional\n", $2 
}

If the first two fields don't match any key in the f1 array,
print "additional".

Hope this helps.
# 11  
Old 04-15-2008
Network Additional information required

Hi radoulov

Thanks. I need additional information. Please help me.

The two files are obtained from two test environments. Here one table and its statistics in the format said might be present in file1 but not in file2. Same way table and its statistics present in file2 might not be present in file1.

In this case the code has to be modified in such a way that there is additional information in the output saying that the table present in one file is not present in the other.

For e.g

The contents of file1 are

FT_T_CAVJ COUNT:22117022
FT_T_CAVJ CURR_TL SUM:5.2621E+13 MAX:2432919 MIN:2336485 AVG:2379195.7
FT_T_CAVJ JRNL_CRVL_AMT SUM:5.4881E+10 MAX:495687276 MIN:-178235551 AVG:2481.41038
FT_T_CAVJ ORIG_CRVL_AMT SUM:0 MAX:0 MIN:0 AVG:0

EMBARCADERO_EXPLAIN_PLAN COUNT:0
EMBARCADERO_EXPLAIN_PLAN PARTITION_ID SUM:5.4881E+10 MAX:495687276 MIN:-178235551 AVG:2481.41038
EMBARCADERO_EXPLAIN_PLAN PARTITION_START SUM:0 MAX:0 MIN:0 AVG:0

File2 contains

FT_T_CAVJ COUNT:22117022
FT_T_CAVJ CURR_TL SUM:5.2621E+13 MAX:2432919 MIN:2336585 AVG:2379195.7
FT_T_CAVJ JRNL_CRVL_AMT SUM:5.4881E+10 MAX:495697276 MIN:-178335551 AVG:2481.41038
FT_T_CAVJ CRVL_JRNL_ID SUM:221170220
FT_T_CAVJ CRVL_TYP

CAVJ_VRFY_PRE COUNT:115
CAVJ_VRFY_PRE BAL_DIFFERENCE SUM:6409.44 MAX:457.01 MIN:-399.99 AVG:55.7342609
CAVJ_VRFY_PRE CAVA_AMT SUM:182628580 MAX:33744800.9 MIN:-2851167.2 AVG:1588074.61

Here file2 contains additional columns which are not present in file1. They are CRVL_JRNL_ID and CRVL_TYP .

File1 contain a column ORIG_CRVL_AMT which is not present in file2.

And file2 also contains an additional table named "CAVJ_VRFY_PRE " and its statistics. This is not present in file1.

Table name EMBARCADERO_EXPLAIN_PLAN is present in file1 but not in file2.

The output should be in this format

Column CURR_TL in the table FT_T_CAVJ has mismatches
FT_T_CAVJ CURR_TL SUM:5.2621E+13 MAX:2432919 MIN:2336485 AVG:2379195.7
FT_T_CAVJ CURR_TL SUM:5.2621E+13 MAX:2432919 MIN:2336585 AVG:2379195.7

Column CRVL_JRNL_ID is present in file2 but not in file1.

Column CRVL_TYP is present in file2 but not in file1.

Table EMBARCADERO_EXPLAIN_PLAN is present in file1 but not in file2.
EMBARCADERO_EXPLAIN_PLAN COUNT:0
EMBARCADERO_EXPLAIN_PLAN PARTITION_ID SUM:5.4881E+10 MAX:495687276 MIN:-178235551 AVG:2481.41038
EMBARCADERO_EXPLAIN_PLAN PARTITION_START SUM:0 MAX:0 MIN:0 AVG:0

Table CAVJ_VRFY_PRE is present in file2 but not in file1.
CAVJ_VRFY_PRE COUNT:115
CAVJ_VRFY_PRE BAL_DIFFERENCE SUM:6409.44 MAX:457.01 MIN:-399.99 AVG:55.7342609
CAVJ_VRFY_PRE CAVA_AMT SUM:182628580 MAX:33744800.9 MIN:-2851167.2 AVG:1588074.61

Kindly help me in doing this.
# 12  
Old 04-15-2008
Something like this:

Code:
awk 'END {
  for (t1_ in t1) {
    if (!(t1_ in t2))
      printf "Table %s is present only in %s\n%s\n",
      t1_, ARGV[1], t1[t1_]
  }
}
NR == FNR { t1[$1] = t1[$1] ? t1[$1]RS$0 : $0; c1[$1,$2] = $0; next }
($1 SUBSEP $2) in c1 && c1[$1,$2] != $0 {
    printf "Column %s in the table %s mismatches.\n%s\n%s\n",
    $2, $1, c1[$1,$2], $0
}
($1 in t1) && !(($1 SUBSEP $2) in c1) {
    printf "Columns %s is present only in %s\n",
    $2, ARGV[2]
}
!($1 in t1) {
    __++
    if (__ == 1) {
    printf "Table %s is present only in %s\n",
    $1, ARGV[2]
    }
    else
      print  
}
{ t2[$1] }' file1 file2

# 13  
Old 04-16-2008
MySQL Thanks

Hi radoulov,

Thanks so much for ur help. The code is working fine.Smilie
# 14  
Old 04-16-2008
Data If the format changes

Hi,

The format of my two files have changed. I need to compare these two files.
Both the files are in the same format.

Start Time: Thu Apr 17 03:41:21 PDT 2008
Table Name: AAA
Row Count:96 SUM(F1): 3739 MAX(F1):77 MIN(F1): 0 AVG(F1): 38.9479167 LENGTH(LINE): 2260
End Time: Thu Apr 17 03:41:22 PDT 2008

------------------------------------------------------------------------------------------------------------------------------

Start Time: Thu Apr 17 03:41:22 PDT 2008
Table Name: AQ$_FT_Q_BECMD_G
Row Count: 0 SUM(SUBSCRIBER#):MAX(SUBSCRIBER#): MIN(SUBSCRIBER#): AVG(SUBSCRIBER#):LENGTH(NAME):SUM(ADDRESS#): MAX(ADDRESS#):MIN(ADDRESS#):AVG(ADDRESS#):
End Time: Thu Apr 17 03:41:23 PDT 2008

------------------------------------------------------------------------------------------------------------------------------
I need to ignore the start time line and end time line and need to compare
the other two lines.The table present in one file might not be present in the other file. Similarly
few columns present in one file might not be present in the other.After comparing the output should be like this

E.g.

column SUBSCRIBER# has mismatches for the sum value
below this the information for the corresponding table of this column should be present.

Column F1 present in file1 is not present in other
Row Count:96 SUM(F1): 3739 MAX(F1):77 MIN(F1): 0 AVG(F1): 38.9479167 LENGTH(LINE): 2260

Table AAA is present in file 1 but not in the file2.
Start Time: Thu Apr 17 03:41:21 PDT 2008
Table Name: AAA
Row Count:96 SUM(F1): 3739 MAX(F1):77 MIN(F1): 0 AVG(F1): 38.9479167 LENGTH(LINE): 2260
End Time: Thu Apr 17 03:41:22 PDT 2008


Can you give me a script to do this job? Please help me. I am stuck. Its urgent.

Last edited by ragavhere; 04-17-2008 at 09:18 AM..
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Comparing two files in UNIX and create a new file similar to equi join

I have 2 files namely branch.txt file & RXD.txt file as below Ex:Branch.txt ========================= B1,Branchname1,city,country B2,Branchname2,city,country B3,Branchname3,city,country B4,Branchname4,city,country B5,Branchname5,city,country RXD file : will... (11 Replies)
Discussion started by: satece
11 Replies

2. Shell Programming and Scripting

Merging two columns from two files with similar names into a loop

I have two files like this: fileA.net A B C fileA.dat 1 2 3 and I want the output output_expected A 1 B 2 C 3 I know that the easier way is to do a paste fileA.net fileA.dat, but the problem is that I have 10,000 couple of files (fileB.net with fileB.dat; fileC.net with... (3 Replies)
Discussion started by: valente
3 Replies

3. Shell Programming and Scripting

Computing the ratio of similar columns in the two files using awk script

Thanks Bartus11 for your help in the following code to compare the two files "t1" and "t2". awk 'NR==FNR{a=1;next}$2 in a{print $2}' t1 t2 First can anyone explain that what is the purpose of assigning a =1? Second, the current script is printing out the matched columns between the... (4 Replies)
Discussion started by: coder83
4 Replies

4. Shell Programming and Scripting

Comparing two columns from two different files

Hi, I have a single-column file1 having records like: 00AB01/11 43TG22/00 78RC09/34 ...... ...... and a second file , file 2 having two columns like 78RC09/34 1 45FD11/11 2 00AB01/11 3 43TG22/00 4 ...... ...... (8 Replies)
Discussion started by: amarn
8 Replies

5. UNIX for Dummies Questions & Answers

Merge two files with two columns being similar

Hi everyone. How can I merge two files, where each file has 2 columns and the first columns in both files are similar? I want all in a file of 4 columns; join command removes the duplicate columns. 1 Dave 2 Mark 3 Paul 1 Apple 2 Orange 3 Grapes to get it like this in the 3rd file:... (9 Replies)
Discussion started by: Atrisa
9 Replies

6. Shell Programming and Scripting

comparing two columns from two different files

Hello, I have two files as 1.txt and 2.txt with number as columns. 1.txt 0 53.7988 1 -30.0859 2 20.1632 3 14.2135 4 14.6366 5 -37.6258 . . . 31608 -8.57333 31609 -2.58554 31610 -24.2857 2.txt (1 Reply)
Discussion started by: AKD
1 Replies

7. Shell Programming and Scripting

comparing 2 columns from 2 files

Hey, I have 2 files that have a name and then a number: File 1: dog 21 dog 24 cat 33 cat 27 dog 76 cat 65 File 2: dog 109 dog 248 cat 323 cat 207 cat 66 (2 Replies)
Discussion started by: dcfargo
2 Replies

8. Shell Programming and Scripting

comparing the columns in two files

I have two files file1 and file 2 both are having multiple coloumns.i want to select only two columns. i used following code to get the desired columns,with ',' as delimiter cut -d ',' -f 1,2 file1 | sort > file1.new cut -d ',' -f 1,2 file2 | sort > file2.new I want to get the coloums... (1 Reply)
Discussion started by: bab123
1 Replies

9. Shell Programming and Scripting

Comparing Columns of two FIles

Dear all, I have two files in UNIX File1 and File2 as below: File1: 1,1234,.,67.897,,0 1,4134,.,87.97,,4 0,1564,.,97.8,,1 File2: 2,8798,.,67.897,,0 2,8879,.,77.97,,4 0,1564,.,97.8,,1 I want to do the following: (1) Make sure that both the files have equal number of columns and if... (4 Replies)
Discussion started by: ggopal
4 Replies

10. UNIX for Advanced & Expert Users

Comparing Columns of two FIles

Dear all, I have two files in UNIX File1 and File2 as below: File1: 1,1234,.,67.897,,0 1,4134,.,87.97,,4 0,1564,.,97.8,,1 File2: 2,8798,.,67.897,,0 2,8879,.,77.97,,4 0,1564,.,97.8,,1 I want to do the following: (1) Make sure that both the files have equal number of columns and if... (1 Reply)
Discussion started by: ggopal
1 Replies
Login or Register to Ask a Question