File Comparison - Need help

# 1  
03-02-2009

I have two text files, each containing thousands of records (rows). Each row has around 40 columns. Columns are tab-delimited, and rows are delimited by a newline character.

For each row, I need to find whether any column differs between the two files, and if so, which columns. An example is below:

File1
1|check|test|plan|672
2|checked|this|plan|610

File2
1|chck|test|plan|670
3|checked|ok|plan|610

Output should be in tabular form:
Difference

Row  ColumnNumber  Value in File1  Value in File2
1    2             check           chck
1    5             672             670
2    1             2               3
2    3             this            ok

Please help me.

Let me know if this information is not sufficient.
# 2  
03-02-2009
It seems your files are '|'-delimited, not tab-delimited as you stated.
nawk -F'|' -f ui.awk file1 file2

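ui.awk:
# pass 1 (file1): store each line under its line number
FNR==NR { f1[FNR]=$0; next }
# pass 2 (file2): compare with the file1 line that has the same number
{
   f1N=split(f1[FNR], arr, FS)
   n=(NF > f1N) ? NF : f1N          # check every column present in either file
   for(i=1;i<=n; i++)
     if (($i "") != (arr[i] ""))    # append "" to force a string comparison
       print FNR, i, arr[i], $i     # row, column, file1 value, file2 value
}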
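Post #1 also asked for the output in tabular form with a header row. A minimal sketch of one way to get that, assuming POSIX awk (nawk should accept the same program): print the header in a BEGIN block and set OFS to a tab. The wrapper name compare_table is hypothetical, and rows are still matched by line number, as in ui.awk.

```shell
#!/bin/sh
# compare_table FILE1 FILE2   (hypothetical helper name)
# Line-by-line comparison of two pipe-delimited files, printed as a
# tab-separated table with a header row.
compare_table() {
    awk -F'|' -v OFS='\t' '
    BEGIN { print "Row", "ColumnNumber", "Value in File1", "Value in File2" }
    FNR==NR { f1[FNR] = $0; next }      # pass 1: store file1 by line number
    {                                   # pass 2: compare file2 line by line
        n1 = split(f1[FNR], a, FS)
        n = (NF > n1) ? NF : n1         # cover columns present in either file
        for (i = 1; i <= n; i++)
            if (($i "") != (a[i] ""))   # string comparison, not numeric
                print FNR, i, a[i], $i
    }' "$1" "$2"
}
```

Run on the two sample files from post #1, compare_table file1 file2 should print the header followed by the four difference rows shown in the requested output, tab-separated.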

# 3  
03-03-2009
Thanks for the reply. Sorry for describing the delimiter as tab instead of |.

The command runs properly and the results are as expected. One problem I am facing is that the data in the two files may be in a different order, meaning the 1st row in file 1 could point to the 5th row in file 2.
What I can think of now is to take the key column (number) from the user and then sort both files on that column. Is there any other way of doing this?
Could it also be done by filtering, i.e. for each row of file 1, take the primary key and then filter file 2 on that key? I think that would be cumbersome; sorting the whole file on the primary key seems a better option.

Can you suggest a better way of doing this? What would the Unix commands be? Thanks in advance for helping me out on this.
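The sort-first idea described above could look roughly like this sketch (POSIX sh, sort, mktemp and awk are assumed; compare_sorted is a hypothetical name). It only lines rows up correctly if both files contain exactly the same set of keys: a single missing or extra row shifts every following comparison. The reported row numbers also refer to the sorted order, not to the original files.

```shell
#!/bin/sh
# compare_sorted KEYCOL FILE1 FILE2   (hypothetical helper name)
# Sorts both pipe-delimited files on column KEYCOL, then compares them
# line by line. Only valid if both files contain the same set of keys.
compare_sorted() {
    key=$1
    tmp1=$(mktemp) || return 1
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }
    # LC_ALL=C gives both files the same byte-wise ordering
    LC_ALL=C sort -t'|' -k"$key,$key" "$2" > "$tmp1"
    LC_ALL=C sort -t'|' -k"$key,$key" "$3" > "$tmp2"
    awk -F'|' '
        FNR==NR { f1[FNR] = $0; next }      # pass 1: sorted file1 by line number
        {                                   # pass 2: sorted file2, same line number
            n1 = split(f1[FNR], a, FS)
            n = (NF > n1) ? NF : n1
            for (i = 1; i <= n; i++)
                if (($i "") != (a[i] ""))   # string comparison
                    print FNR, i, a[i], $i  # sorted row, column, file1, file2
        }' "$tmp1" "$tmp2"
    rm -f "$tmp1" "$tmp2"
}
```

Usage would be, for example, compare_sorted 1 file1 file2 to sort and compare on the first column.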
# 4  
03-03-2009
Quote:
Originally Posted by uihnybgte
Thanks for the reply. Sorry for describing the delimiter as tab instead of |.

The command runs properly and the results are as expected. One problem I am facing is that the data in the two files may be in a different order, meaning the 1st row in file 1 could point to the 5th row in file 2.
What do you mean when you say that the '1st row in file 1 could point to 5th row in file 2'?
Is there a common key (a common cell in the row, OR a combination of cells) that relates two rows from the two files?
If so, you can rewrite the initial script to match rows on that key; there is no need for sorting.
Quote:
Originally Posted by uihnybgte
What I can think of now is to take the key column (number) from the user and then sort both files on that column. Is there any other way of doing this?
Could it also be done by filtering, i.e. for each row of file 1, take the primary key and then filter file 2 on that key? I think that would be cumbersome; sorting the whole file on the primary key seems a better option.

Can you suggest a better way of doing this? What would the Unix commands be? Thanks in advance for helping me out on this.
# 5  
03-03-2009
Thanks. What I am trying to say is that the data in the 1st row of file 1 needs to be compared with the data in the 5th row of file 2 (this is just an example; the data in the two files is in a different sort order).
In a nutshell, the user should be prompted to provide the common key (i.e. the user enters a column number), and the data should be compared based on that key.
So, considering the same example as above:
File1
1|check|test|plan|672
4|checked|this|plan|610
3|just|no|plan|612

File2
1|chck|test|plan|670
3|jst|no|pln|400
4|checked|ok|plan|610

The user provides the common key (say the user enters 1; this means the first column in each file is the primary key that identifies records).

Output should be in tabular form:
Difference

Row  ColumnNumber  Value in File1  Value in File2
1    2             check           chck
1    5             672             670
2    3             this            ok
3    2             just            jst
3    4             plan            pln
3    5             612             400

*Row: this should be the row number in the first file; if possible, the primary key should be displayed as well.

One more thing: the user may also provide a combination of key columns.

I hope my question is clear now.

Last edited by uihnybgte; 03-03-2009 at 12:52 PM..
# 6  
03-03-2009
OK, something along these lines:

The default key is '1' (the first column). You can specify another key column on the command line like this:

# key - the value in the FIRST column (default)
nawk -f ui.awk file1 file2

# key - the value in the THIRD column
nawk -v key=3 -f ui.awk file1 file2

# key - the value in the FOURTH column
nawk -v key=4 -f ui.awk file1 file2

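ui.awk
BEGIN {
  FS="|"
  if (key=="") key=1                # default key: first column
}
# pass 1 (file1): remember the row number and the whole line for each key
FNR==NR { f1r[$key]=FNR; f1v[$key]=$0; next }
# pass 2 (file2): compare only rows whose key also exists in file1
($key in f1r) {
   f1N=split(f1v[$key], arrV, FS)
   n=(NF > f1N) ? NF : f1N          # check every column present in either file
   for(i=1;i<=n; i++)
     if (($i "") != (arrV[i] ""))   # force a string comparison
       print $key, f1r[$key], i, arrV[i], $i
}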

The output will be in the format:
keyValue rowFile1 columnNumber file1Value file2Value


Last edited by vgersh99; 03-03-2009 at 01:59 PM..
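Post #5 also asked for a combination of key columns, which the key-based script does not cover. One possible extension, as a sketch only: pass the key as a comma-separated list of column numbers (an assumed convention, e.g. 1,4) and join those columns with | to form the lookup key. compare_keyed is a hypothetical wrapper name; POSIX awk is assumed, and nawk should accept the same program.

```shell
#!/bin/sh
# compare_keyed KEYCOLS FILE1 FILE2   (hypothetical helper name)
# KEYCOLS is one column number or a comma-separated list, e.g. 1 or 1,4
compare_keyed() {
    awk -F'|' -v key="$1" '
    # build the record key by joining the chosen columns with a pipe
    function mkkey(   k, i) {
        k = $(kc[1])
        for (i = 2; i <= nk; i++) k = k "|" $(kc[i])
        return k
    }
    BEGIN { if (key == "") key = "1"; nk = split(key, kc, ",") }
    FNR==NR { k = mkkey(); f1r[k] = FNR; f1v[k] = $0; next }   # pass 1: file1
    {                                   # pass 2: file2
        k = mkkey()
        if (!(k in f1r)) next           # key not present in file1
        n1 = split(f1v[k], a, FS)
        n = (NF > n1) ? NF : n1
        for (i = 1; i <= n; i++)
            if (($i "") != (a[i] ""))
                print k, f1r[k], i, a[i], $i
    }' "$2" "$3"
}
```

With the post #5 sample files, compare_keyed 1,4 file1 file2 treats rows as matching only when both column 1 and column 4 agree, so the row with key 3 (plan vs pln) is not compared at all.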
# 7  
03-06-2009
Thanks a lot, it works perfectly.
The problem I am facing now is that some rows are missing from file2, so they do not appear in the report. The current code reports the column differences for each row, but there should also be a report of the rows that are missing from file2 (i.e. rows whose primary key from file1 is not found in file2).
I hope my question is clear.
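One way to add the missing-row report, sketched under the same assumptions as the key-based ui.awk (single key column, | delimiter): remember the keys of file1 in their original order, mark each key that also appears in file2, and list the unmarked ones in an END block. The wrapper name compare_with_missing and the MISSING_IN_FILE2 label are hypothetical. As in the key-based script, a key duplicated in file1 is compared only against its last row, and an empty file1 is not handled.

```shell
#!/bin/sh
# compare_with_missing KEYCOL FILE1 FILE2   (hypothetical helper name)
# Column differences keyed on KEYCOL, plus a list of file1 keys absent from file2.
compare_with_missing() {
    awk -F'|' -v key="$1" '
    BEGIN { if (key == "") key = 1 }
    FNR==NR {                           # pass 1: file1
        f1r[$key] = FNR; f1v[$key] = $0
        order[FNR] = $key               # remember file1 order for the report
        n1rows = FNR
        next
    }
    ($key in f1r) {                     # pass 2: file2 rows whose key is in file1
        seen[$key] = 1
        n1 = split(f1v[$key], a, FS)
        n = (NF > n1) ? NF : n1
        for (i = 1; i <= n; i++)
            if (($i "") != (a[i] ""))
                print $key, f1r[$key], i, a[i], $i
    }
    END {                               # keys in file1 never seen in file2
        for (r = 1; r <= n1rows; r++)
            if (!(order[r] in seen))
                print order[r], r, "MISSING_IN_FILE2"
    }' "$2" "$3"
}
```

Because the END block walks file1 in its original order, the missing rows are listed after the column differences, in file1 order, each with its key and its file1 row number.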