Compare few columns from two files


 
# 1  
Old 09-27-2009

My Friends,
I need your help to find the differences between a few columns in two comma-delimited files. For example, File1 and File2 each have 22 columns, and I want to find the differences in the first 12 columns.

I have a list of file names in MyListOfFiles2Compare.txt. The files are .csv and some .txt files. Most are comma delimited; some .txt files are tab delimited.

For each file name in MyListOfFiles2Compare.txt, take the file from dir1/files and dir2/files and compare the specified number of columns. If any column's data does not match, write the mismatch to another file called mismatch.csv. Each mismatch record should contain the name of the file that differs, the column number, the data from the first file and the data from the second file. Since I have to compare thousands of files, I need to be able to go back and see which files have mismatches and which data/columns mismatch.
======= Thank you for giving your input on this =============
~~Manish
# 2  
Old 09-27-2009
To avoid any confusion please provide sample input files and required output.
# 3  
Old 09-27-2009
Sample Input file1 is MyFile1:
"America, LLC","265826","222111","04/01/2009","ddd, Nick","333","eRes-Plus - 333","ddk,Rubino ","R8","15","0.28","","0.00","0.28","","132. Proivdence Road Suite , TX 19063 US"
"America, LLC","265826","93659211","04/01/2009","Rose, Nick","3942489","eRes-Plus - 4102414180","Nick,Rubino ","R8","8","0.15","","0.00","0.15","","1400 N. test Road Suite 5025 x, PA 44333 US"
===================
Sample Input File2 is MyFile1
----------
"America, LLC",123456,44444,04/01/2009,"Russell,ddd",14444,eRes-Plus - 7043589536,"ddd,Russell",R8,43,1.05,,0,1.05,017653,201 S main St 1470 Charlotte court Charlotte 13322
"North, LLC",4444,1111114,04/01/2009,"Russell,ddd",1136671,eRes-Plus - 2159977710,"ddd,Russell",R8,42,1.03,,0,1.03,017653,201 S main St 1470 Charlotte court Charlotte 12345
=========================
Expected output in mismatch.csv after comparing the first 10 columns of file 1 and file 2 above:
FileName,column number, data in first file, data in 2nd file
MyFile1,2,265826,123456
MyFile1,3,222111,44444
Thank you,
Manish
# 4  
Old 09-27-2009
Questions:
  1. Are you comparing line x in file 1 with line x in file 2?
  2. Are there quotes around each field (column) data all the time?
  3. How do we know if the file is csv or tab type? file extension?
  4. Should commas be expected in the data?
# 5  
Old 09-27-2009
Hi ,
Here are the answers...
Q) Are you comparing line x in file 1 with line x in file 2?
Ans) Yes, line x in file 1 with line x in file 2 (line 1 in file 1 with line 1 in file 2, line 2 in file 1 with line 2 in file 2, etc.). I will sort the files before comparing them.

Q) Are there quotes around each field (column) data all the time?
Ans) Some files have quotes and some files do not.

Q) How do we know if the file is csv or tab type? file extension?
Ans) It depends on the file extension: most comma-delimited files have the .csv extension. Otherwise we may need to read the first 4 columns, and if every one of them is comma delimited we can decide the file is comma delimited.

Q) Should commas be expected in the data?
Ans) Most of the files have commas; some files are tab or space delimited.

Thank you..
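
The delimiter rule described in the third answer could be sketched roughly as follows. This is only an illustration: guess_delim is a hypothetical helper name, and the "three or more commas in the first line" test is my assumption for "the first 4 columns are comma delimited". Commas inside quoted data are counted too, so a tab-delimited file with many commas in its data would be misjudged.

```shell
# Sketch only: guess the field delimiter of a file.
# guess_delim is a hypothetical helper; it prints "comma" or "tab".
guess_delim() {
    # A .csv extension is taken at face value.
    case $1 in
        *.csv) echo comma; return ;;
    esac
    # Any other extension: count the commas in the first line.
    # Three or more commas suggests at least four comma-separated columns.
    commas=$(head -n 1 "$1" | tr -cd ',' | wc -c | tr -d ' ')
    if [ "$commas" -ge 3 ]; then
        echo comma
    else
        echo tab
    fi
}
```

For example, guess_delim dir1/files/somefile.txt prints either comma or tab (somefile.txt is a placeholder name).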
# 6  
Old 09-27-2009
Some hints for you.

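Compare the first 12 columns of MyFile1 and MyFile2:

Code:
# cut splits on every comma, so commas inside quoted fields shift the columns
diff <(cut -d, -f1-12 "dir1/$MyFile1") <(cut -d, -f1-12 "dir2/$MyFile2")

Use it in a loop over the file names in your list.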
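
A minimal sketch of such a loop, in case it helps. It does not use cut, because cut splits on the commas inside quoted fields such as "America, LLC"; instead a small awk function splits each line while honoring double quotes. The function name compare_columns, its argument order and the output quoting are my assumptions, not something given in this thread. It assumes comma-delimited files, compares line x with line x, and ignores extra lines at the end of the second file.

```shell
#!/bin/sh
# Sketch only: compare the first NCOLS comma-separated columns of same-named
# files in two directories, line by line, and log every mismatch.
# compare_columns is a hypothetical name; arguments: LIST DIR1 DIR2 NCOLS OUT
compare_columns() {
    list=$1 dir1=$2 dir2=$3 ncols=$4 out=$5

    printf '%s\n' 'FileName,column number,data in first file,data in 2nd file' > "$out"

    while IFS= read -r name || [ -n "$name" ]; do
        [ -n "$name" ] || continue              # skip blank lines in the list
        if [ ! -r "$dir1/$name" ] || [ ! -r "$dir2/$name" ]; then
            echo "skipping $name: not readable in both directories" >&2
            continue
        fi
        awk -v name="$name" -v n="$ncols" -v f2="$dir2/$name" '
        # Split a CSV line into out[1..k]; commas inside double quotes do not split.
        # Quote characters are dropped (a doubled "" is not kept as a literal quote).
        function split_csv(line, out,    k, i, c, inq, field, len) {
            split("", out)                      # clear leftovers from the previous line
            k = 0; field = ""; inq = 0; len = length(line)
            for (i = 1; i <= len; i++) {
                c = substr(line, i, 1)
                if (c == "\"") {
                    inq = !inq
                } else if (c == "," && !inq) {
                    out[++k] = field; field = ""
                } else {
                    field = field c
                }
            }
            out[++k] = field
            return k
        }
        # Put quotes back around a value that contains a comma.
        function q(s) { return (s ~ /,/) ? ("\"" s "\"") : s }
        {
            # Read the matching line of the second file; empty if it has run out.
            if ((getline other < f2) <= 0) { other = "" }
            split_csv($0, a); split_csv(other, b)
            for (i = 1; i <= n; i++) {
                if (a[i] != b[i]) {
                    printf "%s,%d,%s,%s\n", name, i, q(a[i]), q(b[i])
                }
            }
        }' "$dir1/$name" >> "$out"
    done < "$list"
}
```

With the layout described in the first post, the call would look like this:
compare_columns MyListOfFiles2Compare.txt dir1/files dir2/files 12 mismatch.csv

Fields are compared as plain strings after the quotes are removed, so "0.00" and 0 count as a mismatch, and trailing spaces inside a field matter.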