Column comparison of two large files and printing the differences


 
# 1  
Old 10-07-2010

Hi Experts,


My requirement is to compare the second field (column) of two files. If the second column is the same in both files, then compare the first field. If the first fields do not match, print the first and second fields from both files.


First file (a.txt)
Code:
< 1210018971FF0000, 123321210018971, 1 >
< 1210018972FF0000, 123321210018972, 1 >
< 1210018973FF0000, 123321210018973, 1 >
< 239300002F000000, 123320832900002, 1 >
< 6746451667FF0000, 123320371265114, 1 >
< 0019250062FF0000, 123320853469168, 1 >
< 0019250064FF0000, 123320853469179, 1 >
< 0019250065FF0000, 123320853469169, 1 >
< 0019250067FF0000, 123320853469180, 1 >


Second file (b.txt)

Code:
< 1210018971FF0000, 123321210018971, 1, >
< 1210018972FF0000, 123321210018972, 1, >
< 3466418973FF0000, 123321210018973, 1, >
< 239300002F000000, 123320832900002, 1, >
< 8746451668FF0000, 123320371265114, 1, >
< 0019250062FF0000, 123320853469168, 1, >
< 0019250064FF0000, 123320853469179, 1, >
< 0019290065FF0000, 123320853469169, 1, >
< 0019250067FF0000, 123320853469180, 1, >

Output

Code:
1210018973FF0000, 123321210018973, "---------", 3466418973FF0000, 123321210018973
6746451667FF0000, 123320371265114, "---------", 8746451668FF0000, 123320371265114
0019250065FF0000, 123320853469169, "---------", 0019290065FF0000, 123320853469169



I have tried the script below, but it's not working.

Code:
#! /usr/bin/perl 


$file1 = 'a.txt'; 
$file2 = 'b.txt'; 


open(R,$file1) ;
open(P,$file2) ;

foreach $i (<R>) 
{  
   @a = split (/,/,$i);
foreach $k (<P>)
{

  @b = split (/,/,$k) ;

if ( $a[1] == $b[1] ) { if( $a[0] != $b[0]) {print $a[0],$a[1],"----------",$b[0],$b[1],"\n" ;}} 

}        



} 

close R;
close P;


Please help me. Thanks in advance.

Last edited by radoulov; 10-07-2010 at 11:18 AM.. Reason: Added code tags.
# 2  
Old 10-07-2010
If awk is acceptable:

Code:
awk -F'[<,] ' 'NR == FNR {
  _[$3] = $2; next
  }
$3 in _ && _[$3] != $2 {
  print $2, $3, "\"========\"", _[$3], $3
  }' OFS=, a.txt b.txt

# 3  
Old 10-07-2010
Hi radoulov,

Thank you very much.

Could you please help me do the same in Perl? I need a Perl script.
# 4  
Old 10-07-2010
Why, is this homework?
# 5  
Old 10-07-2010
I think the print arguments are switched; they should be in the order shown below.

Quote:
Originally Posted by radoulov
If awk is acceptable:

Code:
  print _[$3], $3, "\"========\"", $2, $3


and here is your Perl code:-

Code:
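perl -wle '
# Read both files in lockstep: line N of a.txt is paired with line N of b.txt.
open(D1,"<a.txt") or die "can not open file $!" ;
open(D2,"<b.txt") or die "can not open file $!" ;
while ( defined($z = <D1>) && defined($k = <D2>) ) {
$z =~ s/\s+//g ; $k =~ s/\s+//g ;        # strip all whitespace
$a = [ split(/[<,]/,$z ) ]  ;            # [1] = first column, [2] = second column
$b = [ split(/[<,]/,$k ) ]  ;
# Same second column but different first column: print both pairs.
if ( ( $a->[2] eq $b->[2] ) && ( $a->[1] ne $b->[1] ) ) {
       $,="," ;
       print $a->[1],$a->[2],"\"======= \"",$b->[1],$b->[2] ;
}
}
'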

# 6  
Old 10-08-2010
Code:
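#!/usr/bin/perl
use strict;
use warnings;

my ($file1, $file2) = ('a.txt', 'b.txt');

# Build a map of second column => first column for one file.
sub load_map {
    my ($file) = @_;
    my %h;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    while (<$fh>) {
        $h{$2} = $1 if m{<\s*([^,\s]+)\s*,\s*([^,\s]+)};
    }
    close $fh;
    return %h;
}

my %hash1 = load_map($file1);
my %hash2 = load_map($file2);

# Print keys found in both files whose first column differs.
for (sort keys %hash1) {
    next unless exists $hash2{$_};
    print "$hash1{$_}, $_ ----- $hash2{$_}, $_\n" if $hash1{$_} ne $hash2{$_};
}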

# 7  
Old 10-08-2010
By awk.
Code:
awk '{a=$2;b=$3;getline <"b.txt"} $2!=a||$3!=b {print a,b,"\"========\"",$2,$3}' a.txt
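
A keyed lookup, as in post #2, does not depend on both files listing rows in the same order, whereas the lockstep readers in posts #5 and #7 do. The nested loops in post #1 fail for a related reason: the inner `foreach $k (<P>)` reads b.txt to the end on the first pass, so later lines of a.txt have nothing left to compare against. Below is a minimal sketch of the keyed approach that prints the a.txt pair first, to match the requested output. The function name `compare_cols` is hypothetical, and the field separator assumes every line looks like the samples in post #1 (`< first, second, ... >`).

```shell
# Hypothetical helper: compare_cols FILE_A FILE_B
# Keyed on column 2; prints rows whose column 1 differs, FILE_A pair first.
compare_cols() {
  awk -F'[<,] *' -v OFS=', ' '
    # First file: remember column 1 under its column 2 key.
    NR == FNR { first[$3] = $2; next }
    # Second file: same key, different column 1 (compared as strings).
    ($3 in first) && (first[$3] "") != ($2 "") {
      print first[$3], $3, "\"---------\"", $2, $3
    }' "$1" "$2"
}
```

Called as `compare_cols a.txt b.txt`, it keeps only the first file's two columns in memory, which is usually acceptable for large inputs. Rows are reported in b.txt order.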
