Fixed width file comparison not working

02-08-2013

Quote:

Sample records
1000000782378 abc 78 909 jksjd 0909 askkjdk 0909asd jkjk jk j as0d90a9sd
1000006782379 ddd 789 8999 kiks 0909 askkjdk 0909asd jkjk jk j as0d90a9sd
1000004789898 bcsdskdjsk9 hajsh 8989 ashdkajsd889898d jkjk jk j as0d90a9sd
1000008989890 abc 78 909 jksjd 0909 askkjdk 0909asd jkbnsh aksdhakshdkkh
1000009098988 abc 78 909 jksjd 0909 askkjdk 0909asd jkjk jk j as0d90a9sd
1000000009878 abc 78 909 jksjd 0909 askkjdk 0909asd jkjk jk j as0d90a9sd

I have two fixed-width files, each with 200K records.

The two file sizes are the same,
the two record counts are the same,
and in the sample records I checked, the record length and data are fine.

But when I use the diff/cmp/cat -v commands, I get differences.

Code:

cmp command 
cmp -l file1 file2 |head -1

1300 15 10

I manually checked the record (vi +1300 filename). The record length and data matched.
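A note on reading that output: cmp -l prints a byte offset (decimal, counted from 1) followed by the differing byte from each file in octal, so 1300 is a byte position rather than a line number. The sketch below shows one way to turn such an offset into a record number and column. The helper name locate_byte and the tiny test files are made up for illustration, and it assumes every record occupies the same number of bytes, newline included.

```shell
#!/usr/bin/env bash

# locate_byte OFFSET RECLEN  (hypothetical helper, not a standard tool)
# Converts a 1-based byte offset, as printed by `cmp -l`, into a record
# number and column. Assumes each record is exactly RECLEN bytes long,
# counting the trailing newline.
locate_byte() {
  local offset=$1 reclen=$2
  echo "record $(( (offset - 1) / reclen + 1 )) column $(( (offset - 1) % reclen + 1 ))"
}

# Two small sample files: 9 characters plus newline = 10 bytes per record.
tmp=$(mktemp -d)
printf '123456789\n123456789\n' > "$tmp/file1"
printf '123456789\n123456780\n' > "$tmp/file2"

# cmp -l columns: byte offset, octal byte in file1, octal byte in file2.
cmp -l "$tmp/file1" "$tmp/file2" | while read -r offset oct1 oct2; do
  printf '%s: file1 has octal %s, file2 has octal %s\n' \
    "$(locate_byte "$offset" 10)" "$oct1" "$oct2"
done
```

In the output quoted above, octal 15 is a carriage return and octal 10 is a backspace, both control characters that are easy to miss when eyeballing a record in vi.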

Code:

diff file1 file2 

3c3
< record information

I manually checked the record (vi +3 filename). The record length and data matched.

Code:

Checked with cat -v to find any special characters
Example
cat -v "some value" file1   <--No special characters
cat -v "some value" file2   <--No special characters

Is there any way to compare fixed-width files?
Thanks
oneSuri

onesuri


02-08-2013

Please post the relevant parts of your files, e.g. some lines around line 3 and some around line 1300. Did you check for the literal string "record information" that diff printed for line 3?

RudiC


02-08-2013

You could try: md5sum
PS: I always use cat -A, rather than cat -v
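A minimal sketch of both suggestions, assuming GNU coreutils (md5sum and cat -A are not available everywhere). The file names and contents are invented for the demo: two one-line files that look the same on screen but differ in their line ending.

```shell
#!/usr/bin/env bash

tmp=$(mktemp -d)
printf 'abc 78 909\n'   > "$tmp/unix.txt"   # LF line ending
printf 'abc 78 909\r\n' > "$tmp/dos.txt"    # CR+LF line ending

# Identical files give identical checksums; a single differing byte
# changes the sum. Reading from stdin keeps the file name out of the output.
sum_unix=$(md5sum < "$tmp/unix.txt")
sum_dos=$(md5sum < "$tmp/dos.txt")
if [ "$sum_unix" = "$sum_dos" ]; then
  verdict="same"
else
  verdict="different"
fi
echo "checksums are $verdict"

# GNU cat -A is shorthand for -vET: non-printing characters become visible,
# each line end is marked with $, and tabs are shown as ^I.
# A carriage return shows up as ^M just before the $.
cat -A "$tmp/unix.txt" "$tmp/dos.txt"
```

A checksum only says whether the files differ, not where, so it is a quick first check rather than a replacement for cmp or diff.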

user8


02-08-2013

Quote:

I am unable to post the actual data.
I cannot get diff/cmp to show the exact value that mismatches in the fixed-width file. Each row is currently 2500 characters long.

I need to see exactly where the difference is, and print the mismatched value from each file.

Sample files with a row length of 10:
file1
123456789
file2
123456780

I want to print only the difference, like this:
file1- 9
file2- 0

onesuri


02-11-2013

Hi.

This solution relies on two components, docdiff and a short perl script:

Code:

#!/usr/bin/env bash

# @(#) s2	Demonstrate differences at character level.

pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C perl docdiff 

f1=data1
f2=data2
FILES="$f1 $f2"

pl " Input files $FILES"
head $FILES

pl " perl extraction helper script:"
cat p1

pl " Results, wdiff format, $f1, $f2:"
docdiff --wdiff --char $f1 $f2

pl " Results, wdiff format, $f1, $f2, extracted diff with labels:"
docdiff --wdiff --char $f1 $f2 |
./p1 $f1 $f2

pl " Results, wdiff format, $f2, $f1, extracted diff with labels:"
docdiff --wdiff --char $f2 $f1 |
./p1 $f2 $f1

exit 0

producing:

Code:

% ./s2

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
perl 5.10.0
docdiff 0.3.4

-----
 Input files data1 data2
==> data1 <==
orange
123456789xa
X-klystron

==> data2 <==
orange
123456780xb
Y-klystron

-----
 perl extraction helper script:
#!/usr/bin/env perl

# @(#) p1	Demonstrate wdiff difference format extraction with labels.

$f1 = shift || die " Missing first label.\n";
$f2 = shift || die " Missing second label.\n";

while (<>) {
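  # Collect deleted ([-...-]) and inserted ({+...+}) fragments on this line.
  # Test the arrays directly: defined(@array) is an error in newer perls.
  @a = m/\[-(.*?)-\]/xmsg;
  print "$f1: ", join( "", @a ), "\n" if @a;
  @b = m/\{\+(.*?)\+\}/xmsg;
  print "$f2: ", join( "", @b ), "\n" if @b;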
}

exit(0);

-----
 Results, wdiff format, data1, data2:
orange
12345678[-9-]{+0+}x[-a-]{+b+}
[-X-]{+Y+}-klystron

-----
 Results, wdiff format, data1, data2, extracted diff with labels:
data1: 9a
data2: 0b
data1: X
data2: Y

-----
 Results, wdiff format, data2, data1, extracted diff with labels:
data2: 0b
data1: 9a
data2: Y
data1: X

The idea is that docdiff can report differences at a resolution down to single characters. Its wdiff-style output is then processed by the perl script. The data files were augmented to make sure that multiple lines, as well as identical lines, are handled correctly.

The docdiff utility is written in ruby, is available in Debian-based GNU/Linux repositories, and can also be found on SourceForge (DocDiff: Compare text word by word).

See man pages for details.

Best wishes ... cheers, drl (125)

---------- Post updated at 08:52 ---------- Previous update was at 08:10 ----------

Hi.

An all-perl solution:

Code:

#!/usr/bin/env perl

# @(#) p2	Demonstrate character differences in same-length lines.

use warnings;
use strict;

my (
  $f1, $f2, $file1, $file2, $i,       @a, @b,
  $s1, $s2, $t1,    $t2,    $changed, $debug
);

$f1 = shift || die " Missing first file.\n";
$f2 = shift || die " Missing second file.\n";

$debug = 1;
$debug = 0;

open( $file1, "<", $f1 ) || die " Cannot open file $f1\n";
open( $file2, "<", $f2 ) || die " Cannot open file $f2\n";
while ( $t1 = <$file1> ) {
  chomp($t1);
  @a = split "", $t1;
  $t2 = <$file2>;
  last if not defined $t2;    # second file has fewer lines
  chomp($t2);
  @b = split "", $t2;
  print "file1,2 = ", join( "", @a ), " ", join( "", @b ), "\n" if $debug;
  $changed = 0;
  $s1 = $s2 = "";

  for ( $i = 0; $i <= $#a; $i++ ) {
    if ( $a[$i] ne $b[$i] ) {
      $s1 = "$f1: " if not $changed;
      $s2 = "$f2: " if not $changed;
      $s1 .= $a[$i];
      $s2 .= $b[$i];
      $changed++;
    }
  }
  print "$s1\n" if $changed;
  print "$s2\n" if $changed;
}

exit(0);

producing, using the data files noted above:

Code:

% ./p2 data1 data2
data1: 9a
data2: 0b
data1: X
data2: Y
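Where perl is not available, the same line-by-line, character-by-character comparison can be sketched in shell with awk. This version also prints the line and column of each mismatch, since the question asked where exactly the difference is. The function name char_diff is made up for illustration, and the sketch assumes both files have the same number of lines. It stops at the end of the shorter file.

```shell
#!/usr/bin/env bash

# char_diff FILE1 FILE2  (hypothetical helper)
# For every pair of lines that differ, print the line number, the 1-based
# column, and the character found in each file at that column.
char_diff() {
  awk -v f2="$2" '
    {
      # Read the matching line of the second file; stop if it runs out.
      if ((getline other < f2) <= 0) exit
      # Appending "" forces a string comparison even for numeric-looking lines.
      if (($0 "") == (other "")) next
      n = (length($0) > length(other)) ? length($0) : length(other)
      for (i = 1; i <= n; i++) {
        a = substr($0, i, 1)
        b = substr(other, i, 1)
        if (a != b) printf "line %d col %d: %s | %s\n", NR, i, a, b
      }
    }' "$1"
}

# Recreate the data files used above.
tmp=$(mktemp -d)
printf 'orange\n123456789xa\nX-klystron\n' > "$tmp/data1"
printf 'orange\n123456780xb\nY-klystron\n' > "$tmp/data2"

char_diff "$tmp/data1" "$tmp/data2"
```

For the data files above this prints:

```
line 2 col 9: 9 | 0
line 2 col 11: a | b
line 3 col 1: X | Y
```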

Best wishes ... cheers, drl

drl

