The UNIX and Linux Forums  
#1 (permalink)
04-21-2008
ragavhere
Comparing two files

Hi,

I have two files in the format shown below; each contains table statistics. I need to compare the two files and, where a table's statistics differ, display that table's block (the lines between the separator lines) from both files.

Table Name: AAA
Row Count:96 SUM(F1): 3739 MAX(F1):77 MIN(F1): 0 AVG(F1): 38.9479167 LENGTH(LINE): 2260

------------------------------------------------------------------------------------------------------------------------------

Table Name: AQ$_FT_Q_BECMD_G
Row Count: 0 SUM(SUBSCRIBER#):MAX(SUBSCRIBER#): MIN(SUBSCRIBER#): AVG(SUBSCRIBER#):LENGTH(NAME):SUM(ADDRESS#): MAX(ADDRESS#):MIN(ADDRESS#):AVG(ADDRESS#):

------------------------------------------------------------------------------------------------------------------------------

Table Name: AQ$_FT_Q_BECMD_H
Row Count: 0 SUM(SUBSCRIBER#):MAX(SUBSCRIBER#): MIN(SUBSCRIBER#): AVG(SUBSCRIBER#):LENGTH(NAME):SUM(ADDRESS#): MAX(ADDRESS#):MIN(ADDRESS#):AVG(ADDRESS#): LENGTH(TRANSACTION_ID): SUM(DEQUEUE_USER): MAX(DEQUEUE_USER): LENGTH(DEQUEUE_USER): 50
------------------------------------------------------------------------------------------------------------------------------

Can anyone help me with this?

Last edited by ragavhere; 05-12-2008 at 12:27 AM..
#2 (permalink)
04-21-2008
aigles (Forum Advisor)
One possible way, using awk:
Code:
awk '
/^Table Name/  { table = $3 ; tables[table]++                }
! NF || /^-+/  {                                        next }
NR == FNR      { stats1[table] = stats1[table] $0 ORS ; next }
               { stats2[table] = stats2[table] $0 ORS ; next }
END {
   for (t in tables) {
      if (stats1[t] != stats2[t]) {
         out = "==========================================" ORS
         out = out stats1[t] ORS stats2[t]
         print out
      }
   }
}
' stats1.dat stats2.dat
Input file 1:
Code:
Table Name: AAA
Row Count:96 SUM(F1): 3739 MAX(F1):77 MIN(F1): 0 AVG(F1): 38.9479167 LENGTH(LINE): 2260

------------------------------------------------------------------------------------------------------------------------------
Table Name: AQ$_FT_Q_BECMD_G
Row Count: 0 SUM(SUBSCRIBER#):MAX(SUBSCRIBER#): MIN(SUBSCRIBER#): AVG(SUBSCRIBER#):LENGTH(NAME):SUM(ADDRESS#): MAX(ADDRESS#):MIN(ADDRESS#):AVG(ADDRESS#):

------------------------------------------------------------------------------------------------------------------------------

Table Name: AQ$_FT_Q_BECMD_H
Row Count: 0 SUM(SUBSCRIBER#):MAX(SUBSCRIBER#): MIN(SUBSCRIBER#): AVG(SUBSCRIBER#):LENGTH(NAME):SUM(ADDRESS#): MAX(ADDRESS#):MIN(ADDRESS#):AVG(ADDRESS#): LENGTH(TRANSACTION_ID): SUM(DEQUEUE_USER): MAX(DEQUEUE_USER):
------------------------------------------------------------------------------------------------------------------------------
Input file 2:
Code:
Table Name: AAA
Row Count:96 SUM(F1): 3739 MAX(F1):77 MIN(F1): 0 AVG(F1): 38.9479167 LENGTH(LINE): 2260

------------------------------------------------------------------------------------------------------------------------------
Table Name: AQ$_FT_Q_BECMD_G
Row Count: 0 SUM(SUBSCRIBER#):MAX(SUBSCRIBER#): MIN(SUBSCRIBER#): AVG(SUBSCRIBER#):LENGTH(NAME):SUM(ADDRESS#): MAX(ADDRESS#):MIN(ADDRESS#):AVG(ADDRESS#):

------------------------------------------------------------------------------------------------------------------------------
Table Name: AQ$_FT_Q_BECMD_K
Row Count: 0 SUM(SUBSCRIBER#):MAX(SUBSCRIBER#): MIN(SUBSCRIBER#): AVG(SUBSCRIBER#):LENGTH(NAME):SUM(ADDRESS#): MAX(ADDRESS#):MIN(ADDRESS#):AVG(ADDRESS#):

------------------------------------------------------------------------------------------------------------------------------

Table Name: AQ$_FT_Q_BECMD_H
Row Count: 1 SUM(SUBSCRIBER#): 11 MAX(SUBSCRIBER#): 11 MIN(SUBSCRIBER#): 11 AVG(SUBSCRIBER#): 11 LENGTH(NAME) 24 :SUM(ADDRESS#): MAX(ADDRESS#): 0 MIN(ADDRESS#): 0 AVG(ADDRESS#):  0 LENGTH(TRANSACTION_ID): 4 SUM(DEQUEUE_USER): 0 MAX(DEQUEUE_USER): 0
------------------------------------------------------------------------------------------------------------------------------
Output:
Code:
==========================================
Table Name: AQ$_FT_Q_BECMD_H
Row Count: 0 SUM(SUBSCRIBER#):MAX(SUBSCRIBER#): MIN(SUBSCRIBER#): AVG(SUBSCRIBER#):LENGTH(NAME):SUM(ADDRESS#): MAX(ADDRESS#):MIN(ADDRESS#):AVG(ADDRESS#): LENGTH(TRANSACTION_ID): SUM(DEQUEUE_USER): MAX(DEQUEUE_USER):

Table Name: AQ$_FT_Q_BECMD_H
Row Count: 1 SUM(SUBSCRIBER#): 11 MAX(SUBSCRIBER#): 11 MIN(SUBSCRIBER#): 11 AVG(SUBSCRIBER#): 11 LENGTH(NAME) 24 :SUM(ADDRESS#): MAX(ADDRESS#): 0 MIN(ADDRESS#): 0 AVG(ADDRESS#):  0 LENGTH(TRANSACTION_ID): 4 SUM(DEQUEUE_USER): 0 MAX(DEQUEUE_USER): 0

==========================================

Table Name: AQ$_FT_Q_BECMD_K
Row Count: 0 SUM(SUBSCRIBER#):MAX(SUBSCRIBER#): MIN(SUBSCRIBER#): AVG(SUBSCRIBER#):LENGTH(NAME):SUM(ADDRESS#): MAX(ADDRESS#):MIN(ADDRESS#):AVG(ADDRESS#):
Jean-Pierre.
#3 (permalink)
04-22-2008
ragavhere
Comparing two files

Hi,

The code works fine. Could you please explain each line of it?

#4 (permalink)
04-22-2008
aigles (Forum Advisor)
Code:
awk '

/^Table Name/ {                             # Select lines starting with "Table Name"
   table = $3;                              # Remember the current table name
   tables[table]++                          #   and record it in the tables array
}                                           # (no next: the line is also stored below)

! NF || /^-+/ {                             # Select empty and delimiter lines
   next                                     # Skip them and read the next line
}

NR == FNR {                                 # True only while reading the first input file
   stats1[table] = stats1[table] $0 ORS;    # Append the line to the stats of this table
   next                                     # Read the next line
}

{                                           # Remaining lines come from the second input file
   stats2[table] = stats2[table] $0 ORS;    # Append the line to the stats of this table
   next                                     # Read the next line
}

END {                                       # All files have been read
   for (t in tables) {                      # For every table seen in either file
      if (stats1[t] != stats2[t]) {         #    If the stats differ
         out = "==========================================" ORS;
         out = out stats1[t] ORS stats2[t]; #       Separator, then both versions
         print out                          #       Output separator line and stats
      }
   }
}

' stats1.dat stats2.dat
Jean-Pierre.
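
To see what the NR == FNR test in the script above is doing, here is a minimal sketch. The temporary files one.txt and two.txt and the function name show_counters are made up for illustration only. NR counts records across all input files, while FNR restarts at 1 for each file, so the two are equal only while the first file is being read.

```shell
#!/bin/sh
# Two tiny, hypothetical input files in a temporary directory.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\nd\n' > "$dir/two.txt"

# Print which file awk thinks it is in, followed by NR, FNR and the line.
show_counters() {
    awk '{ print (NR == FNR ? "first" : "second"), NR, FNR, $0 }' "$@"
}

show_counters "$dir/one.txt" "$dir/two.txt"
# first 1 1 a
# first 2 2 b
# second 3 1 c
# second 4 2 d
```

One caveat with this idiom: if the first file is empty, NR == FNR is also true while the second file is read, so its lines would be treated as coming from the first file.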
#5 (permalink)
04-22-2008
ragavhere
Comparing two files

Hi aigles,

Thanks a lot.
#6 (permalink)
04-26-2008
ragavhere
Comparing two files

Hi Jean-Pierre,

Do stats1 and stats2 refer to the two input files? Will the code work if I give full path names such as /home/frk/ragav/stats1 and /home/frk/ragav/stats2 instead of plain file names? If so, how should the code be modified?

If I assign a path such as /home/frk/ragav/stats1 to a variable, how can I use that variable in the code?

I assigned the file names to variables like this:
a=stats1.txt
b=stats2.txt
and changed the code to:

nawk '
/^Table Name/ { table = $3 ; tables[table]++ }
! NF || /^-+/ { next }
NR == FNR { $e[table] = $e[table] $0 ORS ; next }
{ $f[table] = $f[table] $0 ORS ; next }
END {
for (t in tables) {
if ($e[t] != $f[t]) {
out = "-----------------------------------------------------------------" ORS
out = out $e[t] ORS $f[t]
print out
}
}
}
' $e $f >> result.out


I get this error:

nawk: illegal field $()
input record number 1, file startendcut1.txt
source line number 4

Could you please help with both ways of modifying the code?

Last edited by ragavhere; 04-26-2008 at 05:31 AM.. Reason: Error reason included
#7 (permalink)
04-26-2008
aigles (Forum Advisor)
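In my script, stats1 and stats2 inside the awk code are arrays; stats1.dat and stats2.dat are the input files.

The input files can be specified with their full paths.
Don't use $e and $f as array names in your awk code. Inside the single-quoted awk program the shell does not expand them, and awk reads $e as a field reference, hence the "illegal field" error. Use fixed names instead (stats1 and stats2, or anything you like, such as first_array and second_array). Shell variables belong only on the command line, after the closing quote.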
Code:
a=stats1.txt
b=stats2.txt

nawk '
/^Table Name/ { table = $3 ; tables[table]++ }
! NF || /^-+/ { next }
NR == FNR     { stats1[table] = stats1[table] $0 ORS ; next }
              { stats2[table] = stats2[table] $0 ORS ; next }
END {
   for (t in tables) {
      if (stats1[t] != stats2[t]) {
         out = "-----------------------------------------------------------------" ORS
         out = out stats1[t] ORS stats2[t]
         print out
      }
   }
}
' "$a" "$b" >> result.out
Jean-Pierre.
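
For the question about full paths held in shell variables, the following is a self-contained sketch rather than a definitive version. The sample data, the temporary directory, the function name compare_stats and the variable sep are assumptions made for illustration, and plain awk is used in place of nawk. It shows the two places where a shell value can safely meet awk: as a quoted file argument after the program, and as an awk variable set with -v.

```shell
#!/bin/sh
# Hypothetical sample data written to a temporary directory.
dir=$(mktemp -d)
a="$dir/stats1.txt"     # full path to the first file
b="$dir/stats2.txt"     # full path to the second file
sep='=========='        # separator text, handed to awk with -v

printf '%s\n' 'Table Name: AAA' 'Row Count:96' '-----' \
              'Table Name: BBB' 'Row Count: 0' '-----' > "$a"
printf '%s\n' 'Table Name: AAA' 'Row Count:96' '-----' \
              'Table Name: BBB' 'Row Count: 1' '-----' > "$b"

# The awk program stays in single quotes and uses fixed array names.
# Shell variables appear only outside the quotes: in -v and as file arguments.
compare_stats() {
    awk -v sep="$sep" '
    /^Table Name/ { table = $3 ; tables[table]++ }
    ! NF || /^-+/ { next }
    NR == FNR     { stats1[table] = stats1[table] $0 ORS ; next }
                  { stats2[table] = stats2[table] $0 ORS ; next }
    END {
       for (t in tables)
          if (stats1[t] != stats2[t])
             print sep ORS stats1[t] ORS stats2[t]
    }
    ' "$1" "$2"
}

compare_stats "$a" "$b" > "$dir/result.out"
cat "$dir/result.out"
```

With this sample data only table BBB differs, so only its two blocks are written to result.out. Quoting "$a" and "$b" keeps paths that contain spaces intact.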