The UNIX and Linux Forums  

Shell Programming and Scripting

#8, 05-06-2008 (Registered User; Join Date: Apr 2008; Location: Chennai, India; Posts: 75)
Help needed in formatting the output of diff

Hi era,

This code is not working.

sort -k10 21.txt 22.txt |
awk '$10 == prevfile && $12 != prevsize { print prev; print }
{ prevfile = $10; prevsize=$12; prev=$0 }'

I am not getting any output. The contents of my files are, for example:
File 1 contains

A The row count of file2.txt is 23
A The row count of file3.txt is 20
A The row count of file4.txt is 2

File 2 contains

a The row count of file2.txt is 22
a The row count of file3.txt is 21
a The row count of file4.txt is 1

I need to ignore the case and then compare.

I have another set of files in this format. For example, File3 contains:
From Test Run - 1|20070228|070|100|0.00000|0.00000|2240605.00000
From Test Run - 4|20070228|076|100|0.00000|0.00000|424064.29000
From Test Run - 5|20070228|077|100|0.00000|0.00000|2545203.00000
From Test Run - 6|20070228|078|100|0.00000|0.00000|432940.00000
From Test Run - 2|20070228|074|100|1.00000|0.00000|97857.55000
From Test Run - 3|20070228|075|100|0.00000|0.00000|299658.93000

File 4 contains

From Test Run - 1|20070230|070|100|0.00000|0.00000|2240605.00000
From Test Run - 4|20070228|076|100|0.00000|0.00000|424064.29000
From Test Run - 5|20070228|077|100|0.00000|0.00000|2545203.00000
From Test Run - 6|20070228|078|100|0.00000|0.00000|432940.00000
From Test Run - 2|20070228|074|100|1.00000|0.00000|97857.55000
From Test Run - 3|20070228|075|100|0.00000|0.00000|299658.93000

There are spacing differences in these. The size of these files (file3 and file4) may go up to 100 MB or so. Will the above code be able to compare these files too?

Last edited by ragavhere; 05-06-2008 at 06:01 AM.
#9, 05-07-2008, era (Herder of Useless Cats; Join Date: Mar 2008; Location: /there/is/only/bin/sh; Posts: 3,650)
Quote:
Originally Posted by ragavhere
File 1 contains

A The row count of file2.txt is 23
A The row count of file3.txt is 20
A The row count of file4.txt is 2

File 2 contains

a The row count of file2.txt is 22
a The row count of file3.txt is 21
a The row count of file4.txt is 1
These are different from what you posted before, so the field numbers are wrong. You need to change $10 to $6 and $12 to $8 for them to work, and to change the sort too.
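If it is not obvious which $N holds what, one quick check is to let awk number the fields of a sample line. This is a minimal sketch; `show_fields` is a hypothetical helper name, and it uses awk's default splitting on runs of blanks, the same splitting the pipeline relies on.

```shell
#!/bin/sh
# Hypothetical helper: print every blank-separated field with its awk
# field number, followed by an empty line after each input record.
show_fields() {
    awk '{ for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i; print "" }' "$@"
}

# Demo on one sample line from the thread
printf '%s\n' 'A The row count of file2.txt is 23' | show_fields
```

For the sample line this lists `$6 = file2.txt` and `$8 = 23`, which are the two fields the corrected pipeline compares.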

Code:
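# Sort both files on the file-name field ($6) so entries for the same file end up adjacent,
# then print both lines whenever the same file name has a different row count ($8).
sort -k6 file1 file2 |
awk '$6 == prevfile && $8 != prevsize { print prev; print }
{ prevfile = $6; prevsize = $8; prev = $0 }'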
See? In your latest example the file name is in the sixth field, so that's $6, and the row count is in the eighth field, so that's $8.
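Since case should be ignored as well, here is a minimal sketch of one way to do that with the same pipeline, assuming the line layout shown above (file name in $6, row count in $8). `sort -f` folds case while sorting, and awk's `tolower()` makes the file-name comparison case-insensitive. `compare_counts` is a hypothetical name, and the demo assumes `mktemp` is available.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL      # predictable, byte-wise sort order

# Hypothetical helper. ASSUMPTION: file name is field 6, row count is field 8.
compare_counts() {           # usage: compare_counts file1 file2
    # -f folds case while sorting; -k6,6 limits the sort key to the file-name field
    sort -f -k6,6 "$1" "$2" |
    awk '{ name = tolower($6) }                   # case-insensitive file name
         name == prevfile && $8 != prevsize { print prev; print }
         { prevfile = name; prevsize = $8; prev = $0 }'
}

# Demo on the sample data from the thread
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 'A The row count of file2.txt is 23' \
              'A The row count of file3.txt is 20' \
              'A The row count of file4.txt is 2' >"$tmp/file1"
printf '%s\n' 'a The row count of file2.txt is 22' \
              'a The row count of file3.txt is 21' \
              'a The row count of file4.txt is 1' >"$tmp/file2"
compare_counts "$tmp/file1" "$tmp/file2"
```

Lines whose file name appears in both files with a different row count come out as adjacent pairs; names whose counts match, or that appear in only one file, produce no output.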

Quote:
File3 contains.
From Test Run - 1|20070228|070|100|0.00000|0.00000|2240605.00000
From Test Run - 4|20070228|076|100|0.00000|0.00000|424064.29000
From Test Run - 5|20070228|077|100|0.00000|0.00000|2545203.00000
From Test Run - 6|20070228|078|100|0.00000|0.00000|432940.00000
From Test Run - 2|20070228|074|100|1.00000|0.00000|97857.55000
From Test Run - 3|20070228|075|100|0.00000|0.00000|299658.93000

File 4 contains

From Test Run - 1|20070230|070|100|0.00000|0.00000|2240605.00000
From Test Run - 4|20070228|076|100|0.00000|0.00000|424064.29000
From Test Run - 5|20070228|077|100|0.00000|0.00000|2545203.00000
From Test Run - 6|20070228|078|100|0.00000|0.00000|432940.00000
From Test Run - 2|20070228|074|100|1.00000|0.00000|97857.55000
From Test Run - 3|20070228|075|100|0.00000|0.00000|299658.93000

There are spacing differences in these. The size of these files (file3 and file4) may go up to 100 MB or so. Will the above code be able to compare these files too?
If the same logic applies, then yes: awk still keeps only two lines at a time in memory, so the file size doesn't matter to awk. Again, the file size matters for sort, which has to read all of its input before it can output anything, but if you can sort these files then the rest is trivial.

I won't try to adapt the code because it's not at all clear to me which fields should be compared and which fields ignored. If you just need to ignore spacing and case, then of course you can always convert both files to some normalized form first and then compare them with a simpler tool such as diff or comm.

Code:
# Squeeze runs of spaces into one and fold upper case to lower case, then compare
tr -s ' ' <file3 | tr 'A-Z' 'a-z' >temp3
tr -s ' ' <file4 | tr 'A-Z' 'a-z' >temp4
diff temp3 temp4
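The comm route mentioned above could look roughly like this. It is a sketch under assumptions: both files are normalized as in the tr commands and then sorted, because comm requires sorted input. `normalize_sorted` and `only_in_one` are hypothetical helper names, and the demo input is the thread's sample lines with extra spaces and capitals added by hand to the second line of file4.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL    # sort and comm must agree on the collating order

normalize_sorted() {       # squeeze runs of spaces, fold case, then sort
    tr -s ' ' <"$1" | tr '[:upper:]' '[:lower:]' | sort
}

only_in_one() {            # hypothetical helper; usage: only_in_one file_a file_b
    tmpa=$(mktemp) && tmpb=$(mktemp) || return 1
    normalize_sorted "$1" >"$tmpa"
    normalize_sorted "$2" >"$tmpb"
    # -3 drops lines common to both; lines found only in file_b are indented by a tab
    comm -3 "$tmpa" "$tmpb"
    rm -f "$tmpa" "$tmpb"
}

# Demo: sample lines from the thread; line 2 of file4 has extra spaces and capitals
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 'From Test Run - 1|20070228|070|100|0.00000|0.00000|2240605.00000' \
              'From Test Run - 4|20070228|076|100|0.00000|0.00000|424064.29000' >"$tmp/file3"
printf '%s\n' 'From Test Run - 1|20070230|070|100|0.00000|0.00000|2240605.00000' \
              'FROM  Test   Run - 4|20070228|076|100|0.00000|0.00000|424064.29000' >"$tmp/file4"
only_in_one "$tmp/file3" "$tmp/file4"
```

Unlike diff, comm ignores where a line sits in the file, so this reports which normalized lines exist in only one of the two files rather than where they changed.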
#10, 05-07-2008 (Registered User; Join Date: Apr 2008; Location: Chennai, India; Posts: 75)

File3 contains.
From Test Run - 1|20070228|070|100|0.00000|0.00000|2240605.00000
From Test Run - 4|20070228|076|100|0.00000|0.00000|424064.29000

File 4 contains
From Test Run - 1|20070230|070|200|0.00000|0.00000|2240605.00000

Here file3 and file4 have mismatches: in the Test Run - 1 line, the second field (20070228 vs 20070230) and the fourth field (100 vs 200) differ. Every field in every line should be compared. If there is a mismatch, print the corresponding line from both files, one below the other. If a line is present in one file and not in the other, print that line from the file that contains it. In the example above, the line common to both files has mismatches, so both versions of it should be printed to my output file. The second line of file3.txt has no counterpart in file4.txt, so it should also be printed.

The code I used is:

diff -b -i file3.txt file4.txt > file5.txt

diff compares files of up to 2 GB only, and I am not sure whether my files will grow beyond 2 GB in the future. So is there any other way of comparing them that takes the spacing differences and case into account?
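One way to get the output described above without diff is to merge the two normalized, sorted files in awk. This is a sketch under loud assumptions: the first `|`-separated field (`From Test Run - N`) is taken to identify corresponding lines and to be unique within each file; case and runs of spaces are ignored by normalizing both files first, so the printed lines are the normalized (lower-cased) versions; lines from the first file are prefixed `< ` and lines from the second `> `, mimicking diff. Because both inputs are sorted, awk holds only one line from each file at a time; how the sort step copes with very large inputs depends on your sort implementation and free temporary space. `normalize` and `compare_runs` are hypothetical names.

```shell
#!/bin/sh
# ASSUMPTION: the first |-field identifies matching lines and is unique per file.
LC_ALL=C; export LC_ALL          # sort and awk must agree on string order

normalize() {                    # squeeze spaces, fold case, sort on the key field
    tr -s ' ' <"$1" | tr '[:upper:]' '[:lower:]' | sort -t'|' -k1,1
}

compare_runs() {                 # hypothetical helper; usage: compare_runs file_a file_b
    tmpb=$(mktemp) || return 1
    normalize "$2" >"$tmpb"
    normalize "$1" | awk -F'|' -v other="$tmpb" '
        # Advance to the next line of file_b; bkey is its first field.
        function next_b(   f) {
            if ((getline bline < other) > 0) {
                split(bline, f, "|"); bkey = f[1] ""; return 1
            }
            bdone = 1; return 0
        }
        BEGIN { next_b() }
        {
            key = $1 ""
            # file_b lines whose key sorts before this key have no partner
            while (!bdone && bkey < key) { print "> " bline; next_b() }
            if (!bdone && bkey == key) {
                if (bline != $0) { print "< " $0; print "> " bline }  # a field differs
                next_b()
            } else {
                print "< " $0                                         # only in file_a
            }
        }
        END { while (!bdone) { print "> " bline; next_b() } }'
    rm -f "$tmpb"
}

# Demo on the sample lines from this post
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 'From Test Run - 1|20070228|070|100|0.00000|0.00000|2240605.00000' \
              'From Test Run - 4|20070228|076|100|0.00000|0.00000|424064.29000' >"$tmp/file3.txt"
printf '%s\n' 'From Test Run - 1|20070230|070|200|0.00000|0.00000|2240605.00000' >"$tmp/file4.txt"
compare_runs "$tmp/file3.txt" "$tmp/file4.txt" >"$tmp/file5.txt"
cat "$tmp/file5.txt"
```

For the sample input this prints both versions of the Test Run - 1 line, one below the other, followed by the Test Run - 4 line that exists only in file3.txt.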

All times are GMT -7.


The UNIX and Linux Forums Content Copyright ©1993-2008. All Rights Reserved.