Formatting the output from diff


 
# 8  
Old 05-06-2008
Help needed in formatting the output of diff

Hi era,

This code is not working.

sort -k10 21.txt 22.txt |
awk '$10 == prevfile && $12 != prevsize { print prev; print }
{ prevfile = $10; prevsize=$12; prev=$0 }'

I am not getting any output. The contents of my files are, for example:
File 1 contains

A The row count of file2.txt is 23
A The row count of file3.txt is 20
A The row count of file4.txt is 2

File 2 contains

a The row count of file2.txt is 22
a The row count of file3.txt is 21
a The row count of file4.txt is 1

I need to ignore the case and then compare.

I have another set of files in this format.
For example, File 3 contains:
From Test Run - 1|20070228|070|100|0.00000|0.00000|2240605.00000
From Test Run - 4|20070228|076|100|0.00000|0.00000|424064.29000
From Test Run - 5|20070228|077|100|0.00000|0.00000|2545203.00000
From Test Run - 6|20070228|078|100|0.00000|0.00000|432940.00000
From Test Run - 2|20070228|074|100|1.00000|0.00000|97857.55000
From Test Run - 3|20070228|075|100|0.00000|0.00000|299658.93000

File 4 contains

From Test Run - 1|20070230|070|100|0.00000|0.00000|2240605.00000
From Test Run - 4|20070228|076|100|0.00000|0.00000|424064.29000
From Test Run - 5|20070228|077|100|0.00000|0.00000|2545203.00000
From Test Run - 6|20070228|078|100|0.00000|0.00000|432940.00000
From Test Run - 2|20070228|074|100|1.00000|0.00000|97857.55000
From Test Run - 3|20070228|075|100|0.00000|0.00000|299658.93000

There is a spacing difference between these. The size of these files (file3 and file4) may go up to 100MB or so. Will the above code be able to compare these files too?

Last edited by ragavhere; 05-06-2008 at 10:01 AM..
# 9  
Old 05-07-2008
Quote:
Originally Posted by ragavhere
File 1 contains

A The row count of file2.txt is 23
A The row count of file3.txt is 20
A The row count of file4.txt is 2

File 2 contains

a The row count of file2.txt is 22
a The row count of file3.txt is 21
a The row count of file4.txt is 1

These are different from what you posted before, so the field numbers are wrong. You need to change $10 to $6 and $12 to $8 for them to work, and to change the sort too.

Code:
sort -k6 file1 file2 |
awk '$6 == prevfile && $8 != prevsize { print prev; print }
{ prevfile = $6; prevsize=$8; prev=$0 }'

See? The file name of your latest example is in the sixth field so that's $6 and the row count is in the eighth column so $8.
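
To check the field numbers against the sample lines, here is a minimal sketch that runs the same awk logic on the data from the post. The temporary directory and the `result` variable are only placeholders for this illustration, and `-k6,6` is an assumption that restricts the sort key to the file name field alone. Note that the leading A/a is in field 1, which is never compared, so the case difference there does not affect the result.

```shell
#!/bin/sh
# Sample data copied from the post; the temp directory is a placeholder.
dir=$(mktemp -d)
cat >"$dir/file1" <<'EOF'
A The row count of file2.txt is 23
A The row count of file3.txt is 20
A The row count of file4.txt is 2
EOF
cat >"$dir/file2" <<'EOF'
a The row count of file2.txt is 22
a The row count of file3.txt is 21
a The row count of file4.txt is 1
EOF

# Sort on the file name (field 6) so both lines for one name are adjacent,
# then print each adjacent pair whose row counts (field 8) differ.
result=$(sort -k6,6 "$dir/file1" "$dir/file2" |
awk '$6 == prevfile && $8 != prevsize { print prev; print }
     { prevfile = $6; prevsize = $8; prev = $0 }')

printf '%s\n' "$result"
rm -r "$dir"
```

With this sample every file name has a different count in the two files, so all six lines should come out, grouped in pairs by file name.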

Quote:
File3 contains.
From Test Run - 1|20070228|070|100|0.00000|0.00000|2240605.00000
From Test Run - 4|20070228|076|100|0.00000|0.00000|424064.29000
From Test Run - 5|20070228|077|100|0.00000|0.00000|2545203.00000
From Test Run - 6|20070228|078|100|0.00000|0.00000|432940.00000
From Test Run - 2|20070228|074|100|1.00000|0.00000|97857.55000
From Test Run - 3|20070228|075|100|0.00000|0.00000|299658.93000

File 4 contains

From Test Run - 1|20070230|070|100|0.00000|0.00000|2240605.00000
From Test Run - 4|20070228|076|100|0.00000|0.00000|424064.29000
From Test Run - 5|20070228|077|100|0.00000|0.00000|2545203.00000
From Test Run - 6|20070228|078|100|0.00000|0.00000|432940.00000
From Test Run - 2|20070228|074|100|1.00000|0.00000|97857.55000
From Test Run - 3|20070228|075|100|0.00000|0.00000|299658.93000

There is spacing difference in this.The size of these files(file3 and file4) may go upto 100MB or so.Will the above code be able to compare these files too?

If the same logic applies then yes, it still keeps only two lines at a time in memory, so the file size doesn't matter to awk. Again, the file size is a problem for the sort, but if you can sort these files then the rest is trivial.

I won't try to adapt the code because it's not at all clear to me which fields should be compared and which fields ignored. If you just need to ignore spacing and case then of course, you can always convert them to some normalized form first, and then compare with a simpler tool such as diff or comm.

Code:
tr -s ' ' <file3 | tr A-Z a-z >temp3
tr -s ' ' <file4 | tr A-Z a-z >temp4
diff temp3 temp4
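
For the comm variant, a rough sketch follows, assuming the order of lines within each file does not matter (comm needs sorted input). The sample lines are based on the post, with the extra space and the lower-casing added here purely for illustration. The `normalize` function and the `result` variable are hypothetical names.

```shell
#!/bin/sh
# Use one fixed collation so that sort and comm agree on the ordering.
LC_ALL=C
export LC_ALL

dir=$(mktemp -d)
cat >"$dir/file3" <<'EOF'
From Test Run - 1|20070228|070|100|0.00000|0.00000|2240605.00000
From  Test Run - 4|20070228|076|100|0.00000|0.00000|424064.29000
EOF
cat >"$dir/file4" <<'EOF'
from test run - 1|20070230|070|100|0.00000|0.00000|2240605.00000
From Test Run - 4|20070228|076|100|0.00000|0.00000|424064.29000
EOF

# Squeeze runs of spaces, fold to lower case, and sort for comm.
normalize() {
    tr -s ' ' <"$1" | tr 'A-Z' 'a-z' | sort
}
normalize "$dir/file3" >"$dir/temp3"
normalize "$dir/file4" >"$dir/temp4"

# comm -3 drops lines common to both files; lines found only in the
# second file are printed with a leading tab.
result=$(comm -3 "$dir/temp3" "$dir/temp4")

printf '%s\n' "$result"
rm -r "$dir"
```

Here the "run - 4" lines differ only in spacing and case, so they are treated as equal and suppressed; only the two "run - 1" lines with different dates remain. The output shows the normalized (lower-cased) text, not the original lines.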

# 10  
Old 05-07-2008
Question

File3 contains.
From Test Run - 1|20070228|070|100|0.00000|0.00000|2240605.00000
From Test Run - 4|20070228|076|100|0.00000|0.00000|424064.29000

File 4 contains
From Test Run - 1|20070230|070|200|0.00000|0.00000|2240605.00000

Here file3 and file4 have mismatches. I had made the mismatching fields bold: in the first line, 20070228 vs. 20070230 and 100 vs. 200. The comparison should check every field of every line. If there is a mismatch, print the corresponding line from both files, one below the other. If a line is present in one file and not in the other, print that line from the file that has it.

For example, in the above the line common to both files has mismatches, so both of those lines should be printed to my output file. file3.txt also has an additional second line, which should be printed as well.

The code I used is

diff -b -i file3.txt file4.txt > file5.txt

diff would compare files of up to 2GB only. But I am not sure whether my file size will grow beyond 2GB in the future. So is there any other way of comparing that still ignores differences in spacing and case?