Comparing two files


 
# 1  
Old 01-15-2013

I am trying to compare two files and output the differences between them.

Let's take FileA.txt and FileB.txt for example:

Code:
FileA.txt
--------
Just A Fool:Christina Aguilera feat. Blake Shelton:Lotus (Deluxe Edition)
Figure 8:Ellie Goulding:Halcyon
Lovebird:Leona Lewis:Glassheart
Try:P!nk:The Truth About Love
Die Young:Ke$ha:Warrior

FileB.txt
--------
Just A Fool:Christina Aguilera feat. Blake Shelton:Lotus
Figure 8:Ellie Goulding:Halcyon
Lovebird:Leona Lewis:Glassheart (Deluxe Edition)
Try:P!nk:The Truth About Love
Die Young:Ke$ha:Warrior

I want to compare FileA.txt (before) and FileB.txt (after), and write the lines of FileB.txt that differ from FileA.txt to FileC.txt. So, FileC.txt should have the following output:

Code:
Just A Fool:Christina Aguilera feat. Blake Shelton:Lotus
Lovebird:Leona Lewis:Glassheart (Deluxe Edition)

I'm using diff to check for the difference:

Code:
diff FileA.txt FileB.txt > FileC.txt

However, I got the following results:
Code:
12c12
< Just A Fool:Christina Aguilera feat. Blake Shelton:Lotus (Deluxe Edition)
---
> Just A Fool:Christina Aguilera feat. Blake Shelton:Lotus

I am unable to find a diff option that does what I need.

Help please?
# 2  
Old 01-15-2013
Code:
awk 'FNR==NR{a[$0];next} !($0 in a)' FileA.txt FileB.txt > FileC.txt

These 2 Users Gave Thanks to vgersh99 For This Post:
# 3  
Old 01-15-2013
Also, if you have sdiff, it is a bit more useful than regular diff here:
Code:
$ sdiff File[AB].txt | sed -n "/|/ {s/.*| *//;p;}"
Just A Fool:Christina Aguilera feat. Blake Shelton:Lotus
Lovebird:Leona Lewis:Glassheart (Deluxe Edition)

(although how you extract the required text is up to you)
This User Gave Thanks to Scott For This Post:
# 4  
Old 01-15-2013
I tried vgersh99's and Scott's solution, and both worked amazingly. Thanks!

I really need to read up more on awk/sed, because they are amazing when it comes to almost anything bash-related.

---------- Post updated at 12:21 AM ---------- Previous update was at 12:10 AM ----------

Oh yes, is it possible for vgersh99 to explain your code? It's more for documentation.
# 5  
Old 01-15-2013
Quote:
Originally Posted by todaealas
... ... ...

Oh yes, is it possible for vgersh99 to explain your code? It's more for documentation.
Code:
1 awk '
2 FNR==NR{
3       a[$0]
4       next
5 }
6 !($0 in a)
7 ' FileA.txt FileB.txt > FileC.txt

This is a reformatted version of vgersh99's awk script with line numbers added for reference during this discussion. The line numbers cannot appear in the actual script.

Line 1 says we are using the awk utility to evaluate a script of awk commands.

Lines 2 through 6 are the awk commands that make up the script. The script is delimited by the single quotes at the end of line 1 and start of line 7.

Line 7 names the two input files (FileA.txt and FileB.txt) that awk will process, and specifies that the shell running this command will redirect any output written by awk (>) into a file named FileC.txt.

When awk runs a script, it first processes any commands that are requested to run before processing data read from input files (but there aren't any of these in this script). Then it goes into a loop that reads the next line from the input files and processes that line by running the script. This loop repeats until all lines have been read and processed for all of the input files given. Then it processes any commands that are requested to run after all input lines have been processed (but there aren't any of these in this script either).
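
The three phases described above can be sketched as follows. This is only an illustration, not part of vgersh99's script: the file name sample.txt and the variable names count and result are made up for the example. BEGIN and END are the standard awk patterns for "before any input" and "after all input".

```shell
# Create a small sample input (hypothetical file name).
printf 'one\ntwo\nthree\n' > sample.txt

result=$(awk '
BEGIN { print "before any input is read" }   # runs once, before the first line
      { count++ }                            # runs once for every input line
END   { print "lines processed: " count }    # runs once, after the last line
' sample.txt)

printf '%s\n' "$result"
# before any input is read
# lines processed: 3
```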

In the awk script there are commands of the form:
Code:
         condition{action}

When condition evaluates to a non-zero value or to a non-empty string (depending on context), the condition evaluates to TRUE and the commands in {action} will be performed. (If condition is not present, {action} will be performed for every input line read.) If condition is present but {action} is not present, the default action is to print the current contents of the current line. (Note that the contents of the current line may have been changed by statements in the script, so the current line might not be the line that was read.)
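
As a rough illustration of the three forms (condition only, action only, and both), assuming a made-up input file named fruit.txt:

```shell
# Hypothetical sample input: two words separated by an empty line.
printf 'apple\n\nbanana\n' > fruit.txt

# Condition only: the default action prints each line for which it is true.
awk 'length($0) > 0' fruit.txt
# apple
# banana

# Action only: the action runs for every input line.
awk '{ print NR ": " $0 }' fruit.txt
# 1: apple
# 2: 
# 3: banana

# Condition and action: the action runs only for lines matching /an/.
awk '/an/ { print "matched: " $0 }' fruit.txt
# matched: banana
```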

The condition on line 2 tests whether the number of lines read from the current input file (FNR) is equal to (==) the number of lines read from all input files (NR). This is a common idiom in awk saying "Execute this action for lines read from the 1st input file."
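
To see why FNR==NR singles out the first file, it may help to print both counters for every line. The file names first.txt and second.txt below are invented for this sketch:

```shell
# Two small hypothetical input files.
printf 'a1\na2\n' > first.txt
printf 'b1\nb2\nb3\n' > second.txt

# FNR restarts at 1 for each file; NR keeps counting across files.
result=$(awk '{
    print FILENAME, "FNR=" FNR, "NR=" NR, (FNR == NR ? "first file" : "later file")
}' first.txt second.txt)

printf '%s\n' "$result"
# first.txt FNR=1 NR=1 first file
# first.txt FNR=2 NR=2 first file
# second.txt FNR=1 NR=3 later file
# second.txt FNR=2 NR=4 later file
# second.txt FNR=3 NR=5 later file
```

One caveat with this idiom: if the first file is empty, NR is still 0 when the second file starts, so FNR==NR is also true for the second file's lines.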

The command on line 3 creates an element in array a indexed by the contents of the current line ($0). That element (a["contents of current line"]) is not assigned any value; the reference just creates an element in the array.
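
A small sketch of that behavior, run entirely in a BEGIN block so no input file is needed (the index strings "x" and "y" and the variable names are arbitrary):

```shell
result=$(awk 'BEGIN {
    a["x"]                                    # a bare reference creates the element
    if ("x" in a)    print "x is in a"
    if (!("y" in a)) print "y is not in a"    # "in" tests without creating anything
    n = 0
    for (k in a) n++                          # count the elements that exist
    print "elements: " n
}')

printf '%s\n' "$result"
# x is in a
# y is not in a
# elements: 1
```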

The command on Line 4 says stop processing this line and restart the script for the next input line.

Line 5 marks the end of the commands in the action associated with the condition on Line 2.

The condition on Line 6 evaluates to TRUE if there is not (!) an element in the array a indexed by the contents of the current line (($0 in a)). Since there is no {action} for this condition, if this condition evaluates to TRUE the current line will be printed.

So, if a line in the 2nd file did not also appear in the first file, print the line.
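
For anyone who wants to try this end to end, here is a sketch that recreates a shortened copy of the sample data (only the first three lines of each file from post #1) and runs the one-liner on it:

```shell
# Shortened copies of the thread's sample data (first three lines of each file).
cat > FileA.txt <<'EOF'
Just A Fool:Christina Aguilera feat. Blake Shelton:Lotus (Deluxe Edition)
Figure 8:Ellie Goulding:Halcyon
Lovebird:Leona Lewis:Glassheart
EOF

cat > FileB.txt <<'EOF'
Just A Fool:Christina Aguilera feat. Blake Shelton:Lotus
Figure 8:Ellie Goulding:Halcyon
Lovebird:Leona Lewis:Glassheart (Deluxe Edition)
EOF

# Remember every line of FileA.txt, then print FileB.txt lines not seen there.
awk 'FNR==NR{a[$0];next} !($0 in a)' FileA.txt FileB.txt > FileC.txt

cat FileC.txt
# Just A Fool:Christina Aguilera feat. Blake Shelton:Lotus
# Lovebird:Leona Lewis:Glassheart (Deluxe Edition)
```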

Note, however, that this will not report any differences if the same lines appear in both files, but are in a different order. It also will not notice if identical lines appear a different number of times in the two files.
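
If repeated lines matter, one possible variation (not something proposed earlier in this thread, just a sketch) is to count occurrences instead of only recording them. The file names before.txt and after.txt and the array name count are made up for the example:

```shell
# Hypothetical inputs: same lines in a different order, with different repeat counts.
printf 'x\ny\ny\n' > before.txt
printf 'y\nx\nx\nz\n' > after.txt

# Original script: only "z" is new, so the extra "x" goes unreported.
awk 'FNR==NR{a[$0];next} !($0 in a)' before.txt after.txt > set_diff.txt

# Counting variant: each line in before.txt cancels one matching line in after.txt.
awk '
FNR==NR     { count[$0]++; next }    # tally the lines of the first file
count[$0]>0 { count[$0]--; next }    # matched: use up one occurrence
            { print }                # unmatched: report it
' before.txt after.txt > multiset_diff.txt

cat set_diff.txt        # z
cat multiset_diff.txt   # x, then z
```

Like the original, this still ignores line order and only reports lines that are extra in the second file; the "y" that appears twice in before.txt but once in after.txt is not reported.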
These 3 Users Gave Thanks to Don Cragun For This Post: