Compare 2 flat files


# 1  

Hi friends,

I have a flat file with millions of records. .

Now I need help on this. (I prefer AWK, as it gives good performance.)

Old_file.txt
------------------
1 gopi ase .
2 arun pl
3 jack sutha ..
4 peter pm ..


New_file.txt
---------------
4 peter pm ..
..

Outputfile.txt
2 arun pl ..
..

Last edited by Gopal_Engg; 02-19-2010 at 08:49 AM.. Reason: cr
# 2  
Code:
nawk 'NR==FNR { a[$1]; next } !($1 in a)' New_file.txt Old_file.txt > Outputfile.txt

Output:

Code:
2 arun pl
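
To make the two-pass idea above concrete, here is a minimal sketch. It assumes the first column is a unique key, uses plain awk instead of nawk for portability, and works on made-up sample files in a temporary directory (the file contents are an assumption modelled on the question, not the real data).

```shell
#!/bin/sh
# Hypothetical sample data modelled on the files in the question.
workdir=$(mktemp -d)
printf '%s\n' '1 gopi ase' '2 arun pl' '3 jack sutha' '4 peter pm' > "$workdir/Old_file.txt"
printf '%s\n' '1 gopi ase' '3 jack sutha' '4 peter pm' > "$workdir/New_file.txt"

# Pass 1 (New_file.txt, where NR == FNR): remember every key from column 1.
# Pass 2 (Old_file.txt): print the records whose key was never seen.
deleted=$(awk 'NR == FNR { seen[$1]; next } !($1 in seen)' \
    "$workdir/New_file.txt" "$workdir/Old_file.txt")

printf '%s\n' "$deleted"
rm -rf "$workdir"
```

Only the keys of New_file.txt are held in memory, so memory use grows with the number of records in that file, not with the size of Old_file.txt.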

# 3  
Hi, I can see that you prefer AWK, but the small program comm can also be very useful. I think it wouldn't barf on big files unless there are too many lines between diffs.

For example, this will show you the lines that do not exist in both files:
Code:
comm -3 oldfile.txt newfile.txt

So in the case of newfile.txt only having records removed, it would work.
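
One thing to keep in mind is that comm expects both inputs to be sorted. A rough sketch of the comm -3 call on unsorted data might look like this; the sample records, including the extra record "5 mary qa" that exists only in the new file, are assumptions made up for illustration.

```shell
#!/bin/sh
# Hypothetical, deliberately unsorted sample data.
workdir=$(mktemp -d)
printf '%s\n' '3 jack sutha' '1 gopi ase' '4 peter pm' '2 arun pl' > "$workdir/oldfile.txt"
printf '%s\n' '4 peter pm' '5 mary qa' '1 gopi ase' '3 jack sutha' > "$workdir/newfile.txt"

# Sort both files first; LC_ALL=C makes sort and comm agree on the ordering.
LC_ALL=C sort "$workdir/oldfile.txt" > "$workdir/old.sorted"
LC_ALL=C sort "$workdir/newfile.txt" > "$workdir/new.sorted"

# -3 suppresses the column of common lines. Lines only in the old file
# start at the left margin; lines only in the new file are indented by a tab.
only_one_side=$(LC_ALL=C comm -3 "$workdir/old.sorted" "$workdir/new.sorted")

printf '%s\n' "$only_one_side"
rm -rf "$workdir"
```

If the new file can also contain added records, as in this sample, the tab-indented lines have to be filtered out, which is why -3 alone is only enough when records were removed and nothing else.
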
Best regards,
Lakris
# 4  
Kindly .. but I didn't know that nawk or gawk would barf on big files!
I only know that the limitation is on the field size (columns).
Is that right, guys?
# 5  

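comm is the ideal candidate for this job. On input that is already sorted it is typically faster than awk, because it only has to read the two files side by side. Note that comm expects both files to be sorted.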

Code:
$ comm file_orig file_new
contents (only in file_orig)  contents (only in file_new)  contents (common)

The output consists of three columns, as shown.

You can suppress a particular column by giving its number as an option to comm.
Code:
$ comm -12 fileA fileB

This suppresses columns 1 and 2 and gives you the common contents of both files (column 3).

For your job the command would be:
Code:
$ comm -23 orig_file new_file

This gives the list of records deleted from new_file.
Code:
Assumptions made:
orig_file : file containing all the records
new_file : subset of orig_file; some records have been deleted from it.

column 1 : contains the lines unique to orig_file (not present in new_file)
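
The column options described above can be tried out with a small sketch like the following. The file contents are made up for illustration (including a record "5 mary qa" added to new_file, which goes beyond the subset assumption), and both files are sorted first because comm expects sorted input.

```shell
#!/bin/sh
# Hypothetical sample data; orig_file and new_file follow the names used above.
workdir=$(mktemp -d)
printf '%s\n' '1 gopi ase' '2 arun pl' '3 jack sutha' '4 peter pm' |
    LC_ALL=C sort > "$workdir/orig_file"
printf '%s\n' '1 gopi ase' '3 jack sutha' '4 peter pm' '5 mary qa' |
    LC_ALL=C sort > "$workdir/new_file"

# -23 keeps column 1 only: records deleted from new_file.
deleted=$(LC_ALL=C comm -23 "$workdir/orig_file" "$workdir/new_file")
# -13 keeps column 2 only: records added in new_file.
added=$(LC_ALL=C comm -13 "$workdir/orig_file" "$workdir/new_file")
# -12 keeps column 3 only: records present in both files.
common=$(LC_ALL=C comm -12 "$workdir/orig_file" "$workdir/new_file")

printf 'deleted: %s\nadded: %s\ncommon:\n%s\n' "$deleted" "$added" "$common"
rm -rf "$workdir"
```

For files with millions of records the sort step dominates the run time, so the comparison with awk depends on whether the files arrive sorted already.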

Hope this helps.

Cheers,
14341

Last edited by 14341; 12-29-2009 at 09:42 AM..
# 6  
14341: I thought the issue was whether awk will barf on big files, not whether comm executes faster than awk. I know that shell built-ins are generally faster than external commands, although comm and awk are both external programs.

I hope I have made the issue clear. I still don't know whether awk has limits on the number of rows or on column sizes.

BR
# 7  
Older awk implementations have some limitations, e.g.:
Code:
Number of fields per record	100
Characters per input record	3000
Characters per output record	3000
Characters per field	1024
Characters per printf string	3000
Characters in literal string	400
Characters in character class	400
Files open	15
Pipes open	1


However, gawk, mawk and other recent versions are alternatives that avoid these limitations.

Reference: O'Reilly, sed & awk, ch. 10.8, Limitations

Just wanted to share this info.
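
If you are unsure whether the awk on your own system is bound by limits like these, a quick probe along the following lines can help. It is only a sketch: the sizes 200 and 5000 are arbitrary values chosen to exceed the 100-field and 3000-character figures quoted above, and an implementation without those limits would be expected to report them back unchanged.

```shell
#!/bin/sh
# Build one record with 200 fields and ask awk how many fields it sees.
fields=$(awk 'BEGIN { for (i = 1; i <= 200; i++) printf "%d ", i; print "" }' |
    awk '{ print NF }')

# Build one record of 5000 characters and ask awk for its length.
chars=$(awk 'BEGIN { for (i = 1; i <= 5000; i++) printf "x"; print "" }' |
    awk '{ print length($0) }')

printf 'fields=%s chars=%s\n' "$fields" "$chars"
```

Note that none of the limits in the table concerns the number of records: awk reads its input one record at a time, so the row count matters mainly when the program stores records in an array.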