awk to compare flat files and print output to another file
Post 302432539 by methyl on Friday 25th of June 2010 10:05:43 AM
Sorry to be a wet blanket, but neither the grep nor the uniq approach will fulfill the requirement, even if the data were in sorted order (which it isn't).

1) Do both files have exactly the same number of records, and are you only looking for records which have changed? Does the order of the output into file3 matter?
2) If there can be more or fewer records in file2 than in file1, does the order of the output into file3 matter? Are you also interested in records which exist in file1 but do not exist in file2?
3) What percentage of differences do you expect? (This is really a performance question, because some approaches would involve multiple lookups.)
4) If this proves too difficult for shell programming, do you have a mainstream database engine?
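Question 1 can be answered quickly before choosing an approach. The following is only a minimal sketch: the function names record_count and same_record_count are made up for illustration, and it assumes one record per line.

```shell
#!/bin/ksh
# record_count FILE: print the number of lines (records) in FILE.
# (Hypothetical helper name; assumes one record per line.)
record_count() {
    # Arithmetic expansion strips the padding some wc implementations add.
    echo $(( $(wc -l < "$1") ))
}

# same_record_count FILE1 FILE2: succeed only if both files
# contain the same number of records.
same_record_count() {
    [ "$(record_count "$1")" -eq "$(record_count "$2")" ]
}
```

For example, `same_record_count file1 file2 && echo "same number of records"`. Equal counts do not prove the files match, they only tell you whether records were added or dropped overall.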

---------- Post updated at 15:05 ---------- Previous update was at 14:20 ----------

One shell approach, if the order of the output does not matter.
Tried with two files of approximately 5 million records and 500 MB each. It took about 5 minutes to run, and the output shows only the records in file2 which have no exact match in file1. Actual performance will depend on how fast your computer is and how much memory you can give to sort.

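Code:
#!/bin/ksh
# Sort both files so that comm can compare them line by line.
sort file1 > sortfile1
sort file2 > sortfile2
# -13 suppresses lines found only in file1 and lines common to both,
# leaving the lines found only in file2.
comm -13 sortfile1 sortfile2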
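
The sorted comparison loses the original record order. If the order of file2 does matter, one possible alternative is an awk lookup, sketched below. This is not the method tested above: the function name unmatched_in_order is hypothetical, and it assumes the whole of file1 fits in memory, which may not hold for files of this size.

```shell
#!/bin/ksh
# unmatched_in_order OLD NEW: print the records of NEW that have no
# exact match in OLD, in the order they appear in NEW.
# (Hypothetical helper name; holds every record of OLD in memory.)
unmatched_in_order() {
    awk '
        # First file: remember every record.
        FILENAME == ARGV[1] { seen[$0] = 1; next }
        # Second file: print records that were never seen.
        !($0 in seen)
    ' "$1" "$2"
}
```

For example, `unmatched_in_order file1 file2 > file3`. Testing FILENAME rather than the common NR == FNR idiom keeps the result correct when the first file is empty.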

When sorting large files be sure to set $TMPDIR to somewhere with enough space for at least twice the size of the file being sorted.
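
As a rough sketch of that advice, the sort and comm steps can be wrapped so that the temporary directory is set explicitly for each sort. The function name new_records and its four arguments are placeholders chosen for illustration, not something from the original script.

```shell
#!/bin/ksh
# new_records OLD NEW OUT WORKDIR
# Write to OUT the records that appear in NEW but not in OLD.
# (Hypothetical helper; OLD, NEW, OUT and WORKDIR are placeholder names.)
new_records() {
    nr_old=$1; nr_new=$2; nr_out=$3; nr_work=$4

    # sort reads TMPDIR to decide where its temporary files go; WORKDIR
    # should have room for at least twice the size of the largest input.
    TMPDIR=$nr_work sort "$nr_old" > "$nr_work/sortfile1" || return 1
    TMPDIR=$nr_work sort "$nr_new" > "$nr_work/sortfile2" || return 1

    # Keep only the records found in NEW but not in OLD.
    comm -13 "$nr_work/sortfile1" "$nr_work/sortfile2" > "$nr_out"
}
```

A call might look like `new_records file1 file2 file3 /bigdisk/tmp`, where /bigdisk/tmp stands for any existing directory with enough free space. The sorted copies are left in that directory and need to be removed afterwards.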
 
