Find common Strings in two large files


 
Thread Tools Search this Thread
Top Forums UNIX for Advanced & Expert Users Find common Strings in two large files
# 1  
Old 12-20-2010
Find common Strings in two large files

Hi ,
I have a text file in the format

DB2: [NodeID=1]
DB2: [NodeID=2]
WB: [NodeID=3]
WB: [NodeID=3]
WB: [NodeID=3]
WB: [NodeID=3]

and a second text file of the format

Time=00:00:00.473 [NodeID=3]
Time=00:00:00.436 [NodeID=3]
Time=00:00:00.016 [NodeID=2]
Time=00:00:00.027 [NodeID=1]
Time=00:00:00.471 [NodeID=3]
Time=00:00:00.436 [NodeID=3]

the last string in both the text files is of the form NodeID=*

I want to combine lines in both the files where the last string in both the files matches ....
something like

DB2: [NodeID=1] Time=00:00:00.027 [NodeID=1]

could you please suggest...

NOTE: the actual size of the text files runs into GBs...

Thanks in advnce.
# 2  
Old 12-20-2010
Code:
 
awk 'NR==FNR{a[$2]=$0;next;}{print $0,a[$2]}' secondFile firstFile

This User Gave Thanks to anurag.singh For This Post:
# 3  
Old 12-20-2010
Anurag,
Thanks for quick response. As I am beginer in Shell need help in understanding the following

awk 'NR==FNR{a[$2]=$0;next;}{print $0,a[$2]}' secondFile firstFile

In the above code

What does {a[$2]=$0;next;} $2 and $0 stand for...?

so That I can modify your script to make it working...

Thanks
# 4  
Old 12-20-2010
Here is the explaination of above command (Go through any awk tutorial to get a basic idea of how awk works.)
awk processes input file line by line and in each line, $0 represents the whole line, $1 represents 1st field, $2 represents 2nd field and so on. Default delimiter is space/tab.
Code:
echo "abc def ghi" | awk '{print $0}'

will prints whole line
Code:
abc def ghi

Code:
echo "abc def ghi" | awk '{print $1}'

will prints 1st field
Code:
abc

Code:
echo "abc def ghi" | awk '{print $2}'

will prints 2nd field
Code:
 
def

When more than one file is given to awk,
Code:
 
NR==FNR

will be true only for 1st file. FNR is record no in current file, NR is record no processed by awk (so commulative count).
Code:
 
NR==FNR{a[$2]=$0;next;}

This block will execute only for 1st file (As NR==FNR will be true only for 1st file). Here record value is being assigned to array (indexed with 2nd field i.e. [NodeID=1/2/3]). next command will get next line in the file for processing.
Code:
{print $0,a[$2]}

This block will execute for 2nd file ONLY.
Here $0 is the whole current record value in 2nd file and $2 is 2nd field in current line (i.e. NodeID in 2nd file). a[$2] will be printed if array was set for this index (2nd field in 2nd file) while processing 1st file.

Last edited by anurag.singh; 12-20-2010 at 12:46 PM.. Reason: typo
This User Gave Thanks to anurag.singh For This Post:
# 5  
Old 12-21-2010
Bug

Anurag,
Thanks for all the support - Now scripts are able to deliver ....Thanks a TON
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Find common files between two directories

I have two directories Dir 1 /home/sid/release1 Dir 2 /home/sid/release2 I want to find the common files between the two directories Dir 1 files /home/sid/release1>ls -lrt total 16 -rw-r--r-- 1 sid cool 0 Jun 19 12:53 File123 -rw-r--r-- 1 sid cool 0 Jun 19 12:53... (5 Replies)
Discussion started by: sidnow
5 Replies

2. UNIX for Dummies Questions & Answers

Find common numbers from two very large files using awk or the like

I've got two files that each contain a 16-digit number in positions 1-16. The first file has 63,120 entries all sorted numerically. The second file has 142,479 entries, also sorted numerically. I want to read through each file and output the entries that appear in both. So far I've had no... (13 Replies)
Discussion started by: Scottie1954
13 Replies

3. Shell Programming and Scripting

Find Common Values Across Two Files

Hi All, I have two files like below: File1 MYFILE_28012012_1112.txt|4 MYFILE_28012012_1113.txt|51 MYFILE_28012012_1114.txt|57 MYFILE_28012012_1115.txt|57 MYFILE_28012012_1116.txt|57 MYFILE_28012012_1117.txt|57 File2 MYFILE_28012012_1110.txt|57 MYFILE_28012012_1111.txt|57... (2 Replies)
Discussion started by: angshuman
2 Replies

4. Shell Programming and Scripting

Script to find NOT common strings in two files

Hi all, I'd like you to help or give any advise about the following: I have two (2) files, file1 and file2, both files have information common to each other. The contents of file1 is a subset of the contents of file2: file1: errormsgadmin esdp esgservices esignipa iprice ipvpn irm... (18 Replies)
Discussion started by: hnux
18 Replies

5. Shell Programming and Scripting

Script to find NOT common strings in two files

Hi all, I'd like you to help or give any advise about the following: I have two (2) files, file1 and file2, both files have information common to each other. The contents of file1 is a subset of the contents of file2: file1: errormsgadmin esdp esgservices esignipa iprice ipvpn irm... (0 Replies)
Discussion started by: hnux
0 Replies

6. Shell Programming and Scripting

Simple script to find common strings in two files

Hi , I want to write a simple script. I have two files file1: BCSpeciality Backend CB CBAPQualDisp CBCimsVFTRCK CBDSNQualDisp CBDefault CBDisney CBFaxMCGen CBMCGeneral CBMCQualDisp file2: CSpeciality Backend (8 Replies)
Discussion started by: ramky79
8 Replies

7. Shell Programming and Scripting

Find and analyze variable Strings in a large file.

Hi guys, I have multiple files (>5000) which I have in a folder. I read every file name and put it in a tmp variable "i" so that i can use it in the following task. I have a large .txt file (>50 MB) in which I want to find the variable "i" and the lines after this variable, so that I can use... (4 Replies)
Discussion started by: Ashitaka007
4 Replies

8. Shell Programming and Scripting

Drop common lines at head/tail of a large set of files

Hi! I have a large set of pairs of text files (each pair in their own subdirectory) and each pair shares head/tail (a couple of first and last lines) but differs in the middle part. I need to delete the heads/tails and keep only the middle portions in which they differ. The lengths of heads/tails... (1 Reply)
Discussion started by: dobryden
1 Replies

9. UNIX for Dummies Questions & Answers

how to find common words and take them out from two files

Hi, everyone, Let's say, we have xxx.txt A 1 2 3 4 5 C 1 2 3 4 5 E 1 2 3 4 5 yyy.txt A 1 2 3 4 5 B 1 2 3 4 5 C 1 2 3 4 5 D 1 2 3 4 5 E 1 2 3 4 5 First I match the first column I find intersection (A,C, E), then I want to take those lines with ACE out from yyy.txt, like A 1... (11 Replies)
Discussion started by: kaixinsjtu
11 Replies

10. Shell Programming and Scripting

Files common in two sets ??? How to find ??

Suppose we have 2 set of files set 1 set 2 ------ ------ abc hgb def ppp mgh vvv nmk sdf hgb ... (1 Reply)
Discussion started by: skyineyes
1 Replies
Login or Register to Ask a Question