Combine common line from 2 Huge files


 
# 1  
Old 05-13-2010

Hi,

I have 2 huge files with line counts of more than 10 million. The files look like this:

File 1

Code:
45905099 2059
942961505 3007
8450875165 7007
615565331 3015
9415586035 9012
9871573 5367
4415655 4011
44415539519 5361
3250659295 4001
5950718618 9367

File 2

Code:
44415539519      TQ03      99.86 12-MAY-10 09.36.45.453366 AM
5950718618      ZT04         53 01-MAY-10 02.42.55.600218 PM
94121628      TH04      98.73 11-MAY-10 08.57.42.617615 PM
941488      TZ03      49.86 10-APR-10 07.46.27.920278 PM
4415655      TR03      49.86 10-MAY-10 11.47.39.701701 AM
84224643      TR03      49.86 10-MAY-10 09.58.07.313377 AM
8860320024      TR03      48.86 12-MAY-10 10.00.59.901523 AM
6614414138      TR03      44.86 06-MAY-10 06.59.46.958793 PM
9442381886      TR03      44.86 03-MAY-10 05.01.44.008156 PM
999631410      TR03      45.86 04-APR-10 07.40.31.117461 PM

I need to create an output file containing the entries whose 1st column is common to both files. Sample:

Output:
Code:
44415539519 5361      TQ03      99.86 12-MAY-10 09.36.45.453366 AM
5950718618 9367      ZT04         53 01-MAY-10 02.42.55.600218 PM
4415655 4011      TR03      49.86 10-MAY-10 11.47.39.701701 AM

Please suggest a solution other than join, as the join utility takes a lot of time and the files need to be sorted before applying it.

Thanks & Regards

Last edited by vgersh99; 05-13-2010 at 06:04 AM.. Reason: code tags, please!
# 2  
Old 05-13-2010
Code:
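# read file2 first and index fields 2..NF by column 1, then print matching file1 lines with them appended
awk 'NR==FNR{for(i=2;i<=NF;i++) rest[$1]=rest[$1] FS $i;next} $1 in rest{print $0 rest[$1]}' file2 file1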

Please use code tags next time for better readability!
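
A note on how the one-liner above works, plus a variant. It loads all of file2 into an awk array keyed on column 1, then streams file1 and prints every line whose key was stored. No sorting is needed, but the whole of file2 has to fit in memory. The sketch below is a variant, not the poster's code: it assumes file1 (two short columns) is the cheaper file to hold in memory, so it indexes file1 and streams file2 instead, which also keeps file2's line order and spacing. The sample data is cut down from the first post; `workdir`, `code` and `common.txt` are names made up for the example, and it assumes file2 lines have no leading whitespace.

```shell
# Work in a scratch directory so no real file1/file2 gets overwritten.
workdir=$(mktemp -d) && cd "$workdir"

# Sample data: a few lines of each file from the first post.
cat > file1 <<'EOF'
45905099 2059
9871573 5367
4415655 4011
44415539519 5361
5950718618 9367
EOF

cat > file2 <<'EOF'
44415539519      TQ03      99.86 12-MAY-10 09.36.45.453366 AM
5950718618      ZT04         53 01-MAY-10 02.42.55.600218 PM
94121628      TH04      98.73 11-MAY-10 08.57.42.617615 PM
4415655      TR03      49.86 10-MAY-10 11.47.39.701701 AM
EOF

# Pass 1 (file1): remember column 2 under the key in column 1.
# Pass 2 (file2): for keys seen in file1, print the key, the remembered
# value, and then the rest of the file2 line exactly as it was.
awk '
    NR == FNR  { code[$1] = $2; next }
    $1 in code { print $1, code[$1] substr($0, length($1) + 1) }
' file1 file2 > common.txt

cat common.txt
```

On this sample it writes the three lines shown as the wanted output in the first post, in file2 order. Whether it beats sort plus join on 10 million lines depends on available memory, so it is worth timing both on the real data.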
# 3  
Old 05-13-2010
mate, I doubt you can beat the speed of join by much, especially if you want
to do it with unsorted files.
Done naively, that would be ridiculously expensive.

sorted: one merge pass, roughly N1 + N2 comparisons (plus the cost of sorting)
unsorted, scanning file 2 for every line of file 1: up to
N1 x N2 comparisons

ridiculous.
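
For comparison, the sort-then-join route discussed in this thread can be sketched as below, so it can be timed against an awk approach on the real files. This is only an illustration on a cut-down copy of the sample data from the first post; `workdir`, the `.sorted` files and `joined.txt` are made-up names. It assumes the first column is the join key in both files and forces the C locale so that sort and join agree on ordering.

```shell
# Work in a scratch directory so no real file1/file2 gets overwritten.
workdir=$(mktemp -d) && cd "$workdir"

# Sample data: a few lines of each file from the first post.
cat > file1 <<'EOF'
45905099 2059
9871573 5367
4415655 4011
44415539519 5361
5950718618 9367
EOF

cat > file2 <<'EOF'
44415539519      TQ03      99.86 12-MAY-10 09.36.45.453366 AM
5950718618      ZT04         53 01-MAY-10 02.42.55.600218 PM
94121628      TH04      98.73 11-MAY-10 08.57.42.617615 PM
4415655      TR03      49.86 10-MAY-10 11.47.39.701701 AM
EOF

# join needs both inputs sorted on the join field, with the same collation
# for sort and join; LC_ALL=C selects plain byte order for both.
export LC_ALL=C
sort -k1,1 file1 > file1.sorted
sort -k1,1 file2 > file2.sorted

# Default join: field 1 of each file, output fields separated by one space.
join file1.sorted file2.sorted > joined.txt

cat joined.txt
```

Note that the result comes out in key-sorted order rather than in the order of either input, and runs of spaces in file 2 are collapsed to single spaces.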