Merging Very large CSV files in Unix


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Merging Very large CSV files in Unix
# 1  
Old 04-13-2012
Merging Very large CSV files in Unix

Hi,

I have two very large CSV files, which I want to merge (equi-join) based on a key (column).

One of the file (say F1) would have ~30 MM records and 700 columns. The other file (~f2) would have same # of records and lesser columns (say 50). I want to create an output file joining on a common column (in F1 and F2).

Something like:

F1=>

Code:
Key V1 .. V600
1111 .................
2222 .................
3333 .................

F2 =>

Code:
Key L1 .. L50
2222 .................
1111 .................
3333 .................

The merged file would be:

Code:
Key V1 .. V600 L1 .. L50
1111 .................
2222 .................
3333 .................

Please note that the files are not sorted.

Any insights would be appreciated.

Thank you!
-V

Moderator's Comments:
Mod Comment Welcome to the UNIX and Linux Forums. Please use code tags. Video tutorial on how to use them

Last edited by Scrutinizer; 04-13-2012 at 03:20 AM..
# 2  
Old 04-13-2012
Could this help you ?
Code:
 
awk -F"," 'NR==FNR{temp=$1;$1="";a[temp]=$0;next}
a[$1]{print $0,a[$1]}' file2.csv file1.csv

This User Gave Thanks to pravin27 For This Post:
# 3  
Old 04-13-2012
Try join.

EDIT: Although you would need to sort the files...

Last edited by CarloM; 04-13-2012 at 06:11 AM..
# 4  
Old 04-13-2012
Hi Pravin27,

Thank you. I will try the same and circle back with feedback.

Hi CarloM,

I do not want to sort considering the large amount of data.

Thanks again to both of you.

-V
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Merging two files in UNIX

Hi Experts, Need urgent solution for a problem. I have two files file1 and file2. file1 is tab separated and file2 is comma separated. I need to merge both the files into single file based on CUST_ID by retaining the headers of file1 Matching CUST_IDs should be placed one below the other in... (11 Replies)
Discussion started by: bharathbangalor
11 Replies

2. Shell Programming and Scripting

Comparing two large unsorted csv files

Hi All, My requirement is to write a shell script to compare two large csv files. I've created sample files for explaining my problem i.e., a.csv and b.csv contents of files: ----------------- a.csv ------ Type,Memory (Kb),Location HD,Size (Mb),Serial # XT,640,D402,0,MG0010... (2 Replies)
Discussion started by: vasavi
2 Replies

3. Shell Programming and Scripting

Help with merging two CSV files

Hi, I have following 2 CSV files file1.txt A1,B1,C1,D1,E1 A2,B2,C2,D2,E2 A3,B3,C3,D3,E3 .... file2.txt A1,B1,P1,Q1,R1,S1,T1,U1 A1,B1,P2,Q2,R2,S2,T2,U2 A1,B1,P3,Q3,R3,S3,T3,U3 A2,B2,X1,Y1,Z1,I1,J1,K1 A2,B2,X2,Y2,Z2,I2,J2,K2 A2,B2,X3,Y3,Z3,I3,J3,K3 A2,B2,X4,Y4,Z4,I4,J4,K4... (2 Replies)
Discussion started by: learnoutmore99
2 Replies

4. SCO

Need advice: Copying large CSV report files off SCO system

I have a SCO Unix server from 1999 running SCO 5.0.5 and some ancient accounting software called Real World A report writer program on the system is used to generate CSV files from accounting that we write with DOSCOPY commands to 3.5" floppies In the next 60 days we will be decommissioning... (11 Replies)
Discussion started by: magnetman
11 Replies

5. Shell Programming and Scripting

Merging all (48) CSV files from a directory

I have 48 csv files in my directory that all have this form: Storm Speed (mph),43.0410781151 Storm motion direction (degrees),261.580774982 MLCAPE,2450.54098661 MLCIN,-9.85040520279 MLLCL,230 MLLFC,1070.39871 MLEL,207.194689294 MLCT,Not enough data Sbcape,2203.97617778... (3 Replies)
Discussion started by: RissaR
3 Replies

6. Shell Programming and Scripting

Matching lines across multiple csv files and merging a particular field

I have about 20 CSV's that all look like this: "","","","","","","","","","","","","","","",""What I've been told I need to produce is the exact same thing, but with each file now containing the start_code from every other file where the email matches. It doesn't matter if any of the other... (1 Reply)
Discussion started by: Demosthenes
1 Replies

7. UNIX for Dummies Questions & Answers

Merging two CSV files by 3 primary keys (columns)

Hi there! I have the following problem: I have a set of files called rates_op_yyyyddmm with the format below (which corresponds to the file rates_op_20090130) 30-JAN-2009,ED,FEB09,C,96.375,,,0,,,,,,2.375,,,,,, 30-JAN-2009,ED,FEB09,C,96.5,,,0,,,,,,2.25,,,,,,... (2 Replies)
Discussion started by: Pep Puigvert
2 Replies

8. Shell Programming and Scripting

Merging files to create CSV file

Hi, I have different files of the same type, as: Time: 100 snr: 88 perf: 10 other: 222 Each of these files are created periodically. What I need to do is to merge all of them into one but having the following form: (2 Replies)
Discussion started by: Ravendark
2 Replies

9. Shell Programming and Scripting

Sed or awk script to remove text / or perform calculations from large CSV files

I have a large CSV files (e.g. 2 million records) and am hoping to do one of two things. I have been trying to use awk and sed but am a newbie and can't figure out how to get it to work. Any help you could offer would be greatly appreciated - I'm stuck trying to remove the colon and wildcards in... (6 Replies)
Discussion started by: metronomadic
6 Replies

10. UNIX for Dummies Questions & Answers

Merging 2 .CSV files in Unix

I need a little help as I am a complete novice at scripting in unix. However, i am posed with an issue...:eek: i have two csv files in the following format@ FILE1.CSV: HEADER HEADER Header , , HEADER 001X ,,200 002X ,,300 003X ... (6 Replies)
Discussion started by: chachabronson
6 Replies
Login or Register to Ask a Question