Compare and merge two big CSV files


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Compare and merge two big CSV files
# 1  
Old 08-19-2018
Compare and merge two big CSV files

Hi all,

i need help.
I have two csv files with a huge amount of data.
I need the first column of the first file, to be compared with the data of the second, to have at the end a file with the data not present in the second file.
Example

File1: (only one column)
Code:
profile_id
57036226
57036230
57109826
57241366
57241374

File2: (has 7 columns; the data in File1 are present in the second, third, fourth and fifth columns)
Code:
Id                    FederationIdentifier    Profile_ID__c    Contact.Profile_ID__c    Contact.Account.Profile_ID__c    IsActive    LastModifiedDate
005b000000150mxAAA    57036226                57036226        57036226                57036226                        true        2018-05-24
0050X000007K4p7QAC    56564502                56564502        56564502                56564502                        true        2018-04-12
0050X000007JxYOQA0    56994082                56994082        56994082                56994082                        true        2018-04-10
0050X000007K3UrQAK    57241366                57241366        57241366                57241366                        true        2018-04-11
0050X000007K3UcQAK    57109826                57109826        57109826                57109826                        true        2018-04-11
0050X000007K4d1QAC    58699731                58699731        58699731                58699731                        true        2018-04-12

Output file:
Code:
profile_id
57036230
57241374

Output file is a file with the data of File1 not present in File2.


I've tried with join but not works...he put in the output file all the data of the second file without any kind of compare:
Code:
join -t, -v 2 <(sort file1) <(sort file2) > file 3


Last edited by vbe; 08-19-2018 at 04:55 AM..
# 2  
Old 08-19-2018
Welcome to the forum.


How far would
Code:
awk 'NR == FNR && NR > 1 {T[$1]; next} {for (i=2; i<=5; i++) if ($i in T) delete T[$i]} END {for (t in T) print t}' file[12]
57241374
57036230

# 3  
Old 08-19-2018
Hi RudiC,


many thanks.
the code not works.

I launched the command. I took one of the data from the output file and verified that it was not in file 2 but:
Code:
grep 63995855 Salesforce_Active_OK.csv 
 005b0000004D91aAAC;63995855;63995855;63995855;63995855;true;2018-06-20T10:42:13.000Z

Seems that it works for the format of the output file but the data of file 1 were not compared with file 2


Moderator's Comments:
Mod Comment Please use CODE tags as required by forum rules!

Last edited by RudiC; 08-19-2018 at 06:00 AM.. Reason: Added CODE tags.
# 4  
Old 08-19-2018
The code works. It was tested with your samples posted. Did you try with those?



If you chose to change the datas' field separators, you need to adapt awk's FS variable as well. Add the -F\; option.
# 5  
Old 08-19-2018
Ok, I've tried with:
Code:
awk -F\; 'NR == FNR && NR > 1 {T[$1]; next} {for (i=2; i<=5; i++) if ($i in T) delete T[$i]} END {for (t in T) print t}' Magento_2.csv Salesforce_Active_OK.csv > Output.csv

And I have an output file with data that are present in the file2 (Salesforce_Active_OK.csv).

This is the format of the file2 (yes, it has ";" ):
Code:
Id;FederationIdentifier;Profile_ID__c;Contact.Profile_ID__c;Contact.Account.Profile_ID__c;IsActive;LastModifiedDate
005b000000150mxAAA;2630;2630;2630;2630;true;2018-05-24T09:58:01.000Z
0050X000007K4p7QAC;56564502;56564502;56564502;56564502;true;2018-04-12T15:23:04.000Z

Have I made some mistake?
# 6  
Old 08-19-2018
What be the file1 data for your above file2 test run? Are fields 2 - 5 always identical?
# 7  
Old 08-19-2018
I attach an example of the two files (only some rows...files are too bigSmilie)


in the second file, fields from 2 to 5 are with the same data.

PS: the real file1 is more big of file 2...is that a problem?
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Merge the three csv files as one according to first coloumn.

I have three files with similar pattern i need to merge all the coloumns side by side from all three files according to the first coloumn example as shown below I mentioned 5 coloumns only in example but i have around 15 coloumns in each file. file1: Name,Samples,Error,95RT,90RT... (4 Replies)
Discussion started by: Raghuram717
4 Replies

2. UNIX for Beginners Questions & Answers

Compare two big files for differences using Linux

Hello everybody Looking for help in comparing two files in Linux(files are big 800MB each). Example:- File1 has below data $ cat file1 5,6,3 2.1.4 1,1,1 8,9,1 File2 has below data $ cat file2 5,6,3 8,9,8 1,2,1 2,1,4 (8 Replies)
Discussion started by: shanul karim
8 Replies

3. Shell Programming and Scripting

Compare two big files for differences using Linux

Hello everybody Looking for help in comparing two files in Linux(files are big 800MB each). Example:- File1 has below data $ cat file1 5,6,3 2.1.4 1,1,1 8,9,1 File2 has below data $ cat file2 5,6,3 8,9,8 1,2,1 2,1,4 (1 Reply)
Discussion started by: shanul karim
1 Replies

4. UNIX for Dummies Questions & Answers

Merge two csv files using column name

Hi all, I have two separate csv files(comma delimited) file 1 and file 2. File 1 contains PAN,NAME,Salary AAAAA5467D,Raj,50000 AAFAC5467D,Ram,60000 BDCFA5677D,Kumar,90000 File 2 contains PAN,NAME,Dept,Salary ASDFG6756T,Karthik,ABC,450000 QWERT8765Y,JAX,CDR,780000... (5 Replies)
Discussion started by: Nivas
5 Replies

5. Shell Programming and Scripting

Merge CSV files

I have lot of csv file collected from script like below : Name of files (some examples) there are thousands of it: 192.168.0.123_251_18796_1433144473.csv 192.168.0.123_251_18796_1433144772.csv 192.168.0.123_251_18796_1433145073.csv 192.168.0.123_251_18796_1433145372.csvContent of each... (5 Replies)
Discussion started by: rk4k
5 Replies

6. Shell Programming and Scripting

Compare two files and merge into third

Hello: Newbie with Awk. Trying to compare two files and merge data based on CID. Please see the input file format and desired output. Any help is appreciated. TIA Input File1 CID1 --- TYP1 --- DCN1 --- INDATE1 --- IN-DATA1 CID2 --- TYP2 --- DCN2 --- INDATE2 --- IN-DATA2 CID3 ---... (6 Replies)
Discussion started by: wincrazy
6 Replies

7. Shell Programming and Scripting

Compare 2 csv files in ksh and o/p the difference in a new csv file

(say) I have 2 csv files - file1.csv & file2.csv as mentioned below: file1.csv ID,version,cost 1000,1,30 2000,2,40 3000,3,50 4000,4,60 file2.csv ID,version,cost 1000,1,30 2000,2,45 3000,4,55 6000,5,70 The... (7 Replies)
Discussion started by: Naresh101
7 Replies

8. Shell Programming and Scripting

Compare and Merge files

Hi All, I have two different files as shown below separated by a "|". I need to compare the first column from both the files and if they match merge both the columns. File 1 "S00172012"|"CHRONIC RENAL FAILURE"|""|"I" "S00159962"|"SUBENDO INFRC-INIT EPISD"|""|"I" "S00255303"|"BENIGN... (6 Replies)
Discussion started by: nua7
6 Replies

9. UNIX for Advanced & Expert Users

best method to compare 2 big files in unix

Hi , I have a requirement to compare 2 files which can contain 40 million or more records and more than 20 fields to compare . Currently I am using awk scripting , and since awk has a memory issue, I am not able to process file more than 10 million records. Any suggestions or pointers to... (7 Replies)
Discussion started by: rashmisb
7 Replies

10. Shell Programming and Scripting

Need to compare two csv files values and write into another csv file

Hi all, Am new to scripting. So i just need your ideas to help me out. Here goes my requirement. I have two csv files 1.csv 2.csv abc,1.24 abc,1 def,2.13 def,1 I need to compare the first column of 1.csv with 2.csv and if matches then need to compare... (2 Replies)
Discussion started by: chinnahyd
2 Replies
Login or Register to Ask a Question