CSV joining and checking multiple files


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting CSV joining and checking multiple files
# 8  
Old 08-23-2016
How about
Code:
awk  ' 
BEGIN           {FS = OFS = ";"
                 print "ean;sku;pps_reference;mpn;stock;price;manufacturer"
                }
FNR  == 1       {FCNT++
                }
FCNT == 1       {for (n = split($1, T, ","); n > 0; n--)         MFC[T[n]] = $0
                }

                {gsub (/ /, "", $1)
                }
FCNT == 2       {for (n = split($1, T, ","); n > 0; n--)        {SKU[T[n]] = $2
                                                                 SUP[T[n]] = $3
                                                                }
                 next
                }
                {for (n = split($1, T, ","); n > 0; n--) if (T[n] in SKU)       {$2 = SKU[T[n]] OFS $2
                                                                                 if (MFC[$NF] ~ SUP[T[n]]) print
                                                                                 else                      print $0, SUP[T[n]] > ERRFILE
                                                                                 next
                                                                                }
                }
' ERRFILE="wrong_match.csv" manufacturer_compare.csv website_clean.csv supplier_clean.csv
ean;sku;pps_reference;mpn;stock;price;manufacturer
4260010852693;P00002801;1043154;84;743.42;70100118555;Fujifilm
4960999575285;P00002800;273189;9400;141.80;2768B016;Canon
0088698115763,3141725001174;P00002631;3654125;20;54.80;C1825A;Hewlett Packard

cat wrong_match.csv 
4000461034213,4960999575292;P00002799;27433196;44;44.94;2768B017;Canon;Dörr

This User Gave Thanks to RudiC For This Post:
# 9  
Old 08-23-2016
Thank you for the code. I will test it asap and over a couple days to see what happens.

1 thing i did notice though that i am unsure off. From what i can see in the code it will check every manufacturer against the compare file right? This is not really needed since if the name is the same it does not need to check anymore but i assume comparing them all is easier with the coding?

---------- Post updated at 04:12 PM ---------- Previous update was at 04:00 PM ----------

I just tested it and it seems there is something strange going on.

The "correct" file has the following line:
Code:
ean;pps_reference;stock;price;sku;mpn;manufacturer
0885370995022;132477;20;2220.45;P00030979;KF7-00030;Microsoft
0885370942705;132484;70;2277.56;P00030395;KF6-00037;Microsoft
0889842018363;146442;4;2724.97;P00031049;KF7-00081;Microsoft
0889842050868;191907;80;2520.21;P00069389;XBOX BUNDLE;Microsoft
0885370808339,0885370813340,0885370903331;813533;10;2205.69;P00006988;5C5-00013;Microsoft
0885370604085,0885370862201;127962;100;322.79;P00030393;S4V-00010;Microsoft
0885370862492;127976;0;328.37;P00032536;9J7-00004;Microsoft
0885370672213;128004;0;221.26;P00007564;6VC-00001;Microsoft

All correct lines only contain Microsoft as manufacturer. The faulty file contains the following items (the rest)

Code:
4260010852372;P00016984;849562;70100114776;40;516.34;Fujifilm ;DIF;Fujifilm
4548736014138;P00016992;100214;DSCWX500B.CE3;0;3102.70;Sony ;DIF;Sony
4260041684713;P00016986;101432;10051;20;417.20;Easypix ;DIF;Easypix
5025232832996;P00017004;116790;DMC-FT5EG9-Z;3;2100.08;Panasonic ;DIF;Panasonic
4548736017092;P00008802;118309;DSCRX100M4.CE3;10;9108.11;Sony ;DIF;Sony
4960371004433;P00073960;624461;272ES;7;3020.22;Tamron ;DIF;Tamron
4013904001338,4961607634356;P00017337;625380;T5111611;30;4100.72;Tokina ;DIF;Tokina

Do you have any idea what could be wrong?

Just to add some extra info. The column with DIF in there is just a supplier code which gets added just before the joining process.

Last edited by SDohmen; 08-23-2016 at 11:20 AM..
# 10  
Old 08-23-2016
My crystal bowl is somewhat hazy today - you could help by sending some polish. Or - surprise! - by posting all three input files.

And yes - I forwent the test on supplier equality for simplicity of code at the expense of some CPU cycles. Should your manufacturer file become tremdously large, you may add the test into the script.
# 11  
Old 08-23-2016
Quote:
Originally Posted by RudiC
My crystal bowl is somewhat hazy today - you could help by sending some polish. Or - surprise! - by posting all three input files.

And yes - I forwent the test on supplier equality for simplicity of code at the expense of some CPU cycles. Should your manufacturer file become tremdously large, you may add the test into the script.
Sorry, in all haste i forgot to post the files itself. I will see if i can get some full files although i have to edit some columns. Will do this tomorrow when i am back at work.
# 12  
Old 08-24-2016
Hello,

I attached the 3 main files: website_clean.csv, supplier_clean.csv and manufacturer_compare.csv.

With the original script only the first 2 are needed and with the new one also the compare one of course.

Since these are full versions i assume this is sufficient for the script testing. The supplier one is also the one that should provide some mismatches with manufacturer when matching for the testing itself.
# 13  
Old 08-24-2016
How do you expect a script to execute reasonably on a data set if the structure of that data is NOT what you posted and on which the script was built?

The script evaluates $NF for the supplier field in supplier.csv, but in ALL lines finds "JAC" (not "DIF" as you claimed), for which no manufaturer exists, so it writes the record to the wrong_match file. Except for those "Microsoft" (and a few "Nokia") entries, that don't have a manufacturer in $3 in website.csv, so empty matches empty and thus prints to the result file.

PLEASE get your act together, correct the input data, and run the script again. Use $6 in lieu of $NF if the "JAC" field can't be eliminated, hoping $6 will always be the supplier field. Also, look for orthography in the data, like additional spaces or upper/lower case.

Last edited by RudiC; 08-24-2016 at 09:00 AM..
# 14  
Old 08-24-2016
Quote:
Originally Posted by RudiC
How do you expect a script to execute reasonably on a data set if the structure of that data is NOT what you posted and on which the script was built?

The script evaluates $NF for the supplier field in supplier.csv, but finds "JAC" (not "DIF" as you claimed) in ALL lines, for which no manufaturer exists, so it writes the record to the wrong_match file. Except for those "Microsoft" (and a few "Nokia") entries, that don't have a manufacturer in $3 in website.csv, so empty matches empty and thus prints to the result file.

PLEASE get your act together, correct the input data, and run the script again. Use $6 in lieu of $NF if the "JAC" can't be eliminated, hoping $6 will always be the supplier field. Also, look for orthography in the data, like additional spaces or upper/lower case.
I am sorry for overlooking that error. The JAC/DIF code is just a supplier code which get added. I overlooked this when copying over the code.

I tested the change and it works fine but i am wondering about 1 thing though. You mentioned the part about the upper/lower cases. Does the script not skip checking on this or is that on purpose that it fails them in the list?
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Export Oracle multiple tables to multiple csv files using UNIX shell scripting

Hello All, just wanted to export multiple tables from oracle sql using unix shell script to csv file and the below code is exporting only the first table. Can you please suggest why? or any better idea? export FILE="/abc/autom/file/geo_JOB.csv" Export= `sqlplus -s dev01/password@dEV3... (16 Replies)
Discussion started by: Hope
16 Replies

2. UNIX for Dummies Questions & Answers

Joining different columns from multiple files

Hello again, I am trying to join 3rd column of 3 files into the end on one file and save it separately... my data looks like this file 1 Bob, Green, 80 Mark, Brown, 70 Tina, Smith, 60 file 2 Bob, Green, 70 Mark, Brown, 60 Tina, Smith, 50 file 3 Bob, Green, 50 Mark, Brown,60 Tina,... (6 Replies)
Discussion started by: A-V
6 Replies

3. Shell Programming and Scripting

Other alternative for joining together columns from multiple files

Hi again, I have monthly one-column files of roughly around 10 years. Is there a more efficient way to concatenate these files column-wise other than using paste command? For instance: file1.txt 12 13 15 12 file2.txt 14 15 18 19 file3.txt 20 21 (8 Replies)
Discussion started by: ida1215
8 Replies

4. Shell Programming and Scripting

checking csv files with empty fields..!

Hi! I need to learn that how a shell script can transverse a csv file n check if any field is empty or not. means its contains two comma or space b/w commas i.e., "" or " ". can anyone help me out how I can do that.... (10 Replies)
Discussion started by: sukhdip
10 Replies

5. Shell Programming and Scripting

Joining multiple files based on one column with different and similar values (shell or perl)

Hi, I have nine files looking similar to file1 & file2 below. File1: 1 ABCA1 1 ABCC8 1 ABR:N 1 ACACB 1 ACAP2 1 ACOT1 1 ACSBG 1 ACTR1 1 ACTRT 1 ADAMT 1 AEN:N 1 AKAP1File2: 1 A4GAL 1 ACTBL 1 ACTL7 (4 Replies)
Discussion started by: seqbiologist
4 Replies

6. Shell Programming and Scripting

Checking the existance of multiple files

I am trying to execute the following command to check the existance of a file (which has a date timestamp on it). If there are more than one file, then also it should give me 'success' result. if then <do some work> else <no files> fi Since there are more than one... (18 Replies)
Discussion started by: vivek_damodaran
18 Replies

7. UNIX for Dummies Questions & Answers

Joining string on multiple files

Hi guys, I am a forum (and a bit of a unix) newbie, and I currently have a tricky problem lying ahead of me. I have multiple files, and I am looking to join the files on the first column. Example: File 1 andy b 100 amy c 200 amy d 300 File 2 andy c 200 amy c 100 clyde o 50 ... (3 Replies)
Discussion started by: jdr0317
3 Replies

8. Shell Programming and Scripting

joining multiple files into one while putting the filename in the file

Hello, I know how to join multiple files using the cat function. I want to do something a little more advanced. Basically I want to put the filename in the first column... One thing to note is that the file is tab delimited. e.g. file1.txt joe 1 4 5 6 7 3 manny 2 3 4 5 6 7 ... (4 Replies)
Discussion started by: phil_heath
4 Replies

9. UNIX for Dummies Questions & Answers

Joining files based on multiple keys

I need a script (perl or awk..anything is fine) to join 3 files based on three key columns. The no of non-key columns can vary in each file. The columns are delimited by semicolon. For example, File1 Dim1;Dim2;Dim3;Fact1;Fact2;Fact3;Fact4;Fact5 ---- data delimited by semicolon --- ... (1 Reply)
Discussion started by: Sebben
1 Replies

10. UNIX for Advanced & Expert Users

Joining 2 CSV files together

I need a little help as I am a complete novice at scripting in unix. However, i am posed with an issue... i have two csv files in the following format@ FILE1.CSV: HEADER HEADER Header , , HEADER 001X ,,200 002X ,,300 003X ,,300 004X ,,300 FILE2.CSV: HEADER HEADER Header , ,... (3 Replies)
Discussion started by: chachabronson
3 Replies
Login or Register to Ask a Question