CSV joining and checking multiple files


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting CSV joining and checking multiple files
# 15  
Old 09-13-2016
*bump*.

After last post i decided to add some extra cleanup lines in the form of all lower cases etc. However i seemed to find some lines that dont like to work with the comparing.

In the output file i have the following lines:
Code:
4,25E+12   P00043585   242457   8,4E+10   90   19.40   ekl   JAC   alpenfohn
4,25E+12   P00052963   1577936   8,4E+10   18   12.50   ekl   JAC   alpenfohn
4,25E+12   P00052964   2001285   8,4E+10   18   12.00   ekl   JAC   alpenfohn

Dont look at the first and third column as i copied these over from excel so they look messed up. In the last field it says alpenfohn as manufacturer and according to the supplier it is ekl.

In my comparing file i have this line:
Code:
ekl,alpenfohn,alpenföhn

This should be an accepted compare and it should not have been in the compare file.

I have the idea that the script you provided only checks for the first name in the comma seperated list. Can you confirm this and possible have an idea how to get around that since this would mean i need to add double lines.

[EDIT]

For some reason i start to doubt that it also keeps tracks of the spaces in the names since i noticed that the lines below also dont get skipped (as being correct).

Code:
          
5,04E+12   P00055056   2085825   KD1250K-QS   0   169.00   black en decker   JAC   black en   decker       
5,04E+12   P00055017   2090770   KR805K   0   83.60   black en decker   JAC   black en decker


Last edited by SDohmen; 09-13-2016 at 11:28 AM..
# 16  
Old 09-13-2016
So we are initially told that we have three CSV input files with semicolons as the field delimiters, but now you are telling us that code RudiC suggested meeting those requirements does not work when one of the input files starts using spaces instead of semicolons as the field delimiters (even though some of the fields in that file contain spaces as data) and one of the input files starts using commas instead of semicolons as the field delimiter. And, instead of seven input fields we now have 10 input fields and we aren't supposed to look at two of those fields because they are messed up.

If your input data formats keep changing, it is not our job to act as your unpaid programming staff and rewrite code to meet your constantly changing input file formats.

If you would like to try modifying RudiC's suggestions to meet your new input file formats, can show us clear specifications for your new input file formats, and can explain where you are stuck trying to get your modifications to work correctly; we will be happy to help you. Smilie

If you would like us to guess at what your new input file format is for your third input file and guess at how we might modify RudiC's code to work with your new input file formats, I doubt that many of the volunteers reading your posts here will be interested in tackling that assignment. Smilie
This User Gave Thanks to Don Cragun For This Post:
# 17  
Old 09-13-2016
Quote:
Originally Posted by Don Cragun
So we are initially told that we have three CSV input files with semicolons as the field delimiters, but now you are telling us that code RudiC suggested meeting those requirements does not work when one of the input files starts using spaces instead of semicolons as the field delimiters (even though some of the fields in that file contain spaces as data) and one of the input files starts using commas instead of semicolons as the field delimiter. And, instead of seven input fields we now have 10 input fields and we aren't supposed to look at two of those fields because they are messed up.

If your input data formats keep changing, it is not our job to act as your unpaid programming staff and rewrite code to meet your constantly changing input file formats.

If you would like to try modifying RudiC's suggestions to meet your new input file formats, can show us clear specifications for your new input file formats, and can explain where you are stuck trying to get your modifications to work correctly; we will be happy to help you. Smilie

If you would like us to guess at what your new input file format is for your third input file and guess at how we might modify RudiC's code to work with your new input file formats, I doubt that many of the volunteers reading your posts here will be interested in tackling that assignment. Smilie
I think we have a slight miscommunication here Smilie. All files use ; as delimiter. The linked parts above are just several lines which went through the script he wrote and that gives that output. The problem(s) i seem to have is that it ignores manufacturers which are not the first name in the list (all files are added to a previous post btw) and for some reason it does not seem to recognize several manufacters which have spaces like the ones above. The first part (the name in front) is pure to know if i need to rearrange the names but the second can be tricky since there are enough names that do have spaces. I could technicly remove all spaces but i am not sure if that would be a good solution.

The input files havent changed ever except for my miss on the column number for which i am sorry.

[Edit]
Again the code RudiC wrote is superb and the files havent changed. The problem is with the output it creates and most possible be a simple solution.
# 18  
Old 09-13-2016
If you look at the script, you see that FS = OFS = ";", and that the respective fields are split($1, T, ",") by commas. ALL comma separated subfields are treated identical and equal, and spaces are NOT treated specially thus can be part of names.
So - I'd recommend to take a look at the input data and validate those. black en decker and black en decker are definitely NOT the same!
This User Gave Thanks to RudiC For This Post:
# 19  
Old 09-13-2016
Thank you for that answer. Could you do comment on my question if the searched manufacturer name has to be first or does that not matter?
# 20  
Old 09-13-2016
Quote:
Originally Posted by RudiC
... ALL comma separated subfields are treated identical and equal, ...
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Export Oracle multiple tables to multiple csv files using UNIX shell scripting

Hello All, just wanted to export multiple tables from oracle sql using unix shell script to csv file and the below code is exporting only the first table. Can you please suggest why? or any better idea? export FILE="/abc/autom/file/geo_JOB.csv" Export= `sqlplus -s dev01/password@dEV3... (16 Replies)
Discussion started by: Hope
16 Replies

2. UNIX for Dummies Questions & Answers

Joining different columns from multiple files

Hello again, I am trying to join 3rd column of 3 files into the end on one file and save it separately... my data looks like this file 1 Bob, Green, 80 Mark, Brown, 70 Tina, Smith, 60 file 2 Bob, Green, 70 Mark, Brown, 60 Tina, Smith, 50 file 3 Bob, Green, 50 Mark, Brown,60 Tina,... (6 Replies)
Discussion started by: A-V
6 Replies

3. Shell Programming and Scripting

Other alternative for joining together columns from multiple files

Hi again, I have monthly one-column files of roughly around 10 years. Is there a more efficient way to concatenate these files column-wise other than using paste command? For instance: file1.txt 12 13 15 12 file2.txt 14 15 18 19 file3.txt 20 21 (8 Replies)
Discussion started by: ida1215
8 Replies

4. Shell Programming and Scripting

checking csv files with empty fields..!

Hi! I need to learn that how a shell script can transverse a csv file n check if any field is empty or not. means its contains two comma or space b/w commas i.e., "" or " ". can anyone help me out how I can do that.... (10 Replies)
Discussion started by: sukhdip
10 Replies

5. Shell Programming and Scripting

Joining multiple files based on one column with different and similar values (shell or perl)

Hi, I have nine files looking similar to file1 & file2 below. File1: 1 ABCA1 1 ABCC8 1 ABR:N 1 ACACB 1 ACAP2 1 ACOT1 1 ACSBG 1 ACTR1 1 ACTRT 1 ADAMT 1 AEN:N 1 AKAP1File2: 1 A4GAL 1 ACTBL 1 ACTL7 (4 Replies)
Discussion started by: seqbiologist
4 Replies

6. Shell Programming and Scripting

Checking the existance of multiple files

I am trying to execute the following command to check the existance of a file (which has a date timestamp on it). If there are more than one file, then also it should give me 'success' result. if then <do some work> else <no files> fi Since there are more than one... (18 Replies)
Discussion started by: vivek_damodaran
18 Replies

7. UNIX for Dummies Questions & Answers

Joining string on multiple files

Hi guys, I am a forum (and a bit of a unix) newbie, and I currently have a tricky problem lying ahead of me. I have multiple files, and I am looking to join the files on the first column. Example: File 1 andy b 100 amy c 200 amy d 300 File 2 andy c 200 amy c 100 clyde o 50 ... (3 Replies)
Discussion started by: jdr0317
3 Replies

8. Shell Programming and Scripting

joining multiple files into one while putting the filename in the file

Hello, I know how to join multiple files using the cat function. I want to do something a little more advanced. Basically I want to put the filename in the first column... One thing to note is that the file is tab delimited. e.g. file1.txt joe 1 4 5 6 7 3 manny 2 3 4 5 6 7 ... (4 Replies)
Discussion started by: phil_heath
4 Replies

9. UNIX for Dummies Questions & Answers

Joining files based on multiple keys

I need a script (perl or awk..anything is fine) to join 3 files based on three key columns. The no of non-key columns can vary in each file. The columns are delimited by semicolon. For example, File1 Dim1;Dim2;Dim3;Fact1;Fact2;Fact3;Fact4;Fact5 ---- data delimited by semicolon --- ... (1 Reply)
Discussion started by: Sebben
1 Replies

10. UNIX for Advanced & Expert Users

Joining 2 CSV files together

I need a little help as I am a complete novice at scripting in unix. However, i am posed with an issue... i have two csv files in the following format@ FILE1.CSV: HEADER HEADER Header , , HEADER 001X ,,200 002X ,,300 003X ,,300 004X ,,300 FILE2.CSV: HEADER HEADER Header , ,... (3 Replies)
Discussion started by: chachabronson
3 Replies
Login or Register to Ask a Question