Modify csv-files with awk


 
# 1  
Old 08-14-2018

Hello everyone!

I have thousands of csv files that I have to import into a database table. As usual, the files aren't perfect. For example, they have different numbers of columns and some weird columns.


The second problem is that I have to add 3 parts of the filename as 3 columns in the table. The manufacturer, filetype and date/time are only in the filename and not in the columns of the file itself.


The filename looks like:
Manufacturer1_FiletypeA_20180502120102.csv
Manufacturer2_FiletypeA_20180502140537.csv
Manufacturer3_FiletypeB_20180501010424.csv

And the data like:
Code:
abcd; XXXX; 1234; XXXX; XXXX; 43563; abcde; XXXX; XXXX; 
cdef; XXXX, 4354; XXXX; XXXX; 43244; cdefg; XXXX;

where XXXX stands for useless data I don't need.

What I have at the moment is a script to select only the needed columns (awkScript):

Code:
#!/usr/bin/awk -f
# keep only the columns needed for the database import
BEGIN {FS = ";"; OFS = ";"; ORS = "\n"}
{print $1,$2,$7,$28,$29,$33,$34,$38,$39,$44,$45}
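To check the column selection, the script can be run on a local copy of one file first (assuming it is saved as awkScript; the file name is just the example from above):

Code:
# quick local check of the selected columns
awk -f awkScript Manufacturer1_FiletypeA_20180502120102.csv | head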

And a command that processes one file at a time and moves it to a new place. From the new place I can import the file into the database.


Code:
 
hdfs dfs -cat {remoteHdfsDir}/{file} | awk -f {awkScript} | sed 's/\r//' | sed '$d' | hdfs dfs -put - {hdfsWorkDir}/{file}


The two sed commands are needed because the files come from Windows: the first strips the carriage returns, the second drops the last line.
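The two sed calls could probably also be combined into a single invocation:

Code:
# same pipeline, carriage-return strip and last-line drop in one sed call
hdfs dfs -cat {remoteHdfsDir}/{file} | awk -f {awkScript} | sed -e 's/\r//' -e '$d' | hdfs dfs -put - {hdfsWorkDir}/{file}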
I use a Python script to loop through the files, something like

Code:
for file in filelist:
    run script
    split filename
    import file into database and add columns manufacturer, filetype and date/time from filename

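For the "split filename" step, something like this works in the shell (just a sketch using the example name from above; the variable names are placeholders):

Code:
# split "Manufacturer1_FiletypeA_20180502120102.csv" into its three parts
file="Manufacturer1_FiletypeA_20180502120102.csv"
base="${file%.csv}"                               # drop the extension
IFS=_ read -r manufacturer filetype timestamp <<< "$base"
echo "$manufacturer $filetype $timestamp"         # Manufacturer1 FiletypeA 20180502120102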

Because there are thousands of files, I need to add the 3 parts of the filename to the csv files and merge them into one file.

Does anyone have an idea how I could add the filename parts as 3 columns in the file in my script?
And does anyone have an idea how I could combine the files into one (or several) file(s)?

# 2  
Old 08-14-2018
Please show how and where the respective filename part should be added. What does "merge them to one file" mean - ALL input files resulting in one big output file? And, I don't know anything about that hdfs dfs command - is that really necessary?
# 3  
Old 08-14-2018
Thank you RudiC for your response.

Quote:
Originally Posted by elRonaldo

The filename looks like:
Manufacturer1_FiletypeA_20180502120102.csv
Manufacturer2_FiletypeA_20180502140537.csv
Manufacturer3_FiletypeB_20180501010424.csv

And the data like:
Code:
abcd; XXXX; 1234; XXXX; XXXX; 43563; abcde; XXXX; XXXX; 
cdef; XXXX, 4354; XXXX; XXXX; 43244; cdefg; XXXX;

where XXXX stands for useless data I don't need.
The columns in the new file should look like this
Code:
 abcd; 1234; 43563; abcde; Manufacturer1; FiletypeA; 20180502120102
 cdef; 4354; 43244; cdefg; Manufacturer1; FiletypeA; 20180502120102

Quote:
Originally Posted by RudiC
What does "merge them to one file" mean - ALL input files resulting in one big output file?And, I don't know anythig about that hdfs dfs command - is that really necessary?
I have a few hundred files every day. They aren't huge, so it would be good to merge them into one big file. That way I need to connect to the DB only once and not a few hundred times.


Yes, the hdfs dfs is necessary because the files are on a Hadoop cluster.
# 4  
Old 08-14-2018
That is unfortunate, as catting a file into a pipe and then reading it from stdin loses its file name, making it impossible to add its parts to the csv output file.
In a normal *nix setup, I'd propose something like (untested!)
Code:
awk '
BEGIN   {FS = OFS = ";"}

LAST    {print LAST, FNMOD              # print the previous line, delayed by one;
        }                               # the very last input line is never printed

        {sub (/\r/, "")                 # strip the Windows carriage return
         LAST = sprintf ("%s; %s; %s; %s", $1, $3, $6, $7)
         FNMOD = FILENAME               # e.g. Manufacturer1_FiletypeA_20180502120102.csv
         gsub (/_/, "; ", FNMOD)        # "_" -> "; " yields the three extra columns
         sub (/\.csv$/, "", FNMOD)      # drop the extension
        }
' *.csv

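To end up with one big file for the database load, the output just needs to be redirected. If the program above is saved to a file (addname.awk is only an example name), that would be something like

Code:
# merge all csv files into one output file for the import
awk -f addname.awk *.csv > combined.csv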

# 5  
Old 08-14-2018
Thank you RudiC. I'm going to try this tomorrow.
# 6  
Old 08-14-2018
If nobody in here has a brilliant proposal on how to deal with hdfs, you could copy the files to a local disk / temp directory, and then run the above. If reading from a directory other than your current working directory, additional processing of the FILENAME might be necessary.
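For example (untested), the directory part could be stripped off the same way as the extension:

Code:
FNMOD = FILENAME
sub (/.*\//, "", FNMOD)         # drop everything up to and including the last /
gsub (/_/, "; ", FNMOD)
sub (/\.csv$/, "", FNMOD)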
# 7  
Old 08-15-2018
OK, thank you. It looks like there is (at the moment) no easy solution.
I'll have to solve it in my Python script and save the files in a temp directory.
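Roughly what I have in mind (only a sketch; the paths and the saved awk program name are placeholders, nothing tested yet):

Code:
# copy the csv files out of HDFS, process them locally, push one merged file back
tmpdir=$(mktemp -d)
hdfs dfs -get "$remoteHdfsDir"/*.csv "$tmpdir"/
(cd "$tmpdir" && awk -f "$awkScript" *.csv > combined.csv)   # cd keeps the path out of FILENAME
hdfs dfs -put "$tmpdir"/combined.csv "$hdfsWorkDir"/combined.csv
rm -r "$tmpdir"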