Read multiple text files and copy data to csv


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Read multiple text files and copy data to csv
# 29  
Old 04-04-2016
Sorry. Change:
Code:
NR == 1         {HD = "Filename=IP address=PC name=destination=UTC begin=UTC end=destination IP address=duration (ms)=local begin=local end=" \
                        "source IP address=actual extension=original called party=original calling party"
                 for (N=i=split (HD, T); i>0; i--) SRCH[T[i]]
                 for (i=1; i<N; i++) printf "%s%s", T[i], OFS
                 print T[N]
                 FN = FILENAME
                }

to:
Code:
NR == 1         {HD = "Filename=IP address=PC name=destination=UTC begin=UTC end=destination IP address=duration (ms)=local begin=local end=" \
                        "source IP address=actual extension=original called party=original calling party"
                 for (N=i=split (HD, T); i>0; i--) SRCH[T[i]]
                 for (i=1; i<N; i++) printf "%s%s", T[i], OFS
                 print T[N]
                }
FNR == 1        {FN = FILENAME
                }

as a short term solution. And, note that if the number of files being processed increases at some point, you WILL get extra header lines in the middle of your output. As long as you always expect to have a least one file to process, the safer long-term solution is:
Code:
echo "Filename,IP address,PC name,destination,UTC begin,UTC end,destination IP address,duration (ms),local begin,local end,source IP address,actual extension,original called party,original calling party"
find . -type f -name "*.ipfx_audio" -exec awk -F= -vOFS="," '
NR == 1         {HD = "Filename=IP address=PC name=destination=UTC begin=UTC end=destination IP address=duration (ms)=local begin=local end=" \
                        "source IP address=actual extension=original called party=original calling party"
                 for (N=i=split (HD, T); i>0; i--) SRCH[T[i]]
                }
FNR == 1        {FN = substr(FILENAME, 3)	# Strip leading "./".
                }
FNR==1 && L     {printf "%s", FN
                 for (i=1; i<N; i++) printf "%s%s", PR[T[i]], OFS
                 print PR[T[N]]
                 delete PR
                }
$1 in SRCH      {PR[$1]=$NF
                 L=1
                }
END             {printf "%s", FN
                 for (i=1; i<N; i++) printf "%s%s", PR[T[i]], OFS
                 print PR[T[N]]
                }
' {} +

This User Gave Thanks to Don Cragun For This Post:
# 30  
Old 04-04-2016
hello

i tried this

l
Code:
descamp@D06613 /cygdrive/y/20090130/test
$ echo "Filename,IP address,PC name,destination,UTC begin,UTC end,destination IP address,duration (ms),local begin,local end,source IP address,actual extension,original called party,original calling party"
Filename,IP address,PC name,destination,UTC begin,UTC end,destination IP address,duration (ms),local begin,local end,source IP address,actual extension,original called party,original calling party

ldescamp@D06613 /cygdrive/y/20090130/test
$ find . -type f -name "*.ipfx_audio" -exec awk -F= -vOFS="," '
> NR == 1         {HD = "Filename=IP address=PC name=destination=UTC begin=UTC end=destination IP address=duration (ms)=local begin=local end=" \
>                         "source IP address=actual extension=original called party=original calling party"
>                  for (N=i=split (HD, T); i>0; i--) SRCH[T[i]]
>                  # Remove two lines here that printed the header.
>                  FN = substr(FILENAME, 3)# Strip leading "./".
>                 }
> FNR==1 && L     {printf "%s", FN
>                  for (i=1; i<N; i++) printf "%s%s", PR[T[i]], OFS
>                  print PR[T[N]]
>                  delete PR
>                 }
>
> $1 in SRCH      {PR[$1]=$NF
>                  L=1
>                 }
> END             {printf "%s", FN
>                  for (i=1; i<N; i++) printf "%s%s", PR[T[i]], OFS
>                  print PR[T[N]]
>                 }
> ' {} +

and it gives me this

Code:
1/20090130T165309.474091.ipfx_audio,10.11.40.61,D01309,,20090130T165309.474090,20090130T170003.303469,10.11.41.151,413829,20090130T165309.474091,20090130T170003.303469,10.11.0.241,,,
1/20090130T165309.474091.ipfx_audio,10.11.40.67,D01332,,20090130T165558.886637,20090130T165839.247671,10.11.41.136,160361,20090130T165558.886637,20090130T165839.247671,10.11.0.240,,,
1/20090130T165309.474091.ipfx_audio,10.11.40.61,D01309,,20090130T170010.074146,20090130T170019.325071,10.11.41.151,9250,20090130T170010.074146,20090130T170019.325071,10.11.41.97,,,

any idea?

thanks

---------- Post updated at 03:34 PM ---------- Previous update was at 03:27 PM ----------

i tried this

Code:
ldescamp@D06613 /cygdrive/y/20090130/test
$ echo "Filename,IP address,PC name,destination,UTC begin,UTC end,destination IP address,duration (ms),local begin,local end,source IP address,actual extension,original called party,original calling party"
Filename,IP address,PC name,destination,UTC begin,UTC end,destination IP address,duration (ms),local begin,local end,source IP address,actual extension,original called party,original calling party

ldescamp@D06613 /cygdrive/y/20090130/test
$ find . -type f -name "*.ipfx_audio" -exec awk -F= -vOFS="," '
> NR == 1         {HD = "Filename=IP address=PC name=destination=UTC begin=UTC end=destination IP address=duration (ms)=local begin=local end=" \
>                         "source IP address=actual extension=original called party=original calling party"
>                  for (N=i=split (HD, T); i>0; i--) SRCH[T[i]]
>                 }
> FNR == 1        {FN = substr(FILENAME, 3)# Strip leading "./".
>                 }
> FNR==1 && L     {printf "%s", FN
>                  for (i=1; i<N; i++) printf "%s%s", PR[T[i]], OFS
>                  print PR[T[N]]
>                  delete PR
>                 }
> $1 in SRCH      {PR[$1]=$NF
>                  L=1
>                 }
> END             {printf "%s", FN
>                  for (i=1; i<N; i++) printf "%s%s", PR[T[i]], OFS
>                  print PR[T[N]]
>                 }
> ' {} +

and got this

Code:
1/20090130T165558.886637.ipfx_audio,10.11.40.61,D01309,,20090130T165309.474090,20090130T170003.303469,10.11.41.151,413829,20090130T165309.474091,20090130T170003.303469,10.11.0.241,,,
2/20090130T170010.074146.ipfx_audio,10.11.40.67,D01332,,20090130T165558.886637,20090130T165839.247671,10.11.41.136,160361,20090130T165558.886637,20090130T165839.247671,10.11.0.240,,,
2/20090130T170010.074146.ipfx_audio,10.11.40.61,D01309,,20090130T170010.074146,20090130T170019.325071,10.11.41.151,9250,20090130T170010.074146,20090130T170019.325071,10.11.41.97,,,

thanks
Moderator's Comments:
Mod Comment Please use CODE tags (not ICODE tags) for full-line and multi-line sample input, output, and code segments.

Last edited by Don Cragun; 04-04-2016 at 06:41 PM.. Reason: Change ICODE tags to CODE tags.
# 31  
Old 04-05-2016
Since you posted this in the experienced user's forum (Shell Programming and Scripting) instead of the beginner's forum (UNIX for Dummies Questions & Answers), I incorrectly assumed that you were creating a shell script and that you would know how to put two commands in a file and execute those two commands in a script.

So, going back to basics: Create a file named myscript containing the following text using an editor that uses <newline> characters as the line terminator (not the DOS/Windows <carriage-return><newline> pair of characters as the line terminator):
Code:
#!/bin/sh
echo "Filename,IP address,PC name,destination,UTC begin,UTC end,destination IP address,duration (ms),local begin,local end,source IP address,actual extension,original called party,original calling party"
find . -type f -name "*.ipfx_audio" -exec awk -F= -vOFS="," '
NR == 1         {HD = "Filename=IP address=PC name=destination=UTC begin=UTC end=destination IP address=duration (ms)=local begin=local end=" \
                        "source IP address=actual extension=original called party=original calling party"
                 for (N=i=split (HD, T); i>0; i--) SRCH[T[i]]
                }
FNR == 1        {FN = substr(FILENAME, 3)	# Strip leading "./".
                }
FNR==1 && L     {printf "%s", FN
                 for (i=1; i<N; i++) printf "%s%s", PR[T[i]], OFS
                 print PR[T[N]]
                 delete PR
                }
$1 in SRCH      {PR[$1]=$NF
                 L=1
                }
END             {printf "%s", FN
                 for (i=1; i<N; i++) printf "%s%s", PR[T[i]], OFS
                 print PR[T[N]]
                }
' {} +

then make this file executable:
Code:
chmod +x myscript

and then run the script:
Code:
./myscript

and, if you have 12 million files with names ending with .ipfx_audio in the file hierarchy rooted in the current directory, that should give you about 12,000,001 lines of output in CSV file format printed to your terminal including the single header line.

If, instead of sending the output to your terminal, you want to save the output in a file named myoutput.csv, use this command to run the script:
Code:
./myscript > myoutput.csv

If you would have told us how to determine which lines in this output belonged to which department, we could have suggested a script that would produce one output file (using this same format, with one header line in each department's file) instead of creating one file you'll have to reprocess again to create your department files. But, since you have been unwilling to give us any information on how to do that, we will have to let you split this output into the needed department files on your own.
This User Gave Thanks to Don Cragun For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Generate files and use csv data to replace multiple variables in a template

I have a source csv file consists of first field as variable name, and the rest are site-specific information (converted from excel file, where site -specific values in columns). I am trying to create a file for every site using a template and replace the multiple variables with values from the... (3 Replies)
Discussion started by: apalex
3 Replies

2. UNIX for Beginners Questions & Answers

Copy data at specified location from multiple files

Hello everyone, Im super new to coding but increasingly in need of it at work. Im have task stacked because of this problems, that I cannot figure out how to solve looking on the internet after trying many many things that looked similar to me. I have multiple data files of the form (see below).... (2 Replies)
Discussion started by: Xfiles_fan
2 Replies

3. Shell Programming and Scripting

In PErl script: need to read the data one file and generate multiple files based on the data

We have the data looks like below in a log file. I want to generat files based on the string between two hash(#) symbol like below Source: #ext1#test1.tale2 drop #ext1#test11.tale21 drop #ext1#test123.tale21 drop #ext2#test1.tale21 drop #ext2#test12.tale21 drop #ext3#test11.tale21 drop... (5 Replies)
Discussion started by: Sanjeev G
5 Replies

4. Shell Programming and Scripting

Read csv file, convert the data and make one text file in UNIX shell scripting

I have input data looks like this which is a part of a csv file 7,1265,76548,"0102:04" 8,1266,76545,"0112:04" I need to make the output data should look like this and the output data will be part of text file: 7|1265000 |7654899 |A| 8|12660000 |76545999 |B| The logic behind the... (6 Replies)
Discussion started by: RJG
6 Replies

5. UNIX for Beginners Questions & Answers

awk GSUB read field values from multiple text files

My program run without error. The problem I am having. The program isn't outputting field values with the column headers to file.txt. Each of the column headers in file.txt has no data. MEMSIZE SECOND SASFoundation Filename The output results in file.txt should show: ... (1 Reply)
Discussion started by: dellanicholson
1 Replies

6. Shell Programming and Scripting

Extracting data from specific rows and columns from multiple csv files

I have a series of csv files in the following format eg file1 Experiment Name,XYZ_07/28/15, Specimen Name,Specimen_001, Tube Name, Control, Record Date,7/28/2015 14:50, $OP,XYZYZ, GUID,abc, Population,#Events,%Parent All Events,10500, P1,10071,95.9 Early Apoptosis,1113,11.1 Late... (6 Replies)
Discussion started by: pawannoel
6 Replies

7. UNIX for Dummies Questions & Answers

Need help combining txt files w/ multiple lines into csv single cell - also need data merge

:confused:Hello -- i just joined the forums. I am a complete noob -- only about 1 week into learning how to program anything... and starting with linux. I am working in Linux terminal. I have a folder with a bunch of txt files. Each file has several lines of html code. I want to combine... (2 Replies)
Discussion started by: jetsetter
2 Replies

8. Shell Programming and Scripting

Read multiple files, parse data and append to a file

Hi..Can anyone suggest a simple way of achieving this. I have several files which ends with extension .vcf . I will give example with two files In the below files, we are interested in File 1: 38 107 C 3 T 6 C/T 38 241 C 4 T 5 C/T 38 247 T 4 C 5 T/C 38 259 T 3 C 6 T/C... (8 Replies)
Discussion started by: empyrean
8 Replies

9. Shell Programming and Scripting

Combine Multiple text or csv files column-wise

Hi All I am trying to combine columns from multiple text files into a single file using paste command but the record length being unequal in the different files the data is running over to the closest empty cell on the left. Please see below. What can i do to resolve this ? File 1 File... (15 Replies)
Discussion started by: venky_ibm
15 Replies

10. Shell Programming and Scripting

Read the data from multiple files and sum the value

Hi all, I have a requirement where i have to read multiple files using Shell Script in Korn Shell. each file will have the 3rd line as the amount field, i have to read this amount field and sum it for all the files. any idea on how to achieve this?? (i think i can achieve it using a loop,... (9 Replies)
Discussion started by: nvuradi
9 Replies
Login or Register to Ask a Question