File transformation - what is most efficient method


 
# 1  
10-28-2009

I've done quite a bit of searching on this but cannot seem to find exactly what I'm looking for. Say I have a |-delimited input file with 6 columns, and I need to change the values of a few columns and write an output file. With my limited knowledge I can do this with many lines of code, but I'd like some expert opinions on what would be most efficient, as I'll be working with some large files.

Example input row:
12345|employee1|customer2|d. gibbins|20091028|10000

column1 - leave as is
column2 - if the employee is in lookup1.txt, set to Y, else N
column3 - if the customer is in lookup2.txt, set to Y, else N
column4 - convert to uppercase
column5 - change the date format from YYYYMMDD to MM/DD/YYYY
column6 - add an explicit decimal point (the last two digits are the decimals)

Example output row:
12345|Y|N|D. GIBBINS|10/28/2009|100.00

I'm not looking for exact syntax, but a general idea of commands you would use and/or workflow.
Thanks!
# 2  
10-28-2009
Code:
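awk -F\| '{
# third file (maininput.txt): print the transformed record
if (v == 3) printf("%s|%s|%s|%s|%s/%s/%s|%.2f\n",
  $1,
  ($2 in emp)? "Y" : "N",        # employee listed in lookup1.txt?
  ($3 in cus)? "Y" : "N",        # customer listed in lookup2.txt?
  toupper($4),
  substr($5, 5, 2), substr($5, 7, 2), substr($5, 1, 4),   # YYYYMMDD -> MM/DD/YYYY
  $6/100)                        # implied two decimals: 10000 -> 100.00
else if (v == 1) emp[$0]         # lookup1.txt: store each line as an array key
else if (v == 2) cus[$0]         # lookup2.txt: store each line as an array key
}' v=1 lookup1.txt v=2 lookup2.txt v=3 maininput.txt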

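A self-contained way to try the script above on the sample row from the question is sketched below, written in pattern/action form but with the same logic. The lookup contents (`employee1`, `customer9`), the temporary directory and the `transform` wrapper name are made-up test scaffolding, not part of the original posts.

```shell
#!/bin/sh
# Sketch: one-pass awk transform on the sample row from post #1.
# Lookup file contents below are ASSUMED test data (one key per line).
set -eu

transform() {
  # $1 = employee lookup, $2 = customer lookup, $3 = main input
  awk -F'|' '
    v == 1 { emp[$0] = 1; next }   # load employee keys
    v == 2 { cus[$0] = 1; next }   # load customer keys
    v == 3 {
      printf("%s|%s|%s|%s|%s/%s/%s|%.2f\n",
        $1,
        ($2 in emp) ? "Y" : "N",
        ($3 in cus) ? "Y" : "N",
        toupper($4),
        substr($5, 5, 2), substr($5, 7, 2), substr($5, 1, 4),  # YYYYMMDD -> MM/DD/YYYY
        $6 / 100)                                             # 10000 -> 100.00
    }' v=1 "$1" v=2 "$2" v=3 "$3"
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf 'employee1\n' > "$dir/lookup1.txt"
printf 'customer9\n' > "$dir/lookup2.txt"
printf '12345|employee1|customer2|d. gibbins|20091028|10000\n' > "$dir/maininput.txt"

# prints: 12345|Y|N|D. GIBBINS|10/28/2009|100.00
transform "$dir/lookup1.txt" "$dir/lookup2.txt" "$dir/maininput.txt"
```

Because both lookup files are loaded into arrays first, each line of the large input costs only an array membership test per lookup column, and the big file is read exactly once.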
# 3  
10-29-2009
This worked perfectly. Thanks a lot for your time.

One other quick question, if I may. If I have to trim whitespace from a column, is there a way to do that with printf? I was able to do it by assigning the column to a variable and using gsub. Just wondering if there's a quicker way.

Thanks again.
# 4  
10-29-2009
Instead of trimming the spaces, you can set the field separator (FS) to a regular expression that matches optional spaces, then "|", then optional spaces:

Code:
awk '
BEGIN { FS=" *\\| *" }
...
'
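A quick way to see what that FS does on padded input is sketched below; `split_padded` is a hypothetical wrapper name. Assigning `$1 = $1` makes awk rebuild the record with OFS, so the output shows the fields exactly as awk split them. Note that this FS only removes blanks next to a `|`: spaces before the first field or after the last one stay, and tabs are not matched at all.

```shell
#!/bin/sh
# Sketch: blanks around "|" become part of the separator,
# so inner fields arrive already trimmed.
set -eu

split_padded() {
  awk 'BEGIN { FS = " *\\| *"; OFS = "|" }
       { $1 = $1; print }'   # assigning $1 rebuilds $0 using OFS
}

# prints: 12345|employee1|customer2|d. gibbins
printf '12345 |  employee1| customer2 |d. gibbins\n' | split_padded
```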

# 5  
10-29-2009
Interesting, I didn't know FS could be a regular expression like that. Unfortunately my input file is fixed-length and I'm doing a bunch of substr() calls to pull out the data I need. I only said | in my original question to keep it simple.
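For fixed-length input, one way to keep a long series of substr() calls manageable is to drive them from a small table of start columns and widths, sketched below. The layout (a 5-character id followed by two 10-character fields) and the `fixed_to_pipe` name are entirely hypothetical, since the real record layout was not posted. substr() returns the padding along with the data, so the trailing blanks are still present in the output.

```shell
#!/bin/sh
# Sketch: slice fixed-width records with substr() and emit |-delimited output.
# The layout in BEGIN is HYPOTHETICAL; replace it with the real record layout.
set -eu

fixed_to_pipe() {
  awk 'BEGIN {
         n = split("1 6 16", start, " ")   # 1-based start column of each field
         split("5 10 10", width, " ")      # width of each field
       }
       {
         out = ""
         for (i = 1; i <= n; i++)
           out = out (i > 1 ? "|" : "") substr($0, start[i], width[i])
         print out
       }'
}

# 5-char id, then two 10-char fields padded with blanks on the right
printf '12345employee1 customer2 \n' | fixed_to_pipe
```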
Thanks
# 6  
10-30-2009
Trim leading whitespace (spaces and tabs):

Code:
gsub(/^[ \t]+/, "", string)

Trim trailing whitespace (spaces and tabs):

Code:
gsub(/[ \t]+$/, "", string)
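printf itself has no conversion that strips whitespace, but the two gsub() calls above can be wrapped in a user-defined function and called directly inside the printf argument list, so the main program no longer needs a temporary variable per column. Because awk passes scalars by value, `trim()` edits its own copy and leaves the record alone. The column positions and the `trim_fields` wrapper are hypothetical, chosen only to illustrate the idea on fixed-width data.

```shell
#!/bin/sh
# Sketch: trim fixed-width substr() results inline inside printf.
# Column positions are HYPOTHETICAL.
set -eu

trim_fields() {
  awk '
    # Strip leading and trailing blanks/tabs; s is a local copy, so $0 is untouched.
    function trim(s) {
      gsub(/^[ \t]+/, "", s)
      gsub(/[ \t]+$/, "", s)
      return s
    }
    {
      printf("%s|%s|%s\n",
        trim(substr($0, 1, 5)),
        trim(substr($0, 6, 10)),
        trim(substr($0, 16, 10)))
    }'
}

# prints: 12345|employee1|cust2
printf '12345employee1    cust2  \n' | trim_fields
```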



10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Most efficient method to extract values from text files

I have a list of files defined in a single file, one per line (the number of files may vary each time), e.g. the content of ETL_LOOKUP.dat: /data/project/randomname /data/project/ramname /data/project/raname /data/project/radomname /data/project/raame /data/project/andomname size of these... (5 Replies)
Discussion started by: h0x0r21

2. Shell Programming and Scripting

Efficient method of determining if a string is in a file.

Hi, I was hoping someone could suggest an alternative to the code I currently have, as mine takes up far too much processor time and is too slow. The situation: I have a programme that runs on some files just before they are zipped up and archived; the program appends a one-line summary of the... (4 Replies)
Discussion started by: RECrerar

3. Shell Programming and Scripting

Efficient population of array from text file

Hi, I am trying to populate an array with data from a text file. I have a working method using awk, but it is too slow and inefficient. See below. The text file has 70,000 lines. As awk is a line editor, it reads each line of the file until it gets to the required line and then processes it.... (3 Replies)
Discussion started by: carlr

4. UNIX for Dummies Questions & Answers

file transformation using fixed width file

Hi Gurus! I need to make some file transformations. Please help. This is my input file. It has four columns with fixed width. 1 aaa bbbb cccc 2 eee dddd jjjj 3 fff gggg jjjj 4 hhh iiii cccc 5 kkk llll cccc 6 mmm nnnn oooo 7 ppp qqqq xxxx 8 rrr ... (1 Reply)
Discussion started by: kokoro

5. Homework & Coursework Questions

Efficient Text File Writing

Use and complete the template provided. The entire template must be completed. If you don't, your post may be deleted! 1. The problem statement, all variables and given/known data: Write a template main.c file via shell script to make it easier for yourself later. The issue here isn't writing... (2 Replies)
Discussion started by: george3isme

6. UNIX for Dummies Questions & Answers

Efficient way of extracting data from file

I have a file of around 500 lines, which contains one-letter words, two-letter words, ... and so on (up to 15-letter words; words are not separated by line). I need to compare all 1-letter words with 3-, 4-, 5- and 6-letter words, all 2-letter words with 2-, 3-, 4- and 5-letter words and all 3 letters... (3 Replies)
Discussion started by: akhay_ms

7. Shell Programming and Scripting

XML file transformation

Hi all, I have to transform an XML file like this: <?xml version="1.0"?> <vocabulary> <voc_id>102</voc_id> <name>Vocabulary Name</name> <description>Voc description</description> <relations>3</relations> <hierarchy>5</hierarchy> <word> <word_id>1</word_id> ... (1 Reply)
Discussion started by: aLittleBeat

8. UNIX for Dummies Questions & Answers

efficient raid file server

I need to put together a RAID1 file server for use by Windoze systems. I've built zillions of Windows systems from components. I was an HPUX SE for a long time at HP, but have been out of the game for years. I've got an old workhorse mobo FIC PA-2013 with a 450 MHz K6 III+ I could use, but I'd... (2 Replies)
Discussion started by: pcmacd

9. Shell Programming and Scripting

file name transformation

I've got a multitude of text data files that carry exactly the same kind of data. Unfortunately, some of them have a different filename format. Some are: 'category'_'month'-'year'_act.txt, an example being daf_Apr-1961_act.txt, and some are: 'category'_ 'year'-'month'_act.txt an... (16 Replies)
Discussion started by: vrms

10. Shell Programming and Scripting

Need more efficient log file grep

I'm writing a script that at one point needs to check the contents of another script's log file to determine how to proceed. An example record from the log file is: "mcref04152006","060417","ANTH0415","282","272","476,983.37","465,268.44","loaded" I want my script to return this record if: ... (3 Replies)
Discussion started by: Glenn Arndt