awk and substr performance


 
# 1  
Old 09-25-2008

Hi,
I have another performance question I would like to share.

I have this code:
Code:
#!/usr/bin/ksh
#CONTACT
gzcat *CONTACT* | awk ' 
{
KEY=substr($0,1,12)
fkey=substr($0,14,12)
addkey=substr($0,27,12)
fkeyEATED=substr($0,40,10)
fkeyEATET=substr($0,51,8)
lkey=substr($0,60,10)
utlkey=substr($0,71,8)
st=substr($0,80,20)
ft=substr($0,101,100)
ft__=substr($0,202,100)
nic=substr($0,303,20)
SUBSfkeyIBERNUMBER=substr($0,324,20)
sco=substr($0,345,20)
stblk=substr($0,366,10)
stblkT=substr($0,377,8)
edb=substr($0,386,10)
edbT=substr($0,397,8)
bckrs=substr($0,406,10)
pckcd=substr($0,417,40)
dlcd=substr($0,458,10)
ppmd=substr($0,469,20)
cstmrd=substr($0,490,20)
dsnnrsn=substr($0,511,20)
sx=substr($0,532,10)
prpd=substr($0,543,1)
ml=substr($0,545,128)
ms=substr($0,674,15)
KEY=trim(KEY)
fkey=trim(fkey)
addkey=trim(addkey)
fkeyEATED=trim(fkeyEATED)
fkeyEATET=trim(fkeyEATET)
lkey=trim(lkey)
utlkey=trim(utlkey)
st=trim(st)
ft=trim(ft)
ft__=trim(ft__)
nic=trim(nic)
SUBSfkeyIBERNUMBER=trim(SUBSfkeyIBERNUMBER)
sco=trim(sco)
stblk=trim(stblk)
stblkT=trim(stblkT)
edb=trim(edb)
edbT=trim(edbT)
bckrs=trim(bckrs)
pckcd=trim(pckcd)
dlcd=trim(dlcd)
ppmd=trim(ppmd)
cstmrd=trim(cstmrd)
dsnnrsn=trim(dsnnrsn)
sx=trim(sx)
prpd=trim(prpd)
ml=trim(ml)
ms=trim(ms)
print SUBSfkeyIBERNUMBER,",1,"KEY,","fkey,","addkey,","fkeyEATED,","fkeyEATET,","lkey,","utlkey,","st,","ft,","ft__,","nic,","SUBSfkeyIBERNUMBER,","sco,","stblk,","stblkT,","edb,","edbT,","bckrs,","pckcd,","dlcd,","ppmd,","cstmrd,","dsnnrsn,","sx,","prpd,","ml,","ms}
function ltrim(s) { sub(/^ +/, "", s); return s }
function rtrim(s) { sub(/ +$/, "", s); return s }
function trim(s)  { return rtrim(ltrim(s)); }
' > final
sort final > final2
rm final
gzip final2
# NOTE: ${data} is not set in the part of the script shown here
mv final2.gz "${data}-fkeym_all.gz"

My problem:
This takes a really long time to execute (maybe because the input is 4 GB gzipped and 30 GB uncompressed).

I'm trying to find a way to replace all the substr calls with a single, simpler expression (it seems to me that I lose a lot of performance cutting the original string once for each field).

I'm looking for a way to do all the cutting in one pass.

Best regards,
Ricardo Tomás
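
On doing all the cutting in one place: what follows is a minimal sketch, not a measured fix. It keeps the record layout in one table that is parsed once in BEGIN, then loops over it and trims each field with two inline sub() calls instead of three user-defined function calls per field. The helper name fixed_to_csv and its "start:length" layout argument are made up for illustration. Whether it is actually faster than the hand-written substr() list has to be timed on real data, since gzcat and sort may well dominate the run time.

```shell
# Sketch: convert fixed-width records on stdin to CSV on stdout.
# fixed_to_csv is a hypothetical helper. $1 is a layout string of
# "start:length" pairs separated by spaces, e.g. "1:12 14:12 27:12".
fixed_to_csv() {
    awk -v layout="$1" '
    BEGIN {
        # Parse the layout once, not once per record.
        n = split(layout, spec, " ")
        for (i = 1; i <= n; i++) {
            split(spec[i], p, ":")
            start[i] = p[1]
            len[i]   = p[2]
        }
    }
    {
        line = ""
        for (i = 1; i <= n; i++) {
            f = substr($0, start[i], len[i])
            # Trim leading and trailing blanks without a function call.
            sub(/^ +/, "", f)
            sub(/ +$/, "", f)
            sep = (i > 1) ? "," : ""
            line = line sep f
        }
        print line
    }'
}
```

With only the first three fields of the layout above, a call could look like: gzcat *CONTACT* | fixed_to_csv "1:12 14:12 27:12" | sort | gzip > out.gz (out.gz is a placeholder name). Piping straight into sort and gzip also avoids writing the uncompressed intermediate files final and final2. If gawk is available, its FIELDWIDTHS variable is another option for splitting fixed-width records without substr() calls.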
# 2  
Old 09-25-2008
With Perl you could do them all in one big happy regular expression (or unpack for that matter), although I'm uncertain whether that could be any faster.
# 3  
Old 09-26-2008
Quote:
Originally Posted by era
With Perl you could do them all in one big happy regular expression (or unpack for that matter), although I'm uncertain whether that could be any faster.
era, I think it will take the same amount of time in Perl too.