awk and substr performance
Shell Programming and Scripting, post 302240290 by naoseionome on Thursday 25th of September 2008 10:51:21 AM

Hi,
I have another performance question I would like to share:

I have this code:
Code:
#!/usr/bin/ksh
#CONTACT
gzcat *CONTACT* | awk ' 
{
KEY=substr($0,1,12)
fkey=substr($0,14,12)
addkey=substr($0,27,12)
fkeyEATED=substr($0,40,10)
fkeyEATET=substr($0,51,8)
lkey=substr($0,60,10)
utlkey=substr($0,71,8)
st=substr($0,80,20)
ft=substr($0,101,100)
ft__=substr($0,202,100)
nic=substr($0,303,20)
SUBSfkeyIBERNUMBER=substr($0,324,20)
sco=substr($0,345,20)
stblk=substr($0,366,10)
stblkT=substr($0,377,8)
edb=substr($0,386,10)
edbT=substr($0,397,8)
bckrs=substr($0,406,10)
pckcd=substr($0,417,40)
dlcd=substr($0,458,10)
ppmd=substr($0,469,20)
cstmrd=substr($0,490,20)
dsnnrsn=substr($0,511,20)
sx=substr($0,532,10)
prpd=substr($0,543,1)
ml=substr($0,545,128)
ms=substr($0,674,15)
KEY=trim(KEY)
fkey=trim(fkey)
addkey=trim(addkey)
fkeyEATED=trim(fkeyEATED)
fkeyEATET=trim(fkeyEATET)
lkey=trim(lkey)
utlkey=trim(utlkey)
st=trim(st)
ft=trim(ft)
ft__=trim(ft__)
nic=trim(nic)
SUBSfkeyIBERNUMBER=trim(SUBSfkeyIBERNUMBER)
sco=trim(sco)
stblk=trim(stblk)
stblkT=trim(stblkT)
edb=trim(edb)
edbT=trim(edbT)
bckrs=trim(bckrs)
pckcd=trim(pckcd)
dlcd=trim(dlcd)
ppmd=trim(ppmd)
cstmrd=trim(cstmrd)
dsnnrsn=trim(dsnnrsn)
sx=trim(sx)
prpd=trim(prpd)
ml=trim(ml)
ms=trim(ms)
print SUBSfkeyIBERNUMBER ",1," KEY "," fkey "," addkey "," fkeyEATED "," fkeyEATET "," lkey "," utlkey "," st "," ft "," ft__ "," nic "," SUBSfkeyIBERNUMBER "," sco "," stblk "," stblkT "," edb "," edbT "," bckrs "," pckcd "," dlcd "," ppmd "," cstmrd "," dsnnrsn "," sx "," prpd "," ml "," ms }
function ltrim(s) { sub(/^ +/, "", s); return s }
function rtrim(s) { sub(/ +$/, "", s); return s }
function trim(s)  { return rtrim(ltrim(s)); }
'> final
sort final > final2
rm  final
gzip final2
mv final2.gz ${data}-fkeym_all.gz

My problem:
This takes a very long time to run, possibly because the input is large: 4 GB gzipped and 30 GB uncompressed.
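If a good part of the time goes into writing and re-reading the two 30 GB intermediate files (final and final2), one option is to stream everything through a single pipeline. This is a sketch under a few assumptions: a POSIX shell, `gzip -dc` as a portable stand-in for `gzcat`, and a hypothetical `extract_fields` function acting as a placeholder for the original awk program. `LC_ALL=C` makes `sort` compare raw bytes, which is often faster than locale-aware collation but can order lines differently if the script normally runs under another locale. `sort` itself will still need temporary disk space for an input this size.

```shell
#!/bin/sh
# Sketch: decompress -> extract -> sort -> compress in one pipeline,
# so the uncompressed data is never written out as "final"/"final2".
# Assumptions: gzip -dc stands in for gzcat; extract_fields is a
# hypothetical placeholder for the original awk program.

extract_fields() {
    # placeholder stage: replace with the real field-extraction awk
    awk '{ print substr($0, 1, 12) }'
}

contact_pipeline() {
    outfile=$1
    shift
    # LC_ALL=C: byte-wise comparison, often faster than locale collation
    gzip -dc "$@" | extract_fields | LC_ALL=C sort | gzip > "$outfile"
}
```

Usage, with the same naming as the original script: `contact_pipeline "${data}-fkeym_all.gz" *CONTACT*`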

I'm trying to find a way to replace all the substr() calls with a single or simpler expression (it seems to me that I lose a lot of performance by cutting the original string separately for each field).
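One way to replace the 27 hand-written substr()/trim() pairs with a single expression is to keep the column layout in a table and loop over it. This is a sketch in POSIX awk (nawk, gawk or mawk), assuming the offsets in the script above are correct; `extract_contact` is a hypothetical name. It still calls substr() once per field, so any speed-up would come mainly from replacing the three user-function calls per field (trim, ltrim, rtrim) with one gsub(). Timing it on a sample of the real data is the only way to know how much that helps.

```shell
#!/bin/sh
# Sketch: table-driven version of the CONTACT field extraction.
# Assumes a POSIX awk. The column table is copied from the substr()
# offsets in the original script; extract_contact is a hypothetical name.

extract_contact() {
    awk '
    BEGIN {
        # start:length for each of the 27 fields, in output order
        layout = "1:12 14:12 27:12 40:10 51:8 60:10 71:8 80:20"
        layout = layout " 101:100 202:100 303:20 324:20 345:20 366:10"
        layout = layout " 377:8 386:10 397:8 406:10 417:40 458:10 469:20"
        layout = layout " 490:20 511:20 532:10 543:1 545:128 674:15"
        n = split(layout, spec, " ")
        for (i = 1; i <= n; i++) {
            split(spec[i], p, ":")
            start[i] = p[1] + 0
            len[i] = p[2] + 0
        }
    }
    {
        out = ""
        for (i = 1; i <= n; i++) {
            f = substr($0, start[i], len[i])
            gsub(/^ +| +$/, "", f)      # one gsub() instead of trim/ltrim/rtrim
            out = out "," f
            if (i == 12)                # 12th field is SUBSfkeyIBERNUMBER
                subs = f
        }
        # subscriber number, a literal 1, then all 27 fields, comma-separated
        print subs ",1" out
    }'
}
```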

I'm looking for a way to do all the cutting in a single step.
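To split each record in a single step, GNU awk can cut fixed-width lines itself through its FIELDWIDTHS variable. This is a gawk extension (nawk and mawk do not support it), so the sketch calls `gawk` explicitly; `extract_contact_gawk` is a hypothetical name. In the original layout every field except the last is followed by one column the script skips, so the width list alternates data widths with 1, and the data ends up in the odd-numbered fields $1, $3, ..., $53.

```shell
#!/bin/sh
# Sketch: let GNU awk split each fixed-width record in one step.
# FIELDWIDTHS is a gawk extension, so this calls gawk explicitly;
# extract_contact_gawk is a hypothetical name.

extract_contact_gawk() {
    gawk '
    BEGIN {
        # Data widths from the original substr() calls, each followed by
        # the 1-character column the original script skips.
        # Data lands in $1, $3, ..., $53; separators in the even fields.
        w = "12 1 12 1 12 1 10 1 8 1 10 1 8 1 20 1 100 1 100 1"
        w = w " 20 1 20 1 20 1 10 1 8 1 10 1 8 1 10 1 40 1 10 1"
        w = w " 20 1 20 1 20 1 10 1 1 1 128 1 15"
        FIELDWIDTHS = w
    }
    {
        out = ""
        for (i = 1; i <= 53; i += 2) {
            f = $i
            gsub(/^ +| +$/, "", f)      # trim spaces on both sides
            out = out "," f
            if (i == 23)                # $23 holds SUBSfkeyIBERNUMBER
                subs = f
        }
        # subscriber number, a literal 1, then all 27 fields, comma-separated
        print subs ",1" out
    }'
}
```

Whether this beats 27 substr() calls in practice depends on the awk build and the data; it is worth timing both on a sample before switching.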

Best regards,
Ricardo Tomás
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk substr?

Sorry if this has been posted before, I searched but not sure what I really want to do. I have a file with records that show who has logged into my application: 2003-03-14:I:root: Log_mesg: registered servername:userid. (more after this) I want to pull out the userid, date and time into... (2 Replies)
Discussion started by: MizzGail

2. Shell Programming and Scripting

How to use awk substr ?

Hi all, I have a flat file from which I would like to get ext = 7950. How do I do that? if ($1 == "CTI-ProgramStart") { ext = substr($9,index($9,"Extension")+11,4); Why is it not working? Please help. Thanks (1 Reply)
Discussion started by: sabercats

3. UNIX for Dummies Questions & Answers

awk or substr

I have a variable 200612. The last two digits of this variable should be between 1 and 12; they should not be greater than 12 or less than 1 (for example, 00 or 13, 14, 15 are not accepted). How do I check for these conditions in a Unix shell script? Thanks, Ram (3 Replies)
Discussion started by: ramky79

4. Shell Programming and Scripting

awk substr

Hi, I have multiple files whose names begin with bidb_yyyymm (yyyymm = the year and month the file was created). I want to look at the files and, where yyyymm is more than one month old, remove the file from the server. I was looking at looping through the files and getting the yyyymm... (2 Replies)
Discussion started by: colesga

5. Shell Programming and Scripting

Help with awk and substr

I have the following to find lines matching "COMPLETE" and extract parts of it using substr. sed -n "/COMPLETE/p" 1.txt | awk 'BEGIN { FS = "\" } {printf"%s %s:%s \n", substr($3,17,3),substr($6,4,1), substr($7,4,1)}' | sort | uniq > temp.txt Worked fine until the numbers in 2nd & 3rd substr... (5 Replies)
Discussion started by: zpn

6. Shell Programming and Scripting

awk substr

Hi, I am using awk and the substr function to list the directory names in the present working directory. I am using the code below: ls -l | awk '{ if ((substr($1,1,1)) -eq d) {print $9 }}' But the problem is that I am getting all the files and directories listed, whereas the requirement I wrote... (7 Replies)
Discussion started by: prabhu_kumar

7. Shell Programming and Scripting

Substr with awk

Hi all, I'm here again because I need your help to solve another issue. I have some files with this name format: date_filename.csv. In my shell script I must rename each file, removing the date, so that the file name is filename.csv. To do this I use this command: fnames=`ls ${fname}|... (2 Replies)
Discussion started by: leobdj

8. Shell Programming and Scripting

awk substr

Hello life savers! Is there any way to use substr in an awk command to return the part of a string between a given start point and stop point? I know we have this: substr(string, start, length) Is there anything like this in awk: substr(string, start, stop) ... (9 Replies)
Discussion started by: @man

9. Shell Programming and Scripting

HELP : awk substr

Hi, - In a file test.wmi Col1 | firstName | lastName 4003 | toto_titi_CT- | otot_itit - I want to keep only columns $7, $13 and $15 for codes 4003 and 4002. For column $13 I want the whole name up to _CT- or _GC- 1- I used the command egrep with awk #egrep -i... (2 Replies)
Discussion started by: georg2014

10. Shell Programming and Scripting

awk and substr

Hello All; I have an input file 'abc.txt' with below text: 512345977,213458,100021 512345978,213454,100031 512345979,213452,100051 512345980,213455,100061 512345981,213456,100071 512345982,213456,100091 512345983,213457,100041 512345984,213451,100011 I need to paste the first field... (10 Replies)
Discussion started by: mystition
textutil::trim(3tcl)				    Text and string utilities, macro processing 			      textutil::trim(3tcl)


NAME
textutil::trim - Procedures to trim strings

SYNOPSIS
package require Tcl 8.2
package require textutil::trim ?0.7?
::textutil::trim::trim string ?regexp?
::textutil::trim::trimleft string ?regexp?
::textutil::trim::trimright string ?regexp?
::textutil::trim::trimPrefix string prefix
::textutil::trim::trimEmptyHeading string

DESCRIPTION
The package textutil::trim provides commands that trim strings using arbitrary regular expressions. The complete set of procedures is described below.

::textutil::trim::trim string ?regexp?
Removes from string any leading and trailing substrings matching the regular expression regexp and returns the result as a new string. This is done for every line in the string, that is, any substring between two newline characters, between the beginning of the string and a newline, or between a newline and the end of the string; if the string contains no newline, the whole string is treated as one line. The regular expression regexp defaults to "[ \t]+".

::textutil::trim::trimleft string ?regexp?
Removes from string any leading substring matching the regular expression regexp and returns the result as a new string. This applies to every line in the string, with lines delimited as described for ::textutil::trim::trim. The regular expression regexp defaults to "[ \t]+".

::textutil::trim::trimright string ?regexp?
Removes from string any trailing substring matching the regular expression regexp and returns the result as a new string. This applies to every line in the string, with lines delimited as described for ::textutil::trim::trim. The regular expression regexp defaults to "[ \t]+".

::textutil::trim::trimPrefix string prefix
Removes prefix from the beginning of string and returns the result. The string is left unchanged if it does not begin with prefix.

::textutil::trim::trimEmptyHeading string
Looks for empty lines (including lines consisting only of whitespace) at the beginning of the string and removes them. The modified string is returned as the result of the command.

BUGS, IDEAS, FEEDBACK
This document, and the package it describes, will undoubtedly contain bugs and other problems. Please report such in the category textutil of the Tcllib SF Trackers [http://sourceforge.net/tracker/?group_id=12883]. Please also report any ideas for enhancements you may have for either package and/or documentation.

SEE ALSO
regexp(3tcl), split(3tcl), string(3tcl)

KEYWORDS
prefix, regular expression, string, trimming

CATEGORY
Text processing

textutil 0.7 textutil::trim(3tcl)
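For comparison with the awk script in the thread, the default behaviour of ::textutil::trim::trim (remove leading and trailing runs matching "[ \t]+" on every line) can be approximated in awk with one gsub() per line. This is a sketch, not part of Tcllib; `trim_lines` is a hypothetical name, and unlike the Tcl command it reads lines from standard input instead of taking a string argument. Note that it also strips tabs, whereas the trim()/ltrim()/rtrim() helpers in the thread strip only spaces.

```shell
#!/bin/sh
# Sketch: awk approximation of ::textutil::trim::trim with its default
# regexp "[ \t]+", applied to every line read from standard input.
# trim_lines is a hypothetical name; this is not part of Tcllib.

trim_lines() {
    # one gsub() removes both the leading and the trailing run
    awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); print }'
}
```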
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.