shell script performance issues --Urgent


 
# 1  
12-13-2007

I need help with awk; please help immediately.

The function below takes a lot of time.
Please help me fine-tune it so that it runs faster.
The input file has around 3 million records.

# Process Body
processbody() {

#set -x

while IFS= read -r line
do
ENTITY_TYPE=`print "$line" | cut -d'|' -f2 | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'`

if [[ ${ENTITY_TYPE} == "O" ]]
then
ENTITY_TYPE="B"
else
ENTITY_TYPE="P"
fi
CUSTOMER_ID=`print "$line" | cut -d'|' -f1 | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'`

#Branch and Account Numbers should be left blank

BRANCH_NUMBER=
ACCOUNT_NUMBER=
ACCOUNT_DATE_OPEN=`print "$line" | cut -d'|' -f3 |sed 's/[^0-9]//g' | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}' | cut -c1-8`
CORPORATE_NAME=`print "$line" | cut -d'|' -f4 | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'`
LAST_NAME=`print "$line" | cut -d'|' -f5 | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'`
FIRST_NAME=`print "$line" | cut -d'|' -f6 | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'`
MIDDLE_NAME=`print "$line" | cut -d'|' -f7 | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'`
NAME_SUFFIX=`print "$line" | cut -d'|' -f8 | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'`

# Extracting person gender information
PERSON_GENDER=`print "$line" | cut -d'|' -f9 | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'`
# If gender is anything other than M or F,replace it with blank
if [[ ${PERSON_GENDER} != "M" && ${PERSON_GENDER} != "F" ]]
then
PERSON_GENDER=
fi

BIRTH_DATE=`print "$line" | cut -d'|' -f10 | sed 's/[^0-9]//g' | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}' | cut -c1-8`
#AGE should be left blank
AGE=

# Extracting citizenship code information
CITIZEN_COUNTRY_NAME=`print "$line" | cut -d'|' -f11 | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'`

if [[ ${CITIZEN_COUNTRY_NAME} == "US" || ${CITIZEN_COUNTRY_NAME} == "USA" || ${CITIZEN_COUNTRY_NAME} == "UNITED STATES" || ${CITIZEN_COUNTRY_NAME} == "UNITED STATES OF AMERICA" ]]
then
CITIZENSHIP_CODE="USA"
FED_ID=`print "$line" | cut -d'|' -f12 | sed -e 's/[^0-9]//g' | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'`
else
CITIZENSHIP_CODE=`print "$line" | cut -d'|' -f11 | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}' | cut -c1-3`
FED_ID=
fi

if [[ ${ENTITY_TYPE} == "P" ]]
then
FED_ID_TYPE="S"
else
FED_ID_TYPE="T"
fi

#Extracting National ID information

ID_INFORMATION_1=`print "$line" | cut -d'|' -f13 | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'`
ID_INFORMATION_2=`print "$line" | cut -d'|' -f14 | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'`

if [[ -n ${ID_INFORMATION_1} ]]
then
NATIONAL_ID=${ID_INFORMATION_1}

# Remove all non-alphanumeric characters from the NATIONAL_ID field
NATIONAL_ID=`print "${NATIONAL_ID}" | sed 's/[^0-9a-zA-Z]//g' | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'`
NATIONAL_ID_TYPE="DL"
elif [[ -n ${ID_INFORMATION_2} ]]
then
NATIONAL_ID=${ID_INFORMATION_2}

# Remove all non-alphanumeric characters from the NATIONAL_ID field
NATIONAL_ID=`print "${NATIONAL_ID}" | sed 's/[^0-9a-zA-Z]//g' | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'`
NATIONAL_ID_TYPE="PP"
else
NATIONAL_ID=
NATIONAL_ID_TYPE=
fi

#Extracting street address information

ADDRESS_1=`print "$line" | cut -d'|' -f15 | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'`
ADDRESS_2=`print "$line" | cut -d'|' -f16 | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'`
STREET_ADDRESS=${ADDRESS_1}${ADDRESS_2}
STREET_ADDRESS=`print "${STREET_ADDRESS}" | cut -c1-60`

#Extracting city information

ADDRESS_3=`print "$line" | cut -d'|' -f17 | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'`
CITY_NAME=${ADDRESS_3}

#Extracting country information

COUNTRY=`print "$line" | cut -d'|' -f20 | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'`
ADDRESS_4=`print "$line" | cut -d'|' -f18 | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'`

COUNTRY_NAME=${COUNTRY}

if [[ ${COUNTRY_NAME} == "US" || ${COUNTRY_NAME} == "USA" || ${COUNTRY_NAME} == "UNITED STATES" || ${COUNTRY_NAME} == "UNITED STATES OF AMERICA" ]]
then
COUNTRY_CODE="USA"
else
COUNTRY_CODE=`print "${COUNTRY}" | sed 's/ //g' | cut -c1-3`
fi

#POSTCODE=`print $line | cut -d'|' -f19 | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}' |cut -c1-5`
if [[ ${COUNTRY_CODE} == "USA" ]]
then
STATE_CODE=${ADDRESS_4}
POSTCODE=`print "$line" | cut -d'|' -f19 | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}' |cut -c1-5`

FOREIGN_PROVINCE=
FOREIGN_POSTAL_CODE=
else
STATE_CODE=
POSTCODE=
FOREIGN_PROVINCE=${ADDRESS_4}
FOREIGN_POSTAL_CODE=`print "$line" | cut -d'|' -f19 | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'`
fi

PROCESSBODY="CDCI|"
PROCESSBODY="${PROCESSBODY}${ENTITY_TYPE}|"
PROCESSBODY="${PROCESSBODY}${CUSTOMER_ID}|"
PROCESSBODY="${PROCESSBODY}${BRANCH_NUMBER}|"
PROCESSBODY="${PROCESSBODY}${ACCOUNT_NUMBER}|"
PROCESSBODY="${PROCESSBODY}${ACCOUNT_DATE_OPEN}|"
PROCESSBODY="${PROCESSBODY}${CORPORATE_NAME}|"
PROCESSBODY="${PROCESSBODY}${LAST_NAME}|"
PROCESSBODY="${PROCESSBODY}${FIRST_NAME}|"
PROCESSBODY="${PROCESSBODY}${MIDDLE_NAME}|"
PROCESSBODY="${PROCESSBODY}${NAME_SUFFIX}|"
PROCESSBODY="${PROCESSBODY}${PERSON_GENDER}|"
PROCESSBODY="${PROCESSBODY}${BIRTH_DATE}|"
PROCESSBODY="${PROCESSBODY}${AGE}|"
PROCESSBODY="${PROCESSBODY}${CITIZENSHIP_CODE}|"
PROCESSBODY="${PROCESSBODY}${FED_ID}|"
PROCESSBODY="${PROCESSBODY}${FED_ID_TYPE}|"
PROCESSBODY="${PROCESSBODY}${NATIONAL_ID}|"
PROCESSBODY="${PROCESSBODY}${NATIONAL_ID_TYPE}|"
PROCESSBODY="${PROCESSBODY}${STREET_ADDRESS}|"
PROCESSBODY="${PROCESSBODY}${CITY_NAME}|"
PROCESSBODY="${PROCESSBODY}${STATE_CODE}|"
PROCESSBODY="${PROCESSBODY}${POSTCODE}|"
PROCESSBODY="${PROCESSBODY}${FOREIGN_PROVINCE}|"
PROCESSBODY="${PROCESSBODY}${FOREIGN_POSTAL_CODE}|"
PROCESSBODY="${PROCESSBODY}${COUNTRY_NAME}|"
PROCESSBODY="${PROCESSBODY}${COUNTRY_CODE}"

print "${PROCESSBODY}" >> "${INQ_TEMP_FILE}"
done < "${EDD_HOME}/tmp/inquiry.txt"

}
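A minimal sketch of a lighter fix that keeps the shell loop, assuming a POSIX-style shell (ksh, bash): split each record once with read (IFS set to "|") and trim fields with parameter expansion, so no print|cut|awk processes are forked per field. The helpers trim and split_fields and the variables f1..f5, rest and t are hypothetical; only the first few fields of processbody() are shown, and steps such as stripping non-digits from dates have no portable parameter-expansion equivalent, so they would still need a tool like awk.

```shell
# trim (hypothetical helper): strips leading/trailing blanks (spaces and tabs)
# from $1 and stores the result in the global variable t. Parameter expansion
# only, so no process is forked.
trim() {
  t=${1#"${1%%[![:blank:]]*}"}
  t=${t%"${t##*[![:blank:]]}"}
}

# split_fields (hypothetical): partial rewrite of the top of processbody().
# One read splits the record on "|" into f1..f5; everything after field 5
# lands in rest.
split_fields() {
  while IFS='|' read -r f1 f2 f3 f4 f5 rest; do
    trim "$f2"
    if [ "$t" = "O" ]; then ENTITY_TYPE=B; else ENTITY_TYPE=P; fi
    trim "$f1"; CUSTOMER_ID=$t
    trim "$f4"; CORPORATE_NAME=$t
    trim "$f5"; LAST_NAME=$t
    # ... the remaining fields would be handled the same way ...
    printf '%s|%s|%s|%s\n' "$ENTITY_TYPE" "$CUSTOMER_ID" "$CORPORATE_NAME" "$LAST_NAME"
  done
}
```

Even with this change, a shell while-read loop over 3 million records is likely to remain much slower than a single awk pass over the file.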
# 2  
12-13-2007
It looks like you could have done all of this in ONE awk program, withOUT constantly chopping up lines with print|sed|cut|awk.
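A minimal sketch of that single awk program, assuming the 20-field, pipe-delimited layout that processbody() reads (and that the sample record later in the thread shows). cdci_awk is a hypothetical name; the field rules are copied from processbody(), including its quirks, such as joining the two street-address fields with no separator before truncating to 60 characters.

```shell
# cdci_awk (hypothetical name): reads the pipe-delimited extract on stdin and
# writes one CDCI record per input line on stdout, using a single awk process.
cdci_awk() {
  awk -F'|' '
    # Strip leading/trailing blanks and tabs (same regex as the original script).
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # Keep digits only, then the first 8 characters (sed + cut -c1-8).
    function ymd(s) { gsub(/[^0-9]/, "", s); return substr(s, 1, 8) }
    # The four spellings of the United States accepted by the original script.
    function is_us(c) {
      return c == "US" || c == "USA" || c == "UNITED STATES" || c == "UNITED STATES OF AMERICA"
    }
    BEGIN { OFS = "|" }
    {
      # Split and trim every input field once per record.
      for (i = 1; i <= 20; i++) f[i] = trim($i)

      entity = (f[2] == "O") ? "B" : "P"
      gender = (f[9] == "M" || f[9] == "F") ? f[9] : ""
      fedtype = (entity == "P") ? "S" : "T"

      if (is_us(f[11])) {
        cit = "USA"; fed = f[12]; gsub(/[^0-9]/, "", fed)
      } else {
        cit = substr(f[11], 1, 3); fed = ""
      }

      # Field 13 (typed DL) takes precedence over field 14 (typed PP);
      # either one is reduced to letters and digits.
      if (f[13] != "")      { nid = f[13]; nidtype = "DL" }
      else if (f[14] != "") { nid = f[14]; nidtype = "PP" }
      else                  { nid = "";    nidtype = "" }
      gsub(/[^0-9a-zA-Z]/, "", nid)

      # Fields 15 and 16 joined with no separator, cut to 60 characters.
      street = substr(f[15] f[16], 1, 60)

      if (is_us(f[20])) {
        code = "USA"
      } else {
        code = f[20]; gsub(/ /, "", code); code = substr(code, 1, 3)
      }
      # US addresses fill STATE_CODE/POSTCODE, others the FOREIGN_* columns.
      if (code == "USA") {
        state = f[18]; post = substr(f[19], 1, 5); fprov = ""; fpost = ""
      } else {
        state = ""; post = ""; fprov = f[18]; fpost = f[19]
      }

      # Same 27 columns, in the same order, as PROCESSBODY.
      print "CDCI", entity, f[1], "", "", ymd(f[3]), f[4], f[5], f[6], f[7], f[8],
            gender, ymd(f[10]), "", cit, fed, fedtype, nid, nidtype,
            street, f[17], state, post, fprov, fpost, f[20], code
    }'
}
```

It would replace the whole while loop with one command, for example cdci_awk < "${EDD_HOME}/tmp/inquiry.txt" >> "${INQ_TEMP_FILE}", so awk starts once for the file instead of dozens of times per record.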
# 3  
12-13-2007
vgersh99,
can you please let me know how to use
cut and sed inside an awk program?

Thanks & Regards
# 4  
12-13-2007
For example, how would I write the line below inside awk?
ACCOUNT_DATE_OPEN=`print "$line" | cut -d'|' -f3 |sed 's/[^0-9]//g' | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}' | cut -c1-8`
# 5  
12-13-2007
Quote:
Originally Posted by icefish
vgersh99,
can you please let me know how to use
cut and sed inside an awk program?

Thanks & Regards
You don't need cut or sed: awk provides most of their functionality natively (field splitting with -F, substr(), sub() and gsub()).
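To illustrate, here is how each tool in the COUNTRY and COUNTRY_CODE pipelines of processbody() maps onto awk built-ins: -F'|' with $20 replaces cut -d'|' -f20, gsub() replaces both the trim step and sed 's/ //g', and substr() replaces cut -c1-3. country_code is a hypothetical helper name used only for this sketch.

```shell
# country_code (hypothetical helper): awk-only version of
#   print "$line" | cut -d'|' -f20 | awk '{gsub(...)}'   (cut -f + trim)
#   print "${COUNTRY}" | sed 's/ //g' | cut -c1-3         (sed s/// + cut -c)
# including the US special case from the original script.
country_code() {
  printf '%s\n' "$1" | awk -F'|' '{
    c = $20                                   # cut -d"|" -f20
    gsub(/^[ \t]+|[ \t]+$/, "", c)            # the trim step
    if (c == "US" || c == "USA" || c == "UNITED STATES" || c == "UNITED STATES OF AMERICA") {
      print "USA"
    } else {
      gsub(/ /, "", c)                        # sed "s/ //g"
      print substr(c, 1, 3)                   # cut -c1-3
    }
  }'
}
```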
# 6  
12-13-2007
Quote:
Originally Posted by icefish
For example, how would I write the line below inside awk?
ACCOUNT_DATE_OPEN=`print "$line" | cut -d'|' -f3 |sed 's/[^0-9]//g' | awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}' | cut -c1-8`
What does the 'line' look like, AND what part of it do you want?
A sample, please!
# 7  
12-13-2007
Example of a line:
COL001 | P | 2007-02-01-00.00.00.000000 | | sam | babu | | | M | 1949-01-04-00.00.00.000000 | INDIA | | C60 | | 110 S | | ENNIS | IN | 46563 | INDIA |
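Using that sample, the ACCOUNT_DATE_OPEN line from post #4 can be written as a single awk call: -F'|' selects field 3, gsub() drops every non-digit (which also removes the surrounding blanks, so the separate trim step is not needed), and substr() keeps the first 8 characters. Calling this inside the while loop would still start one awk per record, so it is only a stepping stone toward processing the whole file in one awk run.

```shell
# The sample record posted in the thread (trailing "|" included).
line='COL001 | P | 2007-02-01-00.00.00.000000 | | sam | babu | | | M | 1949-01-04-00.00.00.000000 | INDIA | | C60 | | 110 S | | ENNIS | IN | 46563 | INDIA |'

# Field 3, digits only, first 8 characters: replaces cut | sed | awk | cut.
ACCOUNT_DATE_OPEN=$(printf '%s\n' "$line" | awk -F'|' '{ d = $3; gsub(/[^0-9]/, "", d); print substr(d, 1, 8) }')
printf '%s\n' "$ACCOUNT_DATE_OPEN"    # prints 20070201
```

BIRTH_DATE follows the same pattern with $10 in place of $3.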