Help 'speeding' up this 'parsing' script - taking 24+ hours to run
Post 303015124 by newbie_01 on Wednesday 28th of March 2018 02:43:17 PM (Shell Programming and Scripting)

Hi,

I've written a ksh script that reads a file and parses/filters/formats each line. The script works as expected, but it runs for 24+ hours on a file that has 2 million lines. Sometimes the input file has 10 million lines, which means it can run for more than 2 days and still not finish. And of course, the SA has been chasing me up, as it shows in top as running seemingly forever.

I need some advice on whether, instead of reading one line at a time, I can run an awk one-liner. I wish I could code it in Perl, but I'm not sure how to. Most people say it is faster in Perl, but I don't know the Perl equivalents of the UNIX commands, other than calling them through system.

Anyway, hopefully I can interest someone in looking into this.

Below is the excerpt / part of the script that is taking the most time:

Code:
for LOG in *search_string_found.out
#for LOG in *xyz
do
   server_db=`echo $LOG | awk -F"_" '{ print $1 }'`
   server_app=`echo $LOG | awk -F"_" '{ print $2 }'`
   echo "- [ `date` ] // `wc -l $LOG | awk '{ print $1 }'` lines ==> Processing $LOG // ${server_db} from ${server_app}"

#while IFS="*" read TS CS HOST RESULT SERVICE RETURNCODE
oIFS=$IFS
while read line
do
   IFS="*"
   echo "$line" | read TS CS HOST RESULT SERVICE RETURNCODE
   timestamp=`echo $TS | awk '{ print $2 }'`
   year=`echo $TS | awk '{ print $1 }' | awk -F"-" '{ print $3 }'`
   day=`echo $TS | awk '{ print $1 }' | awk -F"-" '{ print $1 }'`
   month=`echo $TS | awk '{ print $1 }' | awk -F"-" '{ print $2 }'`

   case $month in
      "JAN" ) mm="01" ;;
      "FEB" ) mm="02" ;;
      "MAR" ) mm="03" ;;
      "APR" ) mm="04" ;;
      "MAY" ) mm="05" ;;
      "JUN" ) mm="06" ;;
      "JUL" ) mm="07" ;;
      "AUG" ) mm="08" ;;
      "SEP" ) mm="09" ;;
      "OCT" ) mm="10" ;;
      "NOV" ) mm="11" ;;
      "DEC" ) mm="12" ;;
   esac
   TS2="$year-$mm-$day $timestamp"

   program=`echo $CS | awk -F"(" '{ print $4 }' | awk -F"=" '{ print $2 }' | awk -F")" '{ print $1}'`
   user=`echo $CS | awk -F"(" '{ print $6 }' | awk -F"=" '{ print $2 }' | awk -F")" '{ print $1}'`
   service_name=`echo $CS | awk -F"(" '{ print $8 }' | awk -F"=" '{ print $2 }' | awk -F")" '{ print $1}'`

   app_protocol=`echo $HOST | awk -F"(" '{ print $3 }' | awk -F"=" '{ print $2 }' | awk -F")" '{ print $1}'`
   app_host=`echo $HOST | awk -F"(" '{ print $4 }' | awk -F"=" '{ print $2 }' | awk -F")" '{ print $1}'`
   app_port=`echo $HOST | awk -F"(" '{ print $5 }' | awk -F"=" '{ print $2 }' | awk -F")" '{ print $1}'`

   #echo "- line = $line"
   #echo "- timestamp = $TS"
   #echo "  TS2 = $TS2"
   #echo "- connectstring = $CS"
   #echo "  program = $program"
   #echo "  user = $user"
   #echo "  service_name = $service_name"
   #echo "- host = $HOST"
   #echo "  app_protocol = $app_protocol"
   #echo "  app_host = $app_host"
   #echo "  app_port = $app_port"
   #echo "- result = $RESULT"
   #echo "- service = $SERVICE"
   #echo "- returncode = $RETURNCODE"
   #echo "-------------------------------------------------------------"
   #echo

   RETURNCODE=`echo $RETURNCODE | sed "s/ *//g"`
   detail="$TS2^${server_db}^${server_app} = ${app_host}^$program^$user^${service_name}^$RETURNCODE^$line"
   #echo "${detail}" | tee -a ${f_report}
   echo "${detail}" >> ${f_report}
   IFS=$oIFS
done < $LOG
done
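
By my count, the loop above starts about 26 awk or sed processes for every input line, plus a subshell for each command substitution, and that is probably where most of the time goes. Below is a minimal sketch of the same kind of extraction using only shell built-ins. The name parse_lines is hypothetical, the output is a cut-down version of the report line, and the key lookup assumes the key is present in the record. Treat it as an illustration, not a drop-in replacement.

```shell
# Sketch only: parse_lines is a hypothetical name. It reads records on stdin and
# uses shell built-ins alone, so no awk or sed process is started per line.
parse_lines() {
    while IFS='*' read -r TS CS HOST RESULT SERVICE RETURNCODE
    do
        # TS looks like "04-MAR-2018 03:19:01 " (the blank comes from the " * " separator).
        date_part=${TS%% *}            # 04-MAR-2018
        time_part=${TS#* }             # "03:19:01 "
        time_part=${time_part%% *}     # 03:19:01
        day=${date_part%%-*}           # 04
        year=${date_part##*-}          # 2018
        month=${date_part#*-}          # MAR-2018
        month=${month%-*}              # MAR

        case $month in
            JAN) mm=01 ;; FEB) mm=02 ;; MAR) mm=03 ;; APR) mm=04 ;;
            MAY) mm=05 ;; JUN) mm=06 ;; JUL) mm=07 ;; AUG) mm=08 ;;
            SEP) mm=09 ;; OCT) mm=10 ;; NOV) mm=11 ;; DEC) mm=12 ;;
            *)   mm=00 ;;              # assumption: flag unknown month names as 00
        esac

        # Cut at "(KEY=" and then at the next ")". If the key is missing the
        # result is garbage, so a real script would need to check for that.
        program=${CS#*"(PROGRAM="};  program=${program%%")"*}
        user=${CS#*"(USER="};        user=${user%%")"*}
        app_host=${HOST#*"(HOST="};  app_host=${app_host%%")"*}

        RETURNCODE=${RETURNCODE# }     # drop the single leading blank seen in the samples

        printf '%s-%s-%s %s^%s^%s^%s^%s\n' \
            "$year" "$mm" "$day" "$time_part" "$app_host" "$program" "$user" "$RETURNCODE"
    done
}

# Demo on one record taken from the sample input.
parse_lines <<'EOF'
04-MAR-2018 03:19:01 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=test_app.abcde.xx.yy)(CID=(PROGRAM=chrome)(HOST=mnl0ia9b5.abcde.xx.yy)(USER=mickeyp0))(INSTANCE_NAME=test3)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.66.90.101)(PORT=60791)) * establish * test_app.abcde.xx.yy * 0
EOF
```

The lookups go by key name rather than by position because the sample records carry PROGRAM, USER and SERVICE_NAME in two different orders.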

Below are example entries from the input file that the script reads. The file has at least 2 million lines and can grow to as many as 10 million. I've changed the entries as they are customer data.

Code:
04-MAR-2018 03:19:01 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=test_app.abcde.xx.yy)(CID=(PROGRAM=chrome)(HOST=mnl0ia9b5.abcde.xx.yy)(USER=mickeyp0))(INSTANCE_NAME=test3)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.66.90.101)(PORT=60791)) * establish * test_app.abcde.xx.yy * 0
04-MAR-2018 03:19:01 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=test_app.abcde.xx.yy)(CID=(PROGRAM=chrome)(HOST=mnl0ia9b5.abcde.xx.yy)(USER=mickeyp0))(INSTANCE_NAME=test3)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.66.90.101)(PORT=60795)) * establish * test_app.abcde.xx.yy * 0
04-MAR-2018 03:21:07 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=mickeyp0))(SERVER=DEDICATED)(SERVICE_NAME=test_app.abcde.xx.yy)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.66.90.7)(PORT=14582)) * establish * test_app.abcde.xx.yy * 0
04-MAR-2018 03:22:25 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=mickeyp0))(SERVER=DEDICATED)(SERVICE_NAME=test_app.abcde.xx.yy)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.66.90.7)(PORT=15176)) * establish * test_app.abcde.xx.yy * 0
04-MAR-2018 03:24:01 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=test_app.abcde.xx.yy)(CID=(PROGRAM=chrome)(HOST=mnl0ia9b5.abcde.xx.yy)(USER=mickeyp0))(INSTANCE_NAME=test3)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.66.90.101)(PORT=60881)) * establish * test_app.abcde.xx.yy * 0
04-MAR-2018 03:24:01 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=test_app.abcde.xx.yy)(CID=(PROGRAM=chrome)(HOST=mnl0ia9b5.abcde.xx.yy)(USER=mickeyp0))(INSTANCE_NAME=test3)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.66.90.101)(PORT=60885)) * establish * test_app.abcde.xx.yy * 0
04-MAR-2018 03:29:01 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=test_app.abcde.xx.yy)(CID=(PROGRAM=chrome)(HOST=mnl0ia9b5.abcde.xx.yy)(USER=mickeyp0))(INSTANCE_NAME=test3)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.66.90.101)(PORT=60965)) * establish * test_app.abcde.xx.yy * 0
04-MAR-2018 03:29:01 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=test_app.abcde.xx.yy)(CID=(PROGRAM=chrome)(HOST=mnl0ia9b5.abcde.xx.yy)(USER=mickeyp0))(INSTANCE_NAME=test3)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.66.90.101)(PORT=60969)) * establish * test_app.abcde.xx.yy * 12514
04-MAR-2018 03:29:02 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=test_app.abcde.xx.yy)(CID=(PROGRAM=xyzimain)(HOST=mnl0ia9b5.abcde.xx.yy)(USER=mickeyp0))(INSTANCE_NAME=test3)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.66.90.101)(PORT=60973)) * establish * test_app.abcde.xx.yy * 0
04-MAR-2018 03:57:10 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=mickeyp0))(SERVER=DEDICATED)(SERVICE_NAME=test_app.abcde.xx.yy)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.66.90.7)(PORT=24394)) * establish * test_app.abcde.xx.yy * 12514

What I really want to do, in the simplest terms, is the following:
  • Change the date format to YYYY-MM-DD, mainly because it is the most convenient format for sorting.
  • Filter some information from each line, i.e. host name, IP, program name, service name, return code, etc.
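
The two steps above could be done in a single awk process per file rather than one line at a time in the shell. What follows is a rough sketch under a few assumptions: reformat_log and value_of are hypothetical names, fields are assumed to be separated by exactly " * ", and the output mimics the '^'-separated detail line from the script. Values are looked up by key name because the sample records list the keys in two different orders.

```shell
# Sketch only: reformat_log is a hypothetical helper, not part of the original script.
# Usage: reformat_log SERVER_DB SERVER_APP < input.log >> report
reformat_log() {
    awk -F' [*] ' -v db="$1" -v app="$2" '
    # Return the value that follows "(KEY=" up to the next ")", or "" if KEY is absent.
    function value_of(str, key,    start, rest, stop) {
        start = index(str, "(" key "=")
        if (start == 0) return ""
        rest = substr(str, start + length(key) + 2)
        stop = index(rest, ")")
        return stop ? substr(rest, 1, stop - 1) : rest
    }
    BEGIN {
        # Build the month-name to month-number table once.
        n = split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", name, " ")
        for (i = 1; i <= n; i++) mm[name[i]] = sprintf("%02d", i)
    }
    NF >= 6 {
        split($1, ts, " ")       # ts[1] = DD-MON-YYYY, ts[2] = HH:MM:SS
        split(ts[1], d, "-")     # d[1] = DD, d[2] = MON, d[3] = YYYY
        rc = $6
        gsub(/[ \t\r]/, "", rc)  # strip stray blanks around the return code
        printf "%s-%s-%s %s^%s^%s = %s^%s^%s^%s^%s^%s\n",
            d[3], mm[toupper(d[2])], d[1], ts[2],
            db, app, value_of($3, "HOST"),
            value_of($2, "PROGRAM"), value_of($2, "USER"),
            value_of($2, "SERVICE_NAME"), rc, $0
    }'
}

# Demo on one record from the sample input; srv1 and app1 are made-up names.
reformat_log srv1 app1 <<'EOF'
04-MAR-2018 03:21:07 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=mickeyp0))(SERVER=DEDICATED)(SERVICE_NAME=test_app.abcde.xx.yy)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.66.90.7)(PORT=14582)) * establish * test_app.abcde.xx.yy * 0
EOF
```

Inside the per-file loop this would be called once per log, for example as reformat_log "$server_db" "$server_app" < "$LOG" >> "$f_report", so the number of processes started depends on the number of files and not on the number of lines.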

I then redirect these formatted lines/records to a file that I can group by return code value, or simply run through sort | uniq -c to show a count of occurrences.
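
For illustration, the counting step could look like the sketch below. The name count_by_returncode is hypothetical, and it assumes the report keeps the '^'-separated layout of the detail line in the script, which puts the return code in field 7.

```shell
# Sketch only: count_by_returncode is a hypothetical name.
# Assumes "^"-separated report lines with the return code in field 7.
count_by_returncode() {
    cut -d'^' -f7 | sort -n | uniq -c
}

# Demo on three made-up report lines in the assumed layout.
count_by_returncode <<'EOF'
2018-03-04 03:19:01^srv1^app1 = 66.66.90.101^chrome^mickeyp0^test_app.abcde.xx.yy^0^original line
2018-03-04 03:29:01^srv1^app1 = 66.66.90.101^chrome^mickeyp0^test_app.abcde.xx.yy^12514^original line
2018-03-04 03:29:02^srv1^app1 = 66.66.90.101^xyzimain^mickeyp0^test_app.abcde.xx.yy^0^original line
EOF
```

Each output line is a count followed by the return code. The field number would shift if a program or user name ever contained a '^' character.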

Any advice much appreciated. Thanks in advance.
 
