Help with speeding up my working script to take less time - how to use more CPU usage for a script


 
# 1  
06-09-2019

Hello experts,

We have input files with 700K lines each (one is generated every hour), and we need to convert them as shown below and move them to another directory once converted.

Sample INPUT:-
Code:
[root@tst01 INPUT]#  cat test1
1559205600000,8474,NormalizedPortInfo,PctDiscards,0.0,Interface,BG-CTA-AX1.test.com,Vl111
1559205600000,8474,NormalizedPortInfo,HistoricalInterfaceSpeed,1000000000,Interface,BG-CTA-AX1.test.com,Vl111
1559205600000,8474,NormalizedPortInfo,SpeedIn,1000000000,Interface,BG-CTA-AX1.test.com,Vl111
1559205600000,8474,NormalizedPortInfo,FrameSizeIn,209.65929490852145,Interface,BG-CTA-AX1.test.com,Vl111
1559205600000,8474,NormalizedPortInfo,PctDiscardsIn,0.0,Interface,BG-CTA-AX1.test.com,Vl111
1559205600000,8474,NormalizedPortInfo,NonunicastIn,124,Interface,BG-CTA-AX1.test.com,Vl111
[root@tst01 INPUT]#

Sample output:-
Code:
[root@tst01 INPUT]#  cat ../OUTPUT/test1
TS;DURATION;SYSNM;DS_SYSNM;SYSTYPENM;OBJNM;SUBOBJNM;VALUE
2019-05-30 09:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;PctDiscards;0.0;Interface;Vl111
2019-05-30 09:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;HistoricalInterfaceSpeed;1000000000;Interface;Vl111
2019-05-30 09:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;SpeedIn;1000000000;Interface;Vl111
2019-05-30 09:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;FrameSizeIn;209.65929490852145;Interface;Vl111
2019-05-30 09:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;PctDiscardsIn;0.0;Interface;Vl111
2019-05-30 09:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;NonunicastIn;124;Interface;Vl111
[root@tst01 INPUT]#

I wrote a script that works: it converts the epoch time (in milliseconds) in the 1st column to a normal date, replaces the 2nd column with a fixed value (86400), and rearranges the remaining columns into a different order.
The problem is that my script processes only ~40 lines per second, so just 144K lines get done in an hour, but we need to finish all 700K in under 1 hour. CPU usage is just 12% of one core, although the machine has 12 cores in a single CPU. How could I speed up the script itself, and how could I let it use all CPU cores for parallel processing?
THANKS

My working script, which processes only 40 lines per second:
Code:
cat /usr/local/bin/script.sh
#!/bin/bash
BASEDIR=/tmp/tsight
INPUTDIR=${BASEDIR}/INPUT
OUTPUTDIR=${BASEDIR}/OUTPUT
DONEDIR=${BASEDIR}/DONE
mkdir -p ${BASEDIR}/INPUT
mkdir -p ${BASEDIR}/OUTPUT
mkdir -p ${BASEDIR}/DONE
cd ${INPUTDIR}
for inp in *
do
tail -n +2 ${inp} >/tmp/tempp
\mv /tmp/tempp ${inp}
echo "TS;DURATION;SYSNM;DS_SYSNM;SYSTYPENM;OBJNM;SUBOBJNM;VALUE" >${OUTPUTDIR}/${inp}
cat ${inp} | while read line
do
TIMES=`echo "${line}" |awk -F, '{print $1}'`
DURA="86400"
INTMF=`echo "${line}" |awk -F, '{print $3}'`
METRIC=`echo "${line}" |awk -F, '{print $4}'`
VALU=`echo "${line}" |awk -F, '{print $5}'`
MFDISP=`echo "${line}" |awk -F, '{print $6}'`
DEVC=`echo "${line}" |awk -F, '{print $7}'`
CNAME=`echo "${line}" |awk -F, '{print $8}'`
#TIMES=`echo "${line}" |awk -F, '{print $1}'`
NON_MIL=`expr "${TIMES}" / 1000`
EPO2DT=`date -d @${NON_MIL} '+%Y-%m-%d %H:%M:%S'`
echo "${EPO2DT};${DURA};${DEVC};${INTMF};${METRIC};${VALU};${MFDISP};${CNAME}" >>${OUTPUTDIR}/${inp}
done
\mv ${BASEDIR}/INPUT/${inp} ${DONEDIR}
done

[root@tst01 INPUT]#


Last edited by Scrutinizer; 06-10-2019 at 02:17 AM.. Reason: quote tags -> code tags
# 2  
06-09-2019
No surprise the execution of your script is a bit sluggish - you execute 16 processes per input line. As you are using awk anyhow, why not do the entire thing with it?
# 3  
06-09-2019
Try this; save it into a file named small.awk:
Code:
BEGIN {
FS=","
OFS=";"
}
{
$1=strftime("%Y-%m-%d %H:%M:%S")
$2=86400
print $1,$2,$7,$3,$4,$5,$6,$NF
}

Run it as:
awk -f small.awk test1 > ../OUTPUT/test1_done
See if this speeds up processing of one file.

Please specify your operating system in future requests like this one.
Given the date invocation in your script, I would guess Linux.

Hope that helps
Regards
Peasant.

Last edited by Peasant; 06-09-2019 at 03:14 PM.. Reason: Mistake.
# 4  
06-09-2019
Thanks RudiC & Peasant.

@Peasant - your solution worked great; it didn't even take 1 second to finish a file.
# 5  
06-09-2019
Three comments on Peasant's fine proposal:
- not all awk versions provide strftime(); gawk may be required.
- calling strftime() without a timestamp returns the current system time; pass $1 as the timestamp to get the desired output. Its milliseconds must be stripped first (divide by 1000), since strftime() expects seconds.
- a heading was required.



For awks without strftime(), try this (it keeps the process count as low as possible):


Code:
paste -d, <(date +"%Y-%m-%d %H:%M:%S" -f<(sed 's/^/@/; s/000,.*$//' file)) <(cut -d, -f2- file) | 
awk -F, -vOFS=";" '
BEGIN   {print "TS;DURATION;SYSNM;DS_SYSNM;SYSTYPENM;OBJNM;SUBOBJNM;VALUE"
        }
        {$2 = "86400" OFS $7
         $7 = $8; NF--
        }
1
'

Output (date prints local time, which is why the hour here differs from the sample above):
TS;DURATION;SYSNM;DS_SYSNM;SYSTYPENM;OBJNM;SUBOBJNM;VALUE
2019-05-30 10:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;PctDiscards;0.0;Interface;Vl111
2019-05-30 10:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;HistoricalInterfaceSpeed;1000000000;Interface;Vl111
2019-05-30 10:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;SpeedIn;1000000000;Interface;Vl111
2019-05-30 10:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;FrameSizeIn;209.65929490852145;Interface;Vl111
2019-05-30 10:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;PctDiscardsIn;0.0;Interface;Vl111
2019-05-30 10:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;NonunicastIn;124;Interface;Vl111
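One caveat on the pipeline above: `s/000,.*$//` deletes from the first "000," on the line, so it relies on every timestamp ending in 000 milliseconds. With a timestamp such as 1559205600123, the first match would be inside a later field like 1000000000, and date would get garbage. A sketch that removes the last three digits of field 1 by position instead, assuming GNU date (`-f -` reads one date string per line from stdin, and `@N` means N epoch seconds) and bash for process substitution; the function name convert_with_date is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: same paste/date/awk idea, but the milliseconds are cut
# off by position (last 3 characters of field 1) instead of by matching "000,".
# Assumes GNU date and bash (process substitution).
convert_with_date() {
  local file=$1
  echo "TS;DURATION;SYSNM;DS_SYSNM;SYSTYPENM;OBJNM;SUBOBJNM;VALUE"
  # one date process converts the whole first column
  paste -d, \
    <(cut -d, -f1 "$file" | sed 's/...$//; s/^/@/' | date -f - '+%Y-%m-%d %H:%M:%S') \
    <(cut -d, -f2- "$file") |
  awk -F, -v OFS=';' '{ print $1, "86400", $7, $3, $4, $5, $6, $8 }'
}
```

This still starts only a fixed handful of processes per file, regardless of the line count.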

# 6  
06-09-2019
Thanks RudiC for your script. My server is running RHEL 6, so I guess its awk has that capability.

I marked this thread as "solved"
# 7  
06-09-2019
Even the shell becomes faster if builtins are used: let the read command split the fields into variables, use $(( )) rather than expr, and write the output file in one stream.
Code:
# BASEDIR, INPUTDIR, OUTPUTDIR and DONEDIR are set as in the original script
# set constants before the loop
DURA="86400"
cd "$INPUTDIR" || exit
for inp in *
do
  # the following code block has redirected stdin and stdout
  {
  # skip header line #1
  IFS= read -r x
  # write header
  echo "TS;DURATION;SYSNM;DS_SYSNM;SYSTYPENM;OBJNM;SUBOBJNM;VALUE"
  # IFS=, makes read split each line at the commas
  while IFS=, read -r TIMES x INTMF METRIC VALU MFDISP DEVC CNAME x
  do
    NON_MIL=$(( TIMES / 1000 ))
    # still one date process per line
    EPO2DT=$(date -d "@$NON_MIL" '+%Y-%m-%d %H:%M:%S')
    echo "$EPO2DT;$DURA;$DEVC;$INTMF;$METRIC;$VALU;$MFDISP;$CNAME"
  done
  } <"$inp"  >"$OUTPUTDIR/$inp"
  # after the block the files are closed
  \mv "$inp" "$DONEDIR"
done
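The loop above still starts one date process per line. Bash 4.2 and later can format an epoch value itself with printf's `%(format)T` conversion, which removes that last external command. A sketch, assuming bash 4.2 or newer (older releases lack `%(...)T`, so check `bash --version` on the target system); the function name convert_stream is hypothetical, it reads CSV on stdin, writes the converted lines on stdout, and does not skip an input header line (pipe through `tail -n +2` first if needed):

```shell
#!/bin/bash
# Hypothetical filter: CSV on stdin -> converted lines on stdout,
# using only shell builtins inside the loop.
# Requires bash >= 4.2 for printf %(fmt)T, which formats in local time (TZ).
convert_stream() {
  local TIMES x INTMF METRIC VALU MFDISP DEVC CNAME
  echo "TS;DURATION;SYSNM;DS_SYSNM;SYSTYPENM;OBJNM;SUBOBJNM;VALUE"
  while IFS=, read -r TIMES x INTMF METRIC VALU MFDISP DEVC CNAME x
  do
    # ms -> s via arithmetic expansion (no fork), then the builtin printf formats the date
    printf '%(%Y-%m-%d %H:%M:%S)T;86400;%s;%s;%s;%s;%s;%s\n' \
      "$(( TIMES / 1000 ))" "$DEVC" "$INTMF" "$METRIC" "$VALU" "$MFDISP" "$CNAME"
  done
}
```

Usage: `convert_stream < test1 > ../OUTPUT/test1`. It is still an interpreted per-line loop, so a single awk pass is likely to remain faster on 700K lines.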
