

Help with speeding up my working script - how to make a script use more CPU


# 1  

Hello experts,

We have input files with 700K lines each (one generated every hour), and we need to convert them as shown below and then move them to another directory once they are done.

Sample INPUT:-
Code:
[root@tst01 INPUT]#  cat test1
1559205600000,8474,NormalizedPortInfo,PctDiscards,0.0,Interface,BG-CTA-AX1.test.com,Vl111
1559205600000,8474,NormalizedPortInfo,HistoricalInterfaceSpeed,1000000000,Interface,BG-CTA-AX1.test.com,Vl111
1559205600000,8474,NormalizedPortInfo,SpeedIn,1000000000,Interface,BG-CTA-AX1.test.com,Vl111
1559205600000,8474,NormalizedPortInfo,FrameSizeIn,209.65929490852145,Interface,BG-CTA-AX1.test.com,Vl111
1559205600000,8474,NormalizedPortInfo,PctDiscardsIn,0.0,Interface,BG-CTA-AX1.test.com,Vl111
1559205600000,8474,NormalizedPortInfo,NonunicastIn,124,Interface,BG-CTA-AX1.test.com,Vl111
[root@tst01 INPUT]#

Sample output:-
Code:
[root@tst01 INPUT]#  cat ../OUTPUT/test1
TS;DURATION;SYSNM;DS_SYSNM;SYSTYPENM;OBJNM;SUBOBJNM;VALUE
2019-05-30 09:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;PctDiscards;0.0;Interface;Vl111
2019-05-30 09:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;HistoricalInterfaceSpeed;1000000000;Interface;Vl111
2019-05-30 09:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;SpeedIn;1000000000;Interface;Vl111
2019-05-30 09:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;FrameSizeIn;209.65929490852145;Interface;Vl111
2019-05-30 09:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;PctDiscardsIn;0.0;Interface;Vl111
2019-05-30 09:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;NonunicastIn;124;Interface;Vl111
[root@tst01 INPUT]#

I wrote a script which works: it converts the epoch time in the 1st column to a normal date, replaces the 2nd column with a fixed value (86400), and remaps the remaining columns into different positions.
The problem is that my script processes only about 40 lines per second, so only about 144K lines get done in an hour, and we need to finish all 700K in under an hour. CPU usage is just 12% of one core, and the machine has a single 12-core CPU. How can I improve the script's speed, and how can I make it use all the CPU cores for parallel processing?
Thanks.

My working script, which processes only about 40 lines per second:
Code:
cat /usr/local/bin/script.sh
#!/bin/bash
BASEDIR=/tmp/tsight
INPUTDIR=${BASEDIR}/INPUT
OUTPUTDIR=${BASEDIR}/OUTPUT
DONEDIR=${BASEDIR}/DONE
mkdir -p ${BASEDIR}/INPUT
mkdir -p ${BASEDIR}/OUTPUT
mkdir -p ${BASEDIR}/DONE
cd ${INPUTDIR}
for inp in *
do
  # strip the first line from the input file
  tail -n +2 ${inp} >/tmp/tempp
  \mv /tmp/tempp ${inp}
  # write the output header
  echo "TS;DURATION;SYSNM;DS_SYSNM;SYSTYPENM;OBJNM;SUBOBJNM;VALUE" >${OUTPUTDIR}/${inp}
  cat ${inp} | while read line
  do
    # extract each field with a separate echo | awk pipeline
    TIMES=`echo "${line}" |awk -F, '{print $1}'`
    DURA="86400"
    INTMF=`echo "${line}" |awk -F, '{print $3}'`
    METRIC=`echo "${line}" |awk -F, '{print $4}'`
    VALU=`echo "${line}" |awk -F, '{print $5}'`
    MFDISP=`echo "${line}" |awk -F, '{print $6}'`
    DEVC=`echo "${line}" |awk -F, '{print $7}'`
    CNAME=`echo "${line}" |awk -F, '{print $8}'`
    # convert epoch milliseconds to seconds, then to a date string
    NON_MIL=`expr "${TIMES}" / 1000`
    EPO2DT=`date -d @${NON_MIL} '+%Y-%m-%d %H:%M:%S'`
    echo "${EPO2DT};${DURA};${DEVC};${INTMF};${METRIC};${VALU};${MFDISP};${CNAME}" >>${OUTPUTDIR}/${inp}
  done
  # move the processed input file out of the way
  \mv ${BASEDIR}/INPUT/${inp} ${DONEDIR}
done

[root@tst01 INPUT]#


# 2  
No surprise that the execution of your script is sluggish - you spawn 16 processes per input line. Since you are using awk anyhow, why not do the entire thing with it?
# 3  
Try this; save it into a file called small.awk.
Code:
BEGIN {
    FS = ","
    OFS = ";"
}
{
    $1 = strftime("%Y-%m-%d %H:%M:%S")
    $2 = 86400
    print $1, $2, $7, $3, $4, $5, $6, $NF
}

Run it as:
awk -f small.awk test1 > ../OUTPUT/test1_done
See if this speeds up processing of one file.
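
If that is fast enough per file, something along these lines should convert every file in INPUT and keep several cores busy at once. This is only a sketch: it assumes GNU xargs, the directory layout from your script, and that small.awk was saved as /usr/local/bin/small.awk (adjust the path to wherever you actually put it).
Code:
#!/bin/bash
# Sketch: run small.awk over every INPUT file, several files at a time.
BASEDIR=/tmp/tsight
export BASEDIR

cd "$BASEDIR/INPUT" || exit 1

# -0 keeps odd file names safe, -n 1 hands each worker a single file,
# -P 12 runs up to 12 conversions in parallel (one per core).
printf '%s\0' * | xargs -0 -n 1 -P 12 sh -c '
    f=$1
    awk -f /usr/local/bin/small.awk "$f" > "$BASEDIR/OUTPUT/$f" &&
    mv "$f" "$BASEDIR/DONE/$f"
' sh

Each hourly file is independent, so per-file parallelism is the simplest way to use the other cores; a single awk pass per file should already be dramatically faster than the per-line loop.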

Please specify the operating system in the future when making such requests.
Given the date invocation in your script, I would guess Linux.

Hope that helps
Regards
Peasant.

# 4  
Thanks RudiC & Peasant.

@Peasant - your solution worked great; it didn't even take a second to finish a file.
# 5  
Three comments on Peasant's fine proposal (a corrected sketch follows below):
- not all awk versions provide strftime(); gawk may be required.
- calling strftime() without a time stamp returns the current system time; pass $1 for the desired output, after stripping the milliseconds from it.
- a heading was required.
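
Putting those three points together, a corrected small.awk might look like this. It is only a sketch and assumes gawk, since strftime() is a gawk extension; it also skips the first input line, mirroring the tail -n +2 in the original script.
Code:
BEGIN {
    FS = ","
    OFS = ";"
    print "TS;DURATION;SYSNM;DS_SYSNM;SYSTYPENM;OBJNM;SUBOBJNM;VALUE"
}
# the original script removed the first line of each raw file with tail -n +2;
# do the same here
NR == 1 { next }
{
    # field 1 is epoch milliseconds; drop the milliseconds before formatting
    $1 = strftime("%Y-%m-%d %H:%M:%S", $1 / 1000)
    $2 = 86400
    print $1, $2, $7, $3, $4, $5, $6, $NF
}

Run it exactly as before (awk -f small.awk test1 > ../OUTPUT/test1_done), invoking gawk explicitly if plain awk on the box is something else.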



For awks without strftime(), try (reducing process count as far as possible)


Code:
paste -d, <(date +"%Y-%m-%d %H:%M:%S" -f<(sed 's/^/@/; s/000,.*$//' file)) <(cut -d, -f2- file) | 
awk -F, -vOFS=";" '
BEGIN   {print "TS;DURATION;SYSNM;DS_SYSNM;SYSTYPENM;OBJNM;SUBOBJNM;VALUE"
        }
        {$2 = "86400" OFS $7
         $7 = $8; NF--
        }
1
'
TS;DURATION;SYSNM;DS_SYSNM;SYSTYPENM;OBJNM;SUBOBJNM;VALUE
2019-05-30 10:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;PctDiscards;0.0;Interface;Vl111
2019-05-30 10:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;HistoricalInterfaceSpeed;1000000000;Interface;Vl111
2019-05-30 10:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;SpeedIn;1000000000;Interface;Vl111
2019-05-30 10:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;FrameSizeIn;209.65929490852145;Interface;Vl111
2019-05-30 10:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;PctDiscardsIn;0.0;Interface;Vl111
2019-05-30 10:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;NonunicastIn;124;Interface;Vl111

# 6  
Thanks RudiC for your script. My server is running RHEL 6, so I guess its awk has that capability.
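
On RHEL 6, /bin/awk is gawk, so strftime() should indeed be available. A quick way to confirm (just a pair of one-liners):
Code:
awk --version | head -1          # should report "GNU Awk ..."
awk 'BEGIN { print strftime("%Y-%m-%d %H:%M:%S", systime()) }'

If either command complains, RudiC's strftime-free pipeline above is the fallback.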

I marked this thread as "solved"
# 7  
Even the shell becomes faster if builtins are used: let the read command read the fields into variables, use $(( )) rather than expr, and write the output file in one stream.
Code:
# set constants before the loop
DURA="86400"
cd "$INPUTDIR" || exit
for inp in *
do
  # the following code block has redirected stdin and stdout
  {
  # delete header line#1 
  read x
  # write header
  echo "TS;DURATION;SYSNM;DS_SYSNM;SYSTYPENM;OBJNM;SUBOBJNM;VALUE"
  while IFS=, read -r TIMES x INTMF METRIC VALU MFDISP DEVC CNAME x
  do
    NON_MIL=$(( TIMES / 1000 ))
    EPO2DT=`date -d @${NON_MIL} '+%Y-%m-%d %H:%M:%S'`
    echo "$EPO2DT;$DURA;$DEVC;$INTMF;$METRIC;$VALU;$MFDISP;$CNAME"
  done
  } <"$inp"  >"$OUTPUTDIR/$inp"
  # after the block the files are closed
  \mv "$inp" "$DONEDIR"
done
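
A further refinement, only if the bash on the box is 4.2 or newer (RHEL 6 ships 4.1, so this may not apply there): the remaining per-line date call can be replaced with the printf builtin, which formats an epoch value without spawning any process.
Code:
# requires bash >= 4.2: %(...)T formats an epoch-seconds value in-shell
NON_MIL=$(( TIMES / 1000 ))
printf -v EPO2DT '%(%Y-%m-%d %H:%M:%S)T' "$NON_MIL"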
