Help 'speeding' up this 'parsing' script - taking 24+ hours to run
Post 303015326 by newbie_01 on Tuesday 3rd of April 2018 01:50:08 AM
Hi,

Sorry, Corona688 and Don Cragun. I should have realized how difficult and unfair it was of me not to post example output.

You are right that it is indeed a lot, lot, lot faster if it reads the whole file at once instead of line by line. I kicked off the script to run on a 10-million-line file over the weekend. I didn't get an Easter miracle of any sort; it is still running at this time.

You can ignore, or ideally forget, the horrible code that I posted. Maybe I can explain what I've been trying to do, as below.

So, here is an example raw input file, unfiltered:

Code:
24-MAR-2018 07:59:52 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=ogre01))(SERVER=DEDICATED)(SERVICE_NAME=testapp1_app.somewhere.out.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.65.60.7)(PORT=42145)) * establish * testapp1_app.somewhere.out.ph * 0
24-MAR-2018 07:59:52 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=ogre01))(SERVER=DEDICATED)(SERVICE_NAME=testapp1_app.somewhere.out.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.65.60.7)(PORT=42149)) * establish * testapp1_app.somewhere.out.ph * 12514
24-MAR-2018 07:59:52 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=ogre01))(SERVER=DEDICATED)(SERVICE_NAME=testapp1_app.somewhere.out.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.65.60.7)(PORT=42153)) * establish * testapp1_app.somewhere.out.ph * 0
24-MAR-2018 07:59:52 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=ogre01))(SERVER=DEDICATED)(SERVICE_NAME=testapp1_app.somewhere.out.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.65.60.7)(PORT=42157)) * establish * testapp1_app.somewhere.out.ph * 12514
24-MAR-2018 07:59:52 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=ogre01))(SERVER=DEDICATED)(SERVICE_NAME=testapp1_app.somewhere.out.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.65.60.7)(PORT=42161)) * establish * testapp1_app.somewhere.out.ph * 12514
12-MAR-2018 10:04:38 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=testapp1_app.somewhere.out.ph)(CID=(PROGRAM=sqlplus)(HOST=xxx00001.somewhere.out.ph)(USER=ogre01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.65.60.101)(PORT=12358)) * establish * testapp1_app.somewhere.out.ph * 12514
12-MAR-2018 16:23:09 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=ogre01))(SERVER=DEDICATED)(SERVICE_NAME=testapp1_app.somewhere.out.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.65.60.7)(PORT=11662)) * establish * testapp1_app.somewhere.out.ph * 12514
12-MAR-2018 16:23:09 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=ogre01))(SERVER=DEDICATED)(SERVICE_NAME=testapp1_app.somewhere.out.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.65.60.7)(PORT=11666)) * establish * testapp1_app.somewhere.out.ph * 12514
12-MAR-2018 16:23:09 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=ogre01))(SERVER=DEDICATED)(SERVICE_NAME=testapp1_app.somewhere.out.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.65.60.7)(PORT=11672)) * establish * testapp1_app.somewhere.out.ph * 12514
12-MAR-2018 16:23:09 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=ogre01))(SERVER=DEDICATED)(SERVICE_NAME=testapp1_app.somewhere.out.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.65.60.7)(PORT=11674)) * establish * testapp1_app.somewhere.out.ph * 0
12-MAR-2018 16:23:09 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=ogre01))(SERVER=DEDICATED)(SERVICE_NAME=testapp1_app.somewhere.out.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.65.60.7)(PORT=11680)) * establish * testapp1_app.somewhere.out.ph * 12514
12-MAR-2018 16:23:09 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=ogre01))(SERVER=DEDICATED)(SERVICE_NAME=testapp1_app.somewhere.out.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.65.60.7)(PORT=11682)) * establish * testapp1_app.somewhere.out.ph * 12514
12-MAR-2018 16:23:09 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=ogre01))(SERVER=DEDICATED)(SERVICE_NAME=testapp1_app.somewhere.out.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.65.60.7)(PORT=11686)) * establish * testapp1_app.somewhere.out.ph * 0
12-MAR-2018 16:23:09 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=ogre01))(SERVER=DEDICATED)(SERVICE_NAME=testapp1_app.somewhere.out.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.65.60.7)(PORT=11690)) * establish * testapp1_app.somewhere.out.ph * 12520
12-MAR-2018 16:23:09 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=ogre01))(SERVER=DEDICATED)(SERVICE_NAME=testapp1_app.somewhere.out.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.65.60.7)(PORT=11696)) * establish * testapp1_app.somewhere.out.ph * 12514

There can be millions of these lines. At the moment, the script reads one line at a time and generates formatted output like the below.

Code:
2018-03-12 10:04:38  runserver01        = 66.65.60.101                testapp1_app.somewhere.out.ph       sqlplus         ogre01                    12514
2018-03-12 16:23:09  runserver01        = 66.65.60.7                  JDBC Thin Client                    ogre01          testapp1_app.somewhere.out.ph 0
2018-03-12 16:23:09  runserver01        = 66.65.60.7                  JDBC Thin Client                    ogre01          testapp1_app.somewhere.out.ph 0
2018-03-12 16:23:09  runserver01        = 66.65.60.7                  JDBC Thin Client                    ogre01          testapp1_app.somewhere.out.ph 12514
2018-03-12 16:23:09  runserver01        = 66.65.60.7                  JDBC Thin Client                    ogre01          testapp1_app.somewhere.out.ph 12514
2018-03-12 16:23:09  runserver01        = 66.65.60.7                  JDBC Thin Client                    ogre01          testapp1_app.somewhere.out.ph 12514
2018-03-12 16:23:09  runserver01        = 66.65.60.7                  JDBC Thin Client                    ogre01          testapp1_app.somewhere.out.ph 12514
2018-03-12 16:23:09  runserver01        = 66.65.60.7                  JDBC Thin Client                    ogre01          testapp1_app.somewhere.out.ph 12514
2018-03-12 16:23:09  runserver01        = 66.65.60.7                  JDBC Thin Client                    ogre01          testapp1_app.somewhere.out.ph 12514
2018-03-12 16:23:09  runserver01        = 66.65.60.7                  JDBC Thin Client                    ogre01          testapp1_app.somewhere.out.ph 12520
2018-03-24 07:59:52  runserver01        = 66.65.60.7                  JDBC Thin Client                    ogre01          testapp1_app.somewhere.out.ph 0
2018-03-24 07:59:52  runserver01        = 66.65.60.7                  JDBC Thin Client                    ogre01          testapp1_app.somewhere.out.ph 0
2018-03-24 07:59:52  runserver01        = 66.65.60.7                  JDBC Thin Client                    ogre01          testapp1_app.somewhere.out.ph 12514
2018-03-24 07:59:52  runserver01        = 66.65.60.7                  JDBC Thin Client                    ogre01          testapp1_app.somewhere.out.ph 12514
2018-03-24 07:59:52  runserver01        = 66.65.60.7                  JDBC Thin Client                    ogre01          testapp1_app.somewhere.out.ph 12514

I then use sort | uniq -c to count each distinct line, which comes up with the below:

Code:
      1 2018-03-12 10:04:38  runserver01        = 66.65.60.101                testapp1_app.somewhere.out.ph       sqlplus         ogre01                    12514
      2 2018-03-12 16:23:09  runserver01        = 66.65.60.7                  JDBC Thin Client                    ogre01          testapp1_app.somewhere.out.ph 0
      6 2018-03-12 16:23:09  runserver01        = 66.65.60.7                  JDBC Thin Client                    ogre01          testapp1_app.somewhere.out.ph 12514
      1 2018-03-12 16:23:09  runserver01        = 66.65.60.7                  JDBC Thin Client                    ogre01          testapp1_app.somewhere.out.ph 12520
      2 2018-03-24 07:59:52  runserver01        = 66.65.60.7                  JDBC Thin Client                    ogre01          testapp1_app.somewhere.out.ph 0
      3 2018-03-24 07:59:52  runserver01        = 66.65.60.7                  JDBC Thin Client                    ogre01          testapp1_app.somewhere.out.ph 12514
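
For comparison, the same counts could probably be produced without sorting every input line first, by letting awk count the distinct lines in an array and sorting only the results. This is a minimal sketch; the function name count_lines is made up for illustration, and it assumes the set of distinct lines fits in memory:

```shell
# Hypothetical helper: count identical lines on stdin, like sort | uniq -c,
# but only the distinct lines are sorted at the end.
count_lines() {
  awk '
    { seen[$0]++ }                      # one array entry per distinct line
    END {
      for (k in seen)
        printf "%7d %s\n", seen[k], k   # same layout as uniq -c: count, space, line
    }' | LC_ALL=C sort -k2              # order by the line text, not by the count
}
```

Usage would be something like count_lines < formatted.out, where formatted.out is a placeholder name for the formatted output shown above.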

All fields of the output file come from the input file, with the exception of the second field, which shows up as runserver01. This comes from running hostname. It doesn't have to be the second field; it can be anywhere, or it can be added later on after all the filtering. It is basically just a way for me to tell where I ran the script from.
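
As a small sketch of how the hostname could be added in a single awk pass rather than once per line, it can be handed to awk as a variable with -v. The function name tag_with_host and the choice to put the host first are assumptions for illustration only:

```shell
# Hypothetical helper: prefix every line on stdin with a host label.
# $1 is the label; if omitted, hostname is run once to get it.
tag_with_host() {
  awk -v host="${1:-$(hostname)}" '{ print host, $0 }'
}
```

Because hostname is run once when awk starts, not once per input line, this should cost almost nothing even on millions of lines.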

Most of the lines are of the following format:

Code:
12-MAR-2018 16:23:09 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=ogre01))(SERVER=DEDICATED)(SERVICE_NAME=testapp1_app.somewhere.out.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.65.60.7)(PORT=11662)) * establish * testapp1_app.somewhere.out.ph * 12514

Sometimes, it can be like below:

Code:
12-MAR-2018 10:04:38 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=testapp1_app.somewhere.out.ph)(CID=(PROGRAM=sqlplus)(HOST=xxx00001.somewhere.out.ph)(USER=ogre01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.65.60.101)(PORT=12358)) * establish * testapp1_app.somewhere.out.ph * 12514

I don't know how to make awk differentiate between the two formats and extract the right information. Note that the values inside CONNECT_DATA are in a different order in these two strings.
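
One direction that might avoid having to tell the two formats apart at all is to look each value up by its key name (PROGRAM=, USER=, SERVICE_NAME=, HOST=) instead of by position. Below is a minimal sketch of that idea. The function names extract_fields and val and the | output separator are made up for illustration, and it assumes that values never contain a ) or a * character:

```shell
# Hypothetical helper: for each listener log line on stdin, print
# program|user|service|client-ip, whatever order the keys appear in.
extract_fields() {
  awk -F'[*]' '
    # Return the text after "(KEY=" up to the next ")" in s, or "" if KEY is absent.
    function val(s, key,    pat) {
      pat = "[(]" key "=[^)]*"
      if (match(s, pat))
        return substr(s, RSTART + length(key) + 2, RLENGTH - length(key) - 2)
      return ""
    }
    {
      # Field 2 is the CONNECT_DATA part, field 3 is the ADDRESS part.
      printf "%s|%s|%s|%s\n", val($2, "PROGRAM"), val($2, "USER"), val($2, "SERVICE_NAME"), val($3, "HOST")
    }'
}
```

Since the lookup is by key, both the JDBC-style line and the sqlplus-style line should come out with the columns in the same order.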

And yes, running the whole file through awk is faster than reading one line at a time, but I don't know how to get awk to produce the output format that I want.

I am thinking of doing one awk run that changes the date format first, and then a second awk run that breaks the CONNECT_DATA string into its different parts.

But I can't figure out how to do it. So for the first pass, I need to change

Code:
24-MAR-2018 07:59:52 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=ogre01))(SERVER=DEDICATED)(SERVICE_NAME=testapp1_app.somewhere.out.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.65.60.7)(PORT=42145)) * establish * testapp1_app.somewhere.out.ph * 0
12-MAR-2018 10:04:38 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=testapp1_app.somewhere.out.ph)(CID=(PROGRAM=sqlplus)(HOST=xxx00001.somewhere.out.ph)(USER=ogre01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.65.60.101)(PORT=12358)) * establish * testapp1_app.somewhere.out.ph * 12514

to

Code:
2018-03-12 10:04:38 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=testapp1_app.somewhere.out.ph)(CID=(PROGRAM=sqlplus)(HOST=xxx00001.somewhere.out.ph)(USER=ogre01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.65.60.101)(PORT=12358)) * establish * testapp1_app.somewhere.out.ph * 12514
2018-03-24 07:59:52 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=ogre01))(SERVER=DEDICATED)(SERVICE_NAME=testapp1_app.somewhere.out.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=66.65.60.7)(PORT=42145)) * establish * testapp1_app.somewhere.out.ph * 0

How do I tell awk -F"*" to print $1 and the rest of the fields, with $1 further changed to YYYY-MM-DD format? The real reason for formatting it as YYYY-MM-DD is that it works best when doing the sort.

And then the next pass is supposed to filter it to look like
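
A minimal sketch of what such a first pass might look like follows. It does not use -F"*" at all; it rewrites only the leading date with sub() and leaves the rest of the line as it is. The function name to_iso_date is made up, and it assumes the date starts in column 1 and the month is an upper-case three-letter English abbreviation, as in the samples above:

```shell
# Hypothetical helper: turn a leading DD-MON-YYYY into YYYY-MM-DD on each line.
to_iso_date() {
  awk '
    BEGIN {
      # Build a lookup table: month["JAN"] = "01" ... month["DEC"] = "12".
      n = split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", names, " ")
      for (i = 1; i <= n; i++) month[names[i]] = sprintf("%02d", i)
    }
    {
      # With default field splitting, $1 is the date, e.g. 24-MAR-2018.
      if (split($1, d, "-") == 3 && (d[2] in month))
        sub(/^[^ ]+/, d[3] "-" month[d[2]] "-" d[1])   # replace only the date
      print                                            # other lines pass through unchanged
    }'
}
```

The time of day and everything after it are not touched, so the output can be piped straight into sort or into a second awk pass.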

Code:
2018-03-12 10:04:38  runserver01        = 66.65.60.101                testapp1_app.somewhere.out.ph       sqlplus         ogre01                    12514
2018-03-24 07:59:52  runserver01        = 66.65.60.7                  JDBC Thin Client                    ogre01          testapp1_app.somewhere.out.ph 0

Or ideally look like

Code:
2018-03-12 16:23:09  runserver01        = 66.65.60.101                sqlplus                             ogre01          testapp1_app.somewhere.out.ph 12514
2018-03-24 07:59:52  runserver01        = 66.65.60.7                  JDBC Thin Client                    ogre01          testapp1_app.somewhere.out.ph 0

Please advise on how best to do what I want to do. Apologies for not giving enough information earlier.

P.S.:
The ksh script that I ran to process a file with 9890943 lines is still running. ps -o etime= -p 3036 says it has been running for 5-14:38:03, so it is time to CTRL-C it.
 
