Issue with awk script parsing log file


 
# 8  
Old 05-30-2014
Apologies for the confusion. My requirement is to read the log file as it gets populated and print the counts. I am taking a piecemeal approach, hence my initial requirement was to get the count only. I wrote the code below, but control leaves the while loop as soon as it reaches the current last line of the log file, even though the process is still running and slowly writing to the log file. That is why I was trying to do this with an infinite while loop, but I am not sure how to exit from it and am struggling to make it work. Please help if you have a better approach, instead of reading the file twice within awk itself, or something else.

Code:
#!/bin/bash -x

while read line
do
COUNT=`awk '$3 ~ /X_fc_Loan/ && $6 == "[XMLTgt_FCSLOANS_Ver25_Norm])" {getline; if($5=="Requested:")l=$6} END { print l }' s_GenerateXMLDataFile.log`

if [ -z "$COUNT" ]
then
        COUNT="0"
fi

if [ "$COUNT1" != "$COUNT" ]
then
        echo "$COUNT"
fi
COUNT1="$COUNT"

ENDFLG=`echo "$line" | awk '{print $2" "$3" "$4" "$5}'`
if [ "$ENDFLG" = "TM_6020 Session [s_GenerateXMLDataFile] completed" ]
then
        break
fi

done < s_GenerateXMLDataFile.log


Last edited by Ariean; 05-30-2014 at 11:13 AM..
# 9  
Old 05-30-2014
If I understand correctly, you have a file that is being written by an unrelated process and you want to wait for the writer to finish writing before you take the next actions that will process that file. When the writer is finished, you want to store a value from the 6th field of the last line following a certain line in a shell variable. You know that the writer is finished writing when you find a line in the file that contains four fields that, when separated by single spaces, match the string TM_6020 Session [s_GenerateXMLDataFile] completed.

If that is a correct statement of what you're trying to do, the following should be a MUCH more efficient way to do it (using one invocation of tail, one invocation of awk, and reading the data from the file twice, instead of using two invocations of awk for every line in the file and reading all of the data from the file twice plus an additional time for every line in the file). Furthermore, there is no guarantee with your current code that the writer will be done writing before your loop terminates. There is no shell loop in the following code; it just continues reading from the input file until the line indicating that the writer is done is found:
Code:
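#!/bin/bash
# Follow the log from its first line; awk remembers the latest X_fc_Loan
# count and prints it once the session-completed line appears.
COUNT=$(tail -f -n +1 s_GenerateXMLDataFile.log | awk '
# On an X_fc_Loan target line, read the next line and save its count.
$3 ~ /X_fc_Loan/ && $6 == "[XMLTgt_FCSLOANS_Ver25_Norm])" {
        getline
        if($5=="Requested:")
                l = $6
}
# The writer is done: print the last count seen and stop reading.
$2 == "TM_6020" && $3 == "Session" && $4 == "[s_GenerateXMLDataFile]" && \
$5 == "completed" {
        print l
        exit
}')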

I don't know what your input looks like and you didn't show how (or even if) you're using COUNT or COUNT1 when you fall off the end of the code you've shown us, so obviously none of this has been tested. I hope you find it useful as you try to solve this problem.

If you want to try the above on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
# 10  
Old 06-02-2014
Hello Don,
My requirement is to read the log file while it is being populated and print the latest count, which is $6 in your awk script. The process keeps appending to the log file, and when it writes a line containing TM_6020 Session [s_GenerateXMLDataFile] completed, that means the process has finished populating the log file (or has ended) and the awk script should exit.

When I ran your script against the log file, which is static now because the process is not running in parallel, it printed the last count in the log file, which seems correct, but for some reason control does not come back from the script.


Thank you.
# 11  
Old 06-02-2014
I didn't write any new awk code; I slightly modified your two awk scripts and merged them into one script. If my script is picking up $6 from the wrong place, it is because that is what your script was doing.

Since you have not shown us anything at all about what your data looks like, I could only guess at what you're trying to do based on your awk code. If I guessed wrong, you need to show us the format of the data (and a sample) we're supposed to process.

What do you mean the control is not coming out?

The script you showed us set COUNT, COUNT1, and ENDFLG and never did anything with any of them. I assumed you had code following what you showed us in your script that used one or more of those three variables (at least $COUNT).

My code only sets COUNT (since I couldn't see any need for the other two). I assumed you would add the other code you had hidden from us that would use $COUNT.

If you add the command:
Code:
printf 'COUNT is "%s"\n' "$COUNT"

to the end of the script I gave you, what does it print after the asynchronous writing process writes:
Code:
SomethingToIgnore TM_6020 Session [s_GenerateXMLDataFile] completed MaybeSomethingMoreToIgnore

into s_GenerateXMLDataFile.log?

If the writer terminates without writing the above line into the file, my script will hang forever waiting for the writer to finish its job. There is no way for this script to know that the writer is done writing if it doesn't write the line that says "I'm done" into the file.
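
One way to avoid waiting forever is to swap in a different technique from the tail -f pipeline above: poll the log for the completion line a limited number of times, then extract the last count in a single awk pass. This is only a sketch. The helper names wait_for_marker and last_count and the poll limits are hypothetical, and the temporary file stands in for s_GenerateXMLDataFile.log, using lines in the format the awk patterns in this thread expect.

```shell
#!/bin/bash
# Sketch: bounded polling instead of tail -f. wait_for_marker and
# last_count are hypothetical helper names, not part of the thread.

# Poll log file $1 up to $2 times, sleeping $3 seconds between polls.
# Returns 0 once the completion line is present, 1 if it never shows up.
wait_for_marker() {
        tries=0
        while [ "$tries" -lt "$2" ]; do
                if grep -q 'TM_6020 Session \[s_GenerateXMLDataFile\] completed' "$1"; then
                        return 0
                fi
                tries=$((tries + 1))
                sleep "$3"
        done
        return 1
}

# Print the last "Requested:" count that follows an X_fc_Loan line in $1.
last_count() {
        awk '
        $3 ~ /X_fc_Loan/ && $6 == "[XMLTgt_FCSLOANS_Ver25_Norm])" {
                if ((getline) > 0 && $5 == "Requested:")
                        l = $6
        }
        END { print l }' "$1"
}

# Temporary sample standing in for s_GenerateXMLDataFile.log.
log=$(mktemp)
cat > "$log" <<'EOF'
WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1103520    Applied: 0          Rejected: 0          Affected: 0
DIRECTOR> TM_6020 Session [s_GenerateXMLDataFile] completed at [Mon Jun 02 15:32:02 2014].
EOF

if wait_for_marker "$log" 5 1; then
        COUNT=$(last_count "$log")
        printf 'COUNT is "%s"\n' "$COUNT"
else
        echo 'gave up: completion line never appeared' >&2
fi
```

The trade-off is that polling only yields the final count; it does not report intermediate counts as they are written, and the poll limit has to be chosen to outlast the longest expected run.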
# 12  
Old 06-02-2014
Don,
Apologies for the confusion; let me restate. My requirement is to read the log file (attached) and print the counts as the log file is being populated by an Informatica process. While it runs, the Informatica job writes the current counts for the various tables it is loading into the log file. However, I am interested only in the counts on the line below the X_fc_Loan line, after the "Requested:" label, and I want to print those counts.

Random Excerpt from log file:
Code:
WRITER_1_*_1> WRT_8161 
TARGET BASED COMMIT POINT  Thu May 29 16:04:08 2014
===================================================

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Provider (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1          Applied: 0          Rejected: 0          Affected: 0         

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Institution (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 17         Applied: 0          Rejected: 0          Affected: 0         

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1103520    Applied: 0          Rejected: 0          Affected: 0         

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Customer (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 583985     Applied: 0          Rejected: 0          Affected: 0         

WRITER_1_*_1> WRT_8161 
TARGET BASED COMMIT POINT  Thu May 29 16:04:12 2014
===================================================

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Provider (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1          Applied: 0          Rejected: 0          Affected: 0         

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Institution (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 17         Applied: 0          Rejected: 0          Affected: 0         

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1203840    Applied: 0          Rejected: 0          Affected: 0         

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Customer (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 583985     Applied: 0          Rejected: 0          Affected: 0         

WRITER_1_*_1> WRT_8161 
TARGET BASED COMMIT POINT  Thu May 29 16:04:15 2014
===================================================

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Provider (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1          Applied: 0          Rejected: 0          Affected: 0         

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Institution (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 17         Applied: 0          Rejected: 0          Affected: 0         

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 1304160    Applied: 0          Rejected: 0          Affected: 0         

WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Customer (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 583985     Applied: 0          Rejected: 0          Affected: 0


The X_fc_Loan counts above should be printed as script output while the script continues reading the log file.
Required Output:
Code:
1103520
1203840
1304160

The script should exit once it finds the line below, written by the Informatica job into the log file, as that marks the end of the data load and of the Informatica job.
Code:
DIRECTOR> TM_6020 Session [s_GenerateXMLDataFile] completed at [Mon Jun 02 15:32:02 2014].

To achieve this, I came up with the non-working code below, which I am struggling to fix. The while loop exits without waiting for the Informatica job to finish writing the log file, because it thinks it has reached the end of the file, although the Informatica job is still slowly writing to it. The awk script in the code below prints the counts I expect as output. I hope this helps; please let me know if I am not clear.

Script: (Modified/Removed junk from Initial code to make it simpler/readable)
Code:
#!/bin/bash -x

while read line
do
COUNT=`awk '$3 ~ /X_fc_Loan/ && $6 == "[XMLTgt_FCSLOANS_Ver25_Norm])" {getline; if($5=="Requested:")l=$6} END { print l }' s_GenerateXMLDataFile.log`

echo "$COUNT"

ENDFLG=`echo "$line" | awk '{print $2" "$3" "$4" "$5}'`
if [ "$ENDFLG" = "TM_6020 Session [s_GenerateXMLDataFile] completed" ]
then
        break
fi

done < s_GenerateXMLDataFile.log

The output from the script will be captured by a Java UI to show the progress of the data load to the user.

Thank you.

Last edited by Ariean; 06-02-2014 at 05:15 PM..
# 13  
Old 06-02-2014
As long as your asynchronous writer will always eventually write a line of the format:
Code:
DIRECTOR> TM_6020 Session [s_GenerateXMLDataFile] completed at [Mon Jun 02 15:32:02 2014].

into your log file as the last line of the file, the following will print the desired counts from your log file as they are added, and quit when the line above is found:
Code:
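#!/bin/bash
# On exit, kill the background tail and remove the file holding its PID.
trap 'kill $(cat $$.tailpid);rm -f $$.tailpid' EXIT
# Run tail in the background, save its PID, and pipe its output to awk.
(tail -f -n +1 s_GenerateXMLDataFile.log& echo $! > $$.tailpid) | awk '
# On an X_fc_Loan target line, read the next line and print its count.
$3 ~ /X_fc_Loan/ && $6 == "[XMLTgt_FCSLOANS_Ver25_Norm])" {
	getline
	if($5=="Requested:")
		print $6
}
# Stop reading when the session-completed line appears.
$2 == "TM_6020" && $3 == "Session" && $4 == "[s_GenerateXMLDataFile]" && \
$5 == "completed" {
	exit
}'
echo 'log file complete'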

If you only want it to print counts that are added to the file after you start this script, or to look only at the last 10 lines in the log file when the script starts and follow along as more data is added (instead of all counts from the start of the log file), change:
Code:
tail -f -n +1

to:
Code:
tail -f -n -1

or:
Code:
tail -f

respectively. Obviously, you can take out the echo at the end of the script if you don't want it.
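
The three starting points described above can be compared without -f on a small static file. This is a minimal sketch: the twelve-line temporary file is made up for illustration and stands in for an existing log.

```shell
#!/bin/bash
# Twelve numbered lines stand in for an existing log file.
demo=$(mktemp)
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$demo"

from_start=$(tail -n +1 "$demo" | wc -l)   # every line, starting at line 1
last_one=$(tail -n -1 "$demo")             # only the final line
default=$(tail "$demo" | wc -l)            # the final 10 lines

# $(( )) strips any padding that wc adds on some systems.
echo "from start: $((from_start)) lines"
echo "last one:   $last_one"
echo "default:    $((default)) lines"
rm -f "$demo"
```

With -f added, each form prints the same initial lines and then keeps following the file as it grows.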

Last edited by Don Cragun; 06-02-2014 at 07:31 PM.. Reason: Return script back to original version (after testing search from last line instead of 1st line).
# 14  
Old 06-06-2014
It worked, thank you. I have some basic questions.

In this statement, (tail -f -n +1 s_GenerateXMLDataFile.log& echo $! > $$.tailpid) | awk ', could you please correct me if my understanding is wrong?

1) You enclosed the statement in parentheses because you don't want to print the standard output to the terminal and are instead piping the output to the awk script?

2) I don't understand what the output of echo $! would be, as it prints nothing when I run it at the shell prompt. And why did you use a single ampersand (&) between the tail and echo commands, when we would generally use a double ampersand (&&) to combine two commands?


In this statement: trap 'kill $(cat $$.tailpid);rm -f $$.tailpid' EXIT

3) What is the difference between EXIT in capitals and exit in lowercase? I see both working.
4) Why didn't you put an exit statement after the commands, before the EXIT signal, like this?

trap 'kill $(cat $$.tailpid);rm -f $$.tailpid; exit' EXIT

Thank you.
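
For reference, the shell features behind these four questions can be tried in isolation. The sketch below is not from the thread and assumes bash; sleep 30 is just a stand-in for the long-running tail -f. In short: the parentheses group tail and echo in one subshell so the group's standard output feeds the pipe (echo's own output is redirected to the PID file); & starts a command in the background, and $! is the process ID of the most recent background command, so it is empty in a shell that has not started one; bash accepts signal names in trap in either case, with EXIT being the portable spelling; and an EXIT trap already runs as the shell exits, so no exit is needed inside it.

```shell
#!/bin/bash
# "&" starts the command in the background and returns immediately;
# "$!" then holds the process ID of that background command.
sleep 30 &
bg_pid=$!
echo "background PID: $bg_pid"

# With "&&" the second command runs only after the first one exits
# successfully, so "tail -f file && echo $!" would never reach the echo.
true && ran_after_success=yes
echo "ran after success: $ran_after_success"

# Stop the background job, as the EXIT trap in the script does for tail.
kill "$bg_pid"
wait "$bg_pid" 2>/dev/null || true

# An EXIT trap runs when the shell exits; the trap action itself needs
# no "exit". A child bash is used so the trap fires inside this demo.
trap_output=$(bash -c 'trap "echo cleanup" EXIT; echo body')
echo "$trap_output"
```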

------------------------------------------------------------------------

Hello, my requirement has changed because the UI developer is having difficulty capturing the output of your shell script. Now I have to get each value returned by this awk script into a shell variable and update a table with that value, so the table always holds the latest count as it appears in the log file.

For example, suppose this is the output from the awk/shell script:
Code:
100320
200640
300960
401280
501600
601920
702240
802560
902880
923096
923096

First I should take 100320 and update the table, then 200640, and so on until the last value, and then exit. Can you please help with how I can do this?
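
A minimal sketch of one way to do this, assuming the awk filter from post #13 is kept and its output is piped into a while read loop. update_table is a hypothetical placeholder for the real database update, extract_counts is a made-up wrapper name, and the sample log lines are invented in the log's format using the first two counts listed above. The fflush() call is an addition: awk normally buffers output written to a pipe, so without it the counts may arrive in bursts. That buffering might also explain the UI's difficulty capturing the output, although the thread does not confirm this.

```shell
#!/bin/bash
# update_table is a hypothetical placeholder: replace its body with the
# real database call. Here it only records each value it receives.
updates=$(mktemp)
update_table() {
        printf '%s\n' "$1" >> "$updates"
}

# Same filter as in post #13, reading standard input. fflush() pushes
# each count through the pipe as soon as it is printed.
extract_counts() {
        awk '
        $3 ~ /X_fc_Loan/ && $6 == "[XMLTgt_FCSLOANS_Ver25_Norm])" {
                if ((getline) > 0 && $5 == "Requested:") {
                        print $6
                        fflush()
                }
        }
        $2 == "TM_6020" && $3 == "Session" && $4 == "[s_GenerateXMLDataFile]" &&
        $5 == "completed" {
                exit
        }'
}

# Invented sample in the log format, standing in for the live log.
sample=$(mktemp)
cat > "$sample" <<'EOF'
WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 100320     Applied: 0          Rejected: 0          Affected: 0
WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Customer (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 583985     Applied: 0          Rejected: 0          Affected: 0
WRT_8036 Target: XMLTgt_FCSLoans25::X_fc_Loan (Instance Name: [XMLTgt_FCSLOANS_Ver25_Norm])
WRT_8038 Inserted rows - Requested: 200640     Applied: 0          Rejected: 0          Affected: 0
DIRECTOR> TM_6020 Session [s_GenerateXMLDataFile] completed at [Mon Jun 02 15:32:02 2014].
EOF

# Each count lands in $count, one at a time, in the order it was logged.
extract_counts < "$sample" | while read -r count
do
        update_table "$count"
done

cat "$updates"
rm -f "$sample"
```

Against the live log, the `extract_counts < "$sample"` part would be replaced by the tail pipeline from post #13, that is `(tail -f -n +1 s_GenerateXMLDataFile.log& echo $! > $$.tailpid) | extract_counts`, keeping the trap line from that script so tail is cleaned up on exit.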