Extracting the required text from log files


 
# 1  
Old 06-20-2009
Extracting the required text from log files

I would greatly appreciate any help with this. I am trying to get it done in Java, but I love Unix and believe it can be done within minutes with a couple of lines.

The input log file is a text file that contains multiple entries separated by a blank line.
Each entry holds the upgrade process information for one file.

!ENTRY text.....<INFO> or <OKAY> <RESOURCE: /test/src/com/test1/*/test.java> 2009-06-18 13:01:01.181
!MESSAGE Requesting upgrade report for file: test.java

!ENTRY text.....<INFO> or <OKAY> <RESOURCE: /test/src/com/test1/*/test1.java> 2009-06-18 13:01:01.181
!MESSAGE information in test1.java will be upgraded.
!SUBENTRY 1 com.bea.workshop.upgrade81 <INFO> <RESOURCE: full path /file name> 2009-06-18 13:02:25.681
!MESSAGE Will add import of org.apache.beehive.netui.pageflow.annotations.Jpf for JPF annotation support.

Each entry starts with "!ENTRY", as shown above, followed by the text "com.bea.workshop.upgrade81 " and then by one of two tags: <OKAY> or <INFO>.
Next comes the <RESOURCE: tag, which contains the full path of the file, followed by a time stamp as shown above.
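
The header layout described above can be sketched as a small awk filter that splits one !ENTRY line into its tag, resource path and time stamp (tab-separated). This is only an illustration under the assumption that every header has exactly the layout shown in this post; parse_entry_header is a made-up helper name, not part of any tool.

```shell
# Sketch: split an !ENTRY header line into tag, resource path and time stamp.
# parse_entry_header is a hypothetical helper name used only for illustration.
parse_entry_header() {
  awk '/^!ENTRY/ {
    if ($0 ~ /<OKAY>/)      tag = "OKAY"
    else if ($0 ~ /<INFO>/) tag = "INFO"
    else                    tag = "UNKNOWN"
    path = $0
    sub(/.*<RESOURCE: */, "", path)   # drop everything up to the path
    stamp = path
    sub(/>.*/, "", path)              # keep the text before the closing ">"
    sub(/^[^>]*> */, "", stamp)       # keep the text after the closing ">"
    print tag "\t" path "\t" stamp
  }' "$@"
}

# Demo on one header line taken from the example input
printf '%s\n' '!ENTRY com.bea.workshop.upgrade81 <OKAY> <RESOURCE: /fullpath/Test.java> 2009-06-18 13:02:28.368' |
  parse_entry_header
```

Lines that do not start with !ENTRY (for example !SUBENTRY or !MESSAGE lines) produce no output.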

If the tag is <OKAY>, the second line looks like this (it is not of much importance for my output):
!MESSAGE Requesting upgrade report for file: filename

Ex:
!ENTRY text.....<INFO> or <OKAY> <RESOURCE: /test/src/com/test1/*/test.java> 2009-06-18 13:01:01.181
!MESSAGE Requesting upgrade report for file: file name

If the tag is <INFO>, the second line is likewise
!MESSAGE Requesting upgrade report for file: filename
but it is always followed by one or more pairs of lines starting with !SUBENTRY and !MESSAGE, like these:

!SUBENTRY 1 com.bea.workshop.upgrade81 <INFO> <RESOURCE: path/filename> 2009-06-18 13:02:25.681
!MESSAGE Will add import of org.apache.beehive.netui.pageflow.annotations.Jpf for JPF annotation support.


My requirement is like below:

Requirement 1

1) For all entries containing the <OKAY> tag, I need to extract the file name, which comes after <RESOURCE: and before the time stamp.
Please note that the entries are separated by a blank line.

Requirement 2

2) For all entries with the <INFO> tag, I would like a text file with entries like this:

The full path and file name on the first line, and then on the following lines
all the text after !MESSAGE on the line right below each corresponding "!SUBENTRY 1" line.


Example Input:
==============================
!ENTRY com.bea.workshop.upgrade81 <OKAY> <RESOURCE: /fullpath/Test.java> 2009-06-18 13:02:28.368
!MESSAGE Requesting upgrade report for file: Test.java

!ENTRY com.bea.workshop.upgrade81 <OKAY> <RESOURCE: /fullpath/Test1.jpf> 2009-06-18 13:02:28.384
!MESSAGE Requesting upgrade report for file: Test1.jpf

!ENTRY com.bea.workshop.upgrade81 <INFO> <RESOURCE: /fullpath/Test2.jpf> 2009-06-18 13:02:28.447
!MESSAGE Requesting upgrade report for file: Test2.jpf
!SUBENTRY 1 com.bea.workshop.upgrade81 <INFO> <RESOURCE: /fullpath/Test2.jpf> 2009-06-18 13:02:28.447
!MESSAGE The Java 5 annotation Jpf.Controller needs to be added.
!SUBENTRY 1 com.bea.workshop.upgrade81 <INFO> <RESOURCE: /fullpath/Test2.jpf> 2009-06-18 13:02:28.447
!MESSAGE More annotation of Jpf.Action needs to be added.

!ENTRY com.bea.workshop.upgrade81 <INFO> <RESOURCE: /fullpath/Test3.jpf> 2009-06-18 13:02:28.634
!MESSAGE Requesting upgrade report for file: Test3.jpf
!SUBENTRY 1 com.bea.workshop.upgrade81 <INFO> <RESOURCE: /fullpath/Test3.jpf> 2009-06-18 13:02:28.634
!MESSAGE The Java 5 annotation Jpf.Controller needs to be added.
!SUBENTRY 1 com.bea.workshop.upgrade81 <INFO> <RESOURCE: /fullpath/Test3.jpf> 2009-06-18 13:02:28.634
!MESSAGE Will add import of org.apache.beehive.netui.pageflow.annotations.Jpf for JPF annotation support.
!SUBENTRY 1 com.bea.workshop.upgrade81 <INFO> <RESOURCE: /fullpath/Test3.jpf> 2009-06-18 13:02:28.634
!MESSAGE ABC needs to be added.


Output for requirement1:
================================
Test.java
Test1.jpf

Output for requirement2:
================================
/fullpath/Test2.jpf
The Java 5 annotation Jpf.Controller needs to be added.
More annotation of Jpf.Action needs to be added.

/fullpath/Test3.jpf
The Java 5 annotation Jpf.Controller needs to be added.
Will add import of org.apache.beehive.netui.pageflow.annotations.Jpf for JPF annotation support.
ABC needs to be added.

Last edited by hareeshram; 06-20-2009 at 10:09 AM..
# 2  
Old 06-20-2009
This works, but probably only if your input is exactly as you described!

Code:
echo "Output from requirement 1"
echo "========================="
grep "<OKAY>.*RESOURCE" infile | sed -e "s/.*RESOURCE:.*\/\(.*\)>.*/\1/" | sort -u

echo
echo "Output from requirement 2"
echo "========================="
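awk '
  # INFO entry header: note that we are inside one and print its full path
  /^!ENTRY.*<INFO>/ { X = 1; sub( /.*RESOURCE: /, "", $0 ); sub( />.*/, "", $0 ); print }
  # first !SUBENTRY seen: the messages from here on belong in the output
  (X == 1) && (/^!SUBENTRY/) { X++ }
  (X > 1) && ($1 ~ /^!MESSAGE/) { sub( /!MESSAGE /, "", $0 ); print }
  # a blank line ends the entry
  (X > 1) && ($1 ~ /^$/) { print ""; X = 0 }
' infile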
 
 
Output:
Output from requirement 1
=========================
Test1.jpf
Test.java
 
Output from requirement 2
=========================
/fullpath/Test2.jpf
The Java 5 annotation Jpf.Controller needs to be added.
More annotation of Jpf.Action needs to be added.
 
/fullpath/Test3.jpf
The Java 5 annotation Jpf.Controller needs to be added.
Will add import of org.apache.beehive.netui.pageflow.annotations.Jpf for JPF annotation support.
ABC needs to be added.
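
Note that the sort -u in the first pipeline removes duplicates but also reorders the names, which is presumably why Test1.jpf is listed before Test.java above (the exact order depends on the sort locale). If the input order matters, a rough sketch that drops duplicates without sorting could look like this; okay_names is a made-up helper name, and the same header layout as in the sample input is assumed.

```shell
# Sketch: print each <OKAY> file name once, in order of first appearance.
# okay_names is a hypothetical helper name used only for illustration.
okay_names() {
  awk '/^!ENTRY.*<OKAY>/ {
    sub(/>[^>]*$/, "")          # strip the closing ">" and the time stamp
    n = split($0, part, "/")    # the file name is the last path component
    if (!seen[part[n]]++) print part[n]
  }' "$@"
}

# Demo on two header lines from the example input
printf '%s\n' \
  '!ENTRY com.bea.workshop.upgrade81 <OKAY> <RESOURCE: /fullpath/Test.java> 2009-06-18 13:02:28.368' \
  '!ENTRY com.bea.workshop.upgrade81 <OKAY> <RESOURCE: /fullpath/Test1.jpf> 2009-06-18 13:02:28.384' |
  okay_names
```

With a log file it would be called as okay_names infile.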


Last edited by Scott; 06-20-2009 at 11:44 AM..
# 3  
Old 06-20-2009
Use gawk, nawk or /usr/xpg4/bin/awk on Solaris:

1.

Code:
awk '/<OKAY>/ { 
  sub(/>[^>]*$/, "")
  n = split($0, t, "/")
  print t[n]   
  }' infile

2.

Code:
awk '!NF { f = 0 }
/^!ENTRY.*<INFO>/ {
  sub(/>[^>]*$/, "")
  sub(/.*RESOURCE: /, "")
  print; f = 1
  }  
f && /!SUBENTRY/ { f++ }   
f > 1 && sub(/!MESSAGE /, "")
' infile


Last edited by radoulov; 06-20-2009 at 11:48 AM..
# 4  
Old 06-20-2009
Cool!

You win :)
# 5  
Old 06-20-2009
Quote:
Originally Posted by scottn
Cool!

You win :)
No,
I need to make the code more generic (I just modified it to remove the specific column references).

Last edited by radoulov; 06-20-2009 at 12:08 PM.. Reason: wrong statement :), your output is correct
# 6  
Old 06-22-2009
That was so fast and accurate!

Thanks scottn and radoulov.
It worked fine for me

I need a slight change in the output format for both requirements. I hope you can suggest something.

Requirement 1:

The file names should come with full paths

Requirement 2:
After the file name (with full path), the messages should be listed without duplicates per entry (a message repeated on consecutive lines should appear only once), and the remaining messages should be separated by commas rather than new lines. Unlike the earlier output, the file name and its comma-separated messages should all be on a single line.

Once again, many thanks for keeping my spirits up in Unix.

Example Input:
==============================
!ENTRY com.bea.workshop.upgrade81 <OKAY> <RESOURCE: /fullpath/Test.java> 2009-06-18 13:02:28.368
!MESSAGE Requesting upgrade report for file: Test.java

!ENTRY com.bea.workshop.upgrade81 <OKAY> <RESOURCE: /fullpath/Test1.jpf> 2009-06-18 13:02:28.384
!MESSAGE Requesting upgrade report for file: Test1.jpf

!ENTRY com.bea.workshop.upgrade81 <INFO> <RESOURCE: /fullpath/Test2.jpf> 2009-06-18 13:02:28.447
!MESSAGE Requesting upgrade report for file: Test2.jpf
!SUBENTRY 1 com.bea.workshop.upgrade81 <INFO> <RESOURCE: /fullpath/Test2.jpf> 2009-06-18 13:02:28.447
!MESSAGE The Java 5 annotation Jpf.Controller needs to be added.
!SUBENTRY 1 com.bea.workshop.upgrade81 <INFO> <RESOURCE: /fullpath/Test2.jpf> 2009-06-18 13:02:28.447
!MESSAGE More annotation of Jpf.Action needs to be added.

!ENTRY com.bea.workshop.upgrade81 <INFO> <RESOURCE: /fullpath/Test3.jpf> 2009-06-18 13:02:28.634
!MESSAGE Requesting upgrade report for file: Test3.jpf
!SUBENTRY 1 com.bea.workshop.upgrade81 <INFO> <RESOURCE: /fullpath/Test3.jpf> 2009-06-18 13:02:28.634
!MESSAGE The Java 5 annotation Jpf.Controller needs to be added.
!SUBENTRY 1 com.bea.workshop.upgrade81 <INFO> <RESOURCE: /fullpath/Test3.jpf> 2009-06-18 13:02:28.634
!MESSAGE The Java 5 annotation Jpf.Controller needs to be added.
!SUBENTRY 1 com.bea.workshop.upgrade81 <INFO> <RESOURCE: /fullpath/Test3.jpf> 2009-06-18 13:02:28.634
!MESSAGE Will add import of org.apache.beehive.netui.pageflow.annotations.Jpf for JPF annotation support.
!SUBENTRY 1 com.bea.workshop.upgrade81 <INFO> <RESOURCE: /fullpath/Test3.jpf> 2009-06-18 13:02:28.634
!MESSAGE ABC needs to be added.
!SUBENTRY 1 com.bea.workshop.upgrade81 <INFO> <RESOURCE: /fullpath/Test3.jpf> 2009-06-18 13:02:28.634
!MESSAGE ABC needs to be added.


Output for requirement1:
================================
/fullpath/Test.java
/fullpath/Test1.jpf

Output for requirement2:
================================
/fullpath/Test2.jpf (separated by tab) The Java 5 annotation Jpf.Controller needs to be added,More annotation of Jpf.Action needs to be added.
/fullpath/Test3.jpf (separated by tab) The Java 5 annotation Jpf.Controller needs to be added,Will add import of org.apache.beehive.netui.pageflow.annotations.Jpf for JPF annotation support,ABC needs to be added.
# 7  
Old 06-22-2009
You may try something like this:

1.

Code:
awk '/<OKAY>/ { 
  sub(/>[^>]*$/, "")
  sub(/.*RESOURCE: /,"")
  print 
  }' infile

2.

Code:
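awk 'END { if (r) print h "\t" r "." }    # flush the last entry
!NF {                                     # blank line: end of an entry
  if (r) print h "\t" r "." 
  f = r = 0; split("", t)                 # reset the state and the seen-messages table
  }
/^!ENTRY.*<INFO>/ { 
  sub(/>[^>]*$/, ""); sub(/.*RESOURCE: /, "")
  h = $0; f = 1                           # h holds the full path
  }  
f && /!SUBENTRY/ { f++ }   
f > 1 && sub(/!MESSAGE /, "") {
  sub(/\.$/, "")                          # drop the trailing period, if there is one
  if (!t[$0]++) r = (r ? r ", " $0 : $0)  # append each distinct message once
  }' infile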
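
Since the entries are separated by blank lines, another way to look at the second requirement is awk's paragraph mode (RS = ""), where each entry becomes one record and, with FS = "\n", each line becomes one field. The following is only a sketch under some assumptions: the separator lines are truly empty (no stray spaces), each !SUBENTRY line is directly followed by its !MESSAGE line, and the messages are joined with a bare comma as in the requested output. info_summary is a made-up helper name.

```shell
# Sketch: one output line per <INFO> entry, "path<TAB>msg1,msg2,...".
# info_summary is a hypothetical helper name used only for illustration.
info_summary() {
  awk 'BEGIN { RS = ""; FS = "\n" }       # one entry per record, one line per field
  $1 ~ /^!ENTRY.*<INFO>/ {
    path = $1
    sub(/>[^>]*$/, "", path); sub(/.*RESOURCE: /, "", path)
    out = ""; split("", seen)             # reset the per-entry state
    for (i = 2; i < NF; i++) {
      if ($i !~ /^!SUBENTRY/) continue
      msg = $(i + 1)
      if (msg !~ /^!MESSAGE /) continue   # subentry without a message line
      sub(/^!MESSAGE /, "", msg); sub(/\.$/, "", msg)
      if (!seen[msg]++) out = (out == "" ? msg : out "," msg)
    }
    if (out != "") print path "\t" out "."
  }' "$@"
}

# Demo on one entry from the example input
info_summary <<'EOF'
!ENTRY com.bea.workshop.upgrade81 <INFO> <RESOURCE: /fullpath/Test2.jpf> 2009-06-18 13:02:28.447
!MESSAGE Requesting upgrade report for file: Test2.jpf
!SUBENTRY 1 com.bea.workshop.upgrade81 <INFO> <RESOURCE: /fullpath/Test2.jpf> 2009-06-18 13:02:28.447
!MESSAGE The Java 5 annotation Jpf.Controller needs to be added.
!SUBENTRY 1 com.bea.workshop.upgrade81 <INFO> <RESOURCE: /fullpath/Test2.jpf> 2009-06-18 13:02:28.447
!MESSAGE More annotation of Jpf.Action needs to be added.
EOF
```

With a log file it would be called as info_summary infile. The state-variable version above does not depend on the separator lines being completely empty in the same way, so it may be the safer choice if the log contains lines with only spaces.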
