Find keywords in multiple log files


 
# 8  
Old 06-04-2016
Search

The OS is AIX 7.1.

My program searches multiple text files for certain keywords and their values, writes the information to a text file, and sends that file as an email attachment. One of the keywords is real time. If the real time value in a text file is greater than 5:00:00, the column name, its value, and the name of the text file containing it should be written to progflag.txt.

Another keyword in the search is an assignment named Memsize and its value. Memsize, its value, and the name of the text file containing it are written to progflag.txt.

The last keyword in the search is a directory name, SASFoundation. SASFoundation and the name of the text file containing it are written to progflag.txt.

My problem is that progflag.txt contains the headers with no column values. Below is the output when I run the code:

Code:
MEMSIZE SECOND   SASEXE   FILENAME

Here is what the output needs to show in progflag.txt:
Code:
MEMSIZE   SECOND     SASEXE                     Filename
200                                                        SASFoundation_MEMSIZE.txt
400       06:00:00         SASFoundation        GT_5hr.txt

In my example, progflag.txt should list only two filenames, not three. For example, no_SASFoundation_no_MEMSIZE.txt doesn't meet the criteria, so there shouldn't be any data for this file in progflag.txt.


Here is my code:
Code:
#!/bin/bash


cd /log/tmp/*.txt | awk -F '[=:]' '
  function pr() {printf FORMAT, K[1],K[2],K[3],K[0]}
  BEGIN {FORMAT="%s\t%s\t%16s\t%s\n"
      printf FORMAT, "MEMSIZE","SECOND","SASEXE","Filename\n"
        for(i=split("/Memsize/ $2, ,/Real Time/ $2 ,/SASFoundation/ $3",A,",");i;i--) L[A[i]]=i
      FORMAT="%s\t%.1f\t%16s\t%s\n"
  }
  FNR==1 {
      if(K[1] || K[2]>'5:00:00' || K[3]) pr()
       K[0]=FILENAME
      K[1]=K[2]=K[3]=x
  }
  $1 in L {v=$2;gsub("^[/ ]*","",v);gsub(/ *$/,"",v);K[L[$1]]=v}
  END{if(K[1] || K[2]>'5:00:00' || K[3]) pr()}' *.txt > progflag.txt

[ -s progflag.txt ] && mailx -s "subject text" -a  progflag.txt receiver@domain.com < "Code Need to be Evaluated"


Last edited by dellanicholson; 06-04-2016 at 09:04 PM.. Reason: ADD [CODE] [/CODE]
# 9  
Old 06-05-2016
Find keywords in multiple log files

My program run without error. The problem I am having is that the program isn't writing field values under the column headers in file.txt.

Each of the column headers in file.txt has no data.

Code:
MEMSIZE  SECOND SASFoundation  Filename


The output results in file.txt should show:

Code:
MEMSIZE   SECOND      SASFoundation            Filename
200                                                             LT_5h_MEMSIZE.txt
400          06:00:00       SASFoundation            GT_5hr.txt

I realized the problem is gsub. I don't know enough about gsub to fix this
issue.

Code:
$1 in L{v=$2;gsub("^[/]*","",v)gsub(/*$/,"",v);gsub(v=$2"^[/]*","",v);K[L[$1]]=v}

The first gsub stores the field value for MEMSIZE, the second gsub stores the field value for real time, and the last gsub stores the field value for SASFoundation. The field values for the headers are written to file.txt.

Code:
#!/bin/bash

cd /tmp/log/*.log
awk -F '[= '':;.]' '
function pr() {if(NR>1) printf "%s\t%s\t%s\t%s\n", K[1],K[2],K[3],K[0]}
BEGIN {
printf "MEMSIZE\tSECOND\tSASFoundation\tFilename\n"
for(i=split("MEMSIZE ,real time ,SASFoundation",A,",");i;i--) L[A[i]]=i
}
FNR==1 {
pr()
K[0]=FILENAME
K[1]=K[2]=K[3]=x
}
$1 in L {v=$2;gsub("^[/ ]*","",v)gsub(/ *$/,"",v);gsub(v=$2"^[/ ]*","",v);K[L[$1]]=v}
 END{if(K[1] || K[2]>'5:00:00' || K[3]) pr()} *.txt > file.txt
[ -s file.txt ] && mailx -s "subject text" -a  file.txt receiver@domain.com < "Code Need to be Evaluated"

# 10  
Old 06-05-2016
There seem to be multiple issues with this code:
  • There is a semicolon missing between the first two gsubs; is that a typo?
  • The third gsub seems to have a spurious v=$2 in it, and if you leave that out it becomes identical to the first gsub, so the third serves no purpose.
  • There is a single quote missing at the end of the awk statements.
  • In the field separator specification -F '[= '':;.]' the two quotes in the middle serve no purpose. It also seems ill adapted to splitting the fields of the input file. With the given input, $1 will only ever contain "MEMSIZE", so that is the only time the $1 in L condition could be true, but then $2 is empty; and since the label in array L is "MEMSIZE " with a trailing space, even that will not match.
  • K[2]>'5:00:00' contains single quotes instead of double quotes, so awk sees K[2]>5:00:00, which is a syntax error.
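
To illustrate the quoting and semicolon points above, here is a minimal sketch, assuming a POSIX awk. The helper name trim_value and the sample input line are made up for illustration; this is not the poster's script.

```shell
#!/bin/sh
# Hypothetical helper: print the value that follows "=" on a line, with
# leading blanks and trailing blanks or semicolons removed.
trim_value() {
    printf '%s\n' "$1" | awk -F '=' '
    {
        v = $2
        # Each statement is separated by a newline or a semicolon.
        gsub(/^[ \t]+/, "", v)    # strip leading blanks
        gsub(/[ \t;]+$/, "", v)   # strip trailing blanks and semicolons
        print v
    }'
}

# The single quotes belong to the shell and delimit the whole awk program;
# string constants inside the program use double quotes.
awk 'BEGIN { limit = "5:00:00"; print "limit is " limit }'

trim_value 'MEMSIZE = 200;'
```

Because the awk program is wrapped in single quotes, a single quote inside it would end the shell string early, which is why the limit is written as "5:00:00".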

Last edited by Scrutinizer; 06-05-2016 at 10:13 PM..
# 11  
Old 06-05-2016
GSub

Thanks, Scrutinizer.

Is there any way the gsub can be fixed so that it outputs the correct values?
# 12  
Old 06-05-2016
I'm going to ignore most of your sample shell script for the moment because it doesn't seem to match any of your stated requirements. But, it is the only thing we have where you state what the explicit key words are that you are looking for in your text file. The key words your script defines are the literal strings: /Memsize/ $2, a literal single space character, /Real Time/ $2 , and /SASFoundation/ $3. Except for the second keyword in this (the single <space> character), I have not been able to find any of these key words in any of your sample files.
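
A small sketch, assuming a POSIX awk, that prints what split() actually produces from that string. The function name show_labels is made up for illustration; the brackets make the blanks visible.

```shell
#!/bin/sh
# Hypothetical helper: print each element that split() produces from the
# string used in the posted script, one per line, wrapped in brackets.
show_labels() {
    awk 'BEGIN {
        n = split("/Memsize/ $2, ,/Real Time/ $2 ,/SASFoundation/ $3", A, ",")
        for (i = 1; i <= n; i++)
            printf "[%s]\n", A[i]
    }'
}

show_labels
```

The slashes and the $2 or $3 are part of each array element, so these labels are ordinary strings, not patterns or field references.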

Searching through your sample input files for the data shown in your desired output above, I can find a line that would be matched by the ERE *real time * on a line that does NOT also contain the string seconds. Note that regular expressions and filename pattern matches are case-sensitive on UNIX and UNIX-like systems. Real Time and real time are NOT the same! Note that printing the value 6:00:00 from the input line:
Code:
      real time     6:00:00

(which does not contain the word seconds like other "real time" values:
Code:
      real time         0.06 seconds
      real time     3.01  seconds
      real time     0.3  seconds
      real time     3.0   seconds

under the heading SECOND) is highly counterintuitive, and will NOT be displayed as you have requested using the printf format string %.1f. (Using that format with the input 6:00:00 would produce the output 6.0.) The string 6:00:00 seems to be hours, minutes, and seconds; not just seconds. And the test you're using to determine if a line should be printed is a string comparison; not a numeric comparison. With your test, a value of 5:30 (less than 1 hour) would compare greater than 5:00:00 and a value of 10:00:01 (more than 10 hours) would compare less than 5:00:00. Please provide a much clearer description of which lines containing real time should be reported and explain what should happen if more than one of those lines in a single input file is selected. (Your code would only report the last selected line, if your code actually selected any lines matching this pattern. Is that what you want?)
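
One way to avoid the string comparison is to convert the time to seconds and compare numbers. This is a minimal sketch, assuming the value always has the form hours:minutes:seconds with whole numbers; the helper names hms_to_seconds and longer_than_5h are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert H:MM:SS to a number of seconds.
# Prints nothing if the argument does not have exactly three fields.
hms_to_seconds() {
    printf '%s\n' "$1" | awk -F ':' 'NF == 3 { print $1 * 3600 + $2 * 60 + $3 }'
}

# Hypothetical helper: succeed only if the elapsed time exceeds 5 hours
# (18000 seconds). The comparison is numeric, not a string comparison.
longer_than_5h() {
    printf '%s\n' "$1" | awk -F ':' '
        NF == 3 { secs = $1 * 3600 + $2 * 60 + $3; found = 1 }
        END     { exit !(found && secs > 18000) }'
}

# What a string comparison does with similar values:
awk 'BEGIN {
    if ("5:30" > "5:00:00")     print "as strings, 5:30 sorts after 5:00:00"
    if ("10:00:01" < "5:00:00") print "as strings, 10:00:01 sorts before 5:00:00"
}'

if longer_than_5h '6:00:00'; then
    echo "6:00:00 is longer than 5 hours"
fi
```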


The ERE MEMSIZE *= * seems to match the lines you are trying to grab from your input files:
Code:
MEMSIZE = 200;
MEMSIZE= 400;
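
A minimal sketch of pulling the value out of such lines with that ERE, assuming a POSIX awk. The function name memsize_values and the third sample line are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read lines on standard input and print the value
# from each line that starts with MEMSIZE, optional blanks, and "=".
memsize_values() {
    awk '/^MEMSIZE *= */ {
        v = $0
        sub(/^MEMSIZE *= */, "", v)   # drop the keyword and the "="
        sub(/;.*$/, "", v)            # drop the ";" and anything after it
        print v
    }'
}

# The lowercase line is not matched because the ERE is case-sensitive.
printf 'MEMSIZE = 200;\nMEMSIZE= 400;\nmemsize = 1;\n' | memsize_values
```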

The only line in any of your input files containing the string SASFoundation is:
Code:
z=/SAS/SAS94/SASFoundation/9.4;

which seems to have the key word z, which is not mentioned anywhere in your description. Why is the value to be placed in your output under the heading SASEXE just the 3rd of the three or four directories named in the z key word's value?

The final field in your output is described in your explanation above as "the text filename that stores the information", and the MEMSIZE = 200; data in your output file comes from a file named SASFoundation_MEMSIZE.txt. But, the data for the last line of your sample output file comes from a file named more_than_5_hr.txt not from the file listed in your sample output: GT_5hr.txt.
# 13  
Old 06-06-2016
You say "My program run without error.", but with the 3rd line of your script being:
Code:
cd /tmp/log/*.log

I find that very hard to believe. This line will succeed if and only if there is exactly one file matching the pattern /tmp/log/*.log and that matching file is of type directory. Otherwise, that command will produce a diagnostic message. Since you are processing .txt files in that directory, please show us the exact output you get (in CODE tags) from the command:
Code:
ls -l /tmp/log/*.log/*.txt
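
For comparison, here is a sketch of changing to a single named directory and stopping if that fails. The function name enter_log_dir is made up for illustration, and /tmp/log in the usage comment is only an assumed location.

```shell
#!/bin/sh
# Hypothetical helper: change to the given directory, or report why that
# is not possible and return a non-zero status.
enter_log_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    cd "$dir" || return 1
}

# Assumed usage in a script:
#   enter_log_dir /tmp/log || exit 1
#   awk '...' *.txt > progflag.txt
```

The pathname given to cd is a directory, with no filename pattern; the pattern *.txt belongs on the awk command line instead.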

You said: "I realized the problem is gsub. I don't know enough about gsub to fix this issue." I am not sure how you realized that gsub() is your problem (and it may be part of your problem), but you have a problem before you ever get to gsub(). With the field separators specified to be each occurrence of an equal sign, a space, a colon, a semicolon, or a period character in your input line (the two adjacent single quotes in the -F argument are removed by the shell) and the strings that you are looking for in field 1 being MEMSIZE (which contains a trailing space character), real time (which contains an embedded and a trailing space character), and SASFoundation (which does not appear at the start of any line in any of your sample input files); there would seem to be zero chance that the condition $1 in L is ever going to be true for any of your sample input files. Therefore, none of your gsub() function calls will ever be executed in your script.
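
A small sketch, assuming a POSIX awk, that shows what ends up in $1 and $2 with that field separator. The function name first_fields is made up for illustration; the sample lines are taken from the input shown earlier in this thread, with made-up leading blanks on the real time line.

```shell
#!/bin/sh
# Hypothetical helper: show the first two fields awk sees for one line.
# Once the shell has removed the quotes, the separator from the posted
# script is the bracket expression [= :;.]
first_fields() {
    printf '%s\n' "$1" |
        awk -F '[= :;.]' '{ printf "$1=[%s] $2=[%s]\n", $1, $2 }'
}

first_fields 'MEMSIZE = 200;'
first_fields '      real time     6:00:00'
```

Every single blank is a separator here, so a field can never contain a space, and a line that starts with blanks has an empty $1.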

Instead of asking us to debug your gsub() function calls, please give us a CLEAR description in English of the logic used to determine:
  1. What keyword is being processed to get the value SASFoundation from the input line z=/SAS/SAS94/SASFoundation/9.4;. If you are processing the z keyword, why isn't the value for that keyword /SAS/SAS94/SASFoundation/9.4?
  2. Which lines containing real time need to be processed, what are the possible formats of the times specified on those lines, and how is that data supposed to be displayed in your output file?
  3. Since the last field in your output file is not always the name of the input file in which the rest of the data on that line was found, how is the data in that field determined?
  4. Why do your input and output files have incomplete last lines (with no line terminator) and why do they have DOS line separators? Why aren't your input and output files text files if they are named with the file extension .txt?
# 14  
Old 06-07-2016
I'm disappointed that you have chosen not to answer any of my questions (which would have helped give you code that might work for you), but maybe this will give you something you can adapt to your needs. It makes some wild assumptions based on sample input files you have provided in this thread, sample output files you have provided in this thread, sample code segments you have provided in this thread, statements you have made in this thread, and me reading a lot in between the lines:
  1. The input files you want to process are in the directory /tmp/log.
  2. The output file you want to produce should be placed in the directory /tmp/log.
  3. The name of the output file you want to produce is either file.txt or progflag.txt. (The following script uses the name progflag.txt.)
  4. You do not want to process your output file as an input file. (The following script ignores both file.txt and progflag.txt as input files.)
  5. All files in the directory /tmp/log whose names end with the string .txt (other than the two mentioned possible output files) are to be processed as input files.
  6. Your input files might or might not have DOS (CR-LF) line terminators instead of UNIX (LF) line terminators. If CR-LF line terminators are present, the CR should be removed before further processing an input line.
  7. Your input files might not have a line terminator on the last line. If an input file does not have a line terminator on the last line, a UNIX line terminator should be added.
  8. Your output file should be a properly formatted text file with UNIX line terminators.
  9. If an input file contains the string /SASFoundation/, an output line should be created in your output file with the string SASFoundation as the 3rd field in that line.
  10. If an input file contains a line matching the ERE ^MEMSIZE *= *[^;]*;{0,1}, an output line should be created in your output file with the string matched by the [^;]* portion of that ERE as the 1st field in that line.
  11. If an input line contains three words and the 1st word is real, and 2nd word is time, and the 3rd word matches the ERE [0-9]+:[0-9]{2}:[0-9]{2} (where the leading digit(s) represent hours, the middle digits represent minutes, and the last digits represent seconds) and the elapsed time represented by the 3rd word is greater than 5 hours; an output line should be created in your output file with the 3rd word (with a leading zero prepended if there is only one leading digit in that word) as the 2nd field in that line.
  12. If more than one line matching any one of the above three criteria would cause an output line to be created, the last line encountered in an input file meeting that criterion is the one used to determine what appears in the output line.
  13. If more than one of the criteria is found in a single input file, only one line of output should be produced for that input file and the 4th field in that output line should be the name of the input file from which that data was extracted.
Code:
#!/bin/bash
cd /tmp/log
for f in *.txt
do	# Skip output files
	[ "$f" = "file.txt" ] && continue
	[ "$f" = "progflag.txt" ] && continue

	# Add a header line for each remaining file to be processed, copy the
	# file to awk's standard input, and add a line terminator to the end of
	# each input file...
	printf '***File=%s\n' "$f"	# Header
	cat "$f"			# File contents
	echo				# Terminate last incomplete line
done | awk '
BEGIN {	FMT[0] = "%-9s%08s  %-15s%s\n"	# SECOND field format for HH:MM:SS
	FMT[1] = "%-9s%-10s%-15s%s\n"	# SECOND field format for other values
}
# Function to print data from data for one input file (including output file
# header before the first output produced).
function pr() {
	if(ms || rt || se) {
		# If we have not printed a header yet...
		if(!header) {
			# print a header.
			header = 1
			printf(FMT[1], "MEMSIZE", "SECOND", "SASEXE",
			    "Filename")
		}
		# Print data gathered from this input file...
		printf(FMT[length(rt) == 0], ms, rt, se, fn)
		ms = rt = se = ""
	}
}
{	# Convert DOS line terminators to UNIX line terminators.
	sub(/\r$/, "")
}
/^\*\*\*File=/ {
	# File header found for a new input file...
	# Print data from previous file.
	pr()

	# Grab filename from this line.
	fn = substr($0, 9)
#	printf("fn=\"%s\" extracted from \"%s\"\n", fn, $0)
	next
}
/^MEMSIZE *=/ {
	# Grab MEMSIZE field data.
	split($0, fields, / *= *|;/)
	ms = fields[2]
#	printf("ms=\"%s\" extracted from \"%s\"\n", ms, $0)
	next
}
/\/SASFoundation\// {
	# If any line contains the literal string "/SASFoundation/", set se to
	# "SASFoundation".
	se = "SASFoundation"
#	printf("se=\"%s\" extracted from \"%s\"\n", se, $0)
	next
}
$1 == "real" && $2 == "time" && NF == 3 && split($3, fields, /:/) == 3 {
	# We have found a "real time" line with 3 fields and the 3rd field is of
	# the form hours:minutes:seconds.  Set rt to $3 if hours > 5 OR
	# (hours == 5 AND (minutes > 0 || seconds > 0)).
	if(fields[1] + 0 > 5 ||
		(fields[1] == 5 && (fields[2] != "00" || fields[3] != "00")))
		rt = $3
#	printf("rt=\"%s\" extracted from \"%s\"\n", rt, $0)
	next
}
END {	# Print results from last input file.
	pr()
}' > progflag.txt

# Send mail if output was produced.
[ -s progflag.txt ] && echo "Code Need to be Evaluated" |
    mailx -s "subject text" -a  progflag.txt receiver@domain.com

This script was written using a Korn shell and tested with a Korn shell and with bash. It should work with any POSIX-conforming shell. If you want to try this on a Solaris/SunOS system, change awk in this script to /usr/xpg4/bin/awk or nawk. If the files you uploaded as sample data for this thread are located in the directory /tmp/log, this script creates a file named progflag.txt containing:
Code:
MEMSIZE  SECOND    SASEXE         Filename
400      06:00:00  SASFoundation  GT_5hr.txt
200                               SASFoundation_MEMSIZE.txt
400      06:00:00  SASFoundation  more_than_5_hr.txt

Of course, the script won't work if receiver@domain.com is not a valid e-mail address, nor if your system's version of mailx does not include a -a file option to include file as an attachment to your mail message. (The POSIX standards do not include a mailx -a file option.)