Help with awk script


 
# 8  
Old 06-27-2014
Quote:
Originally Posted by nybbles2bytes
... ... ...

I saw what you meant by not piping to date, so I've modified this to work the way date wants to work. Here's the code; as you can see, I've pulled some functionality into the awk script and got rid of the need for grep for the regex.

What I don't know how to do is get rid of using "date" through the loop as I am sure it's making it a very slow script by having to call it that way. One way I suppose would be to split the datetimes up and reorder them by YYYYMMDDHHMMSS and then just do a string compare but I'm not really sure how to do that in awk.

And of course any clues you have on whatever else might make it slow that we could change since I don't really know the ins and outs of awk.

Called with: sudo cat /var/log/httpd/rewrite.log|./monitor-rewrites.awk -v LookBack="10 hours"

Code:
#!/bin/awk -f

BEGIN{
  #
  # LookBack should be something like "1 hour" or "1 day" and is set on the command line with
  # -v LookBack='1 hour'. Best to make it the same as the crontab frequency for calling
  # this script.
  #
        "date -d \"-"LookBack"\" \"+%s\""|getline RefDate;
  close("date -d \"-"LookBack"\" \"+%s\"");
  print "BEGIN";
}
{
        "date -d \"$(echo "$4"|tr -d \"[]\"|sed -e \"s#/#-#g\" -e \"s/:/ /1\")\" \"+%s\""|getline LogDate;
  close("date -d \"$(echo "$4"|tr -d \"[]\"|sed -e \"s#/#-#g\" -e \"s/:/ /1\")\" \"+%s\"");
  #
  #  Uncomment the following for testing
  # print "R:",RefDate, "  L:",LogDate, "  Diff:",(LogDate - RefDate), "  Line: "$0;
  #
  if (RefDate < LogDate) {
    if (/rewrite +.?(assets\/.+.? +-> +.?sites\/default\/files\/|sites\/default\/files\/.+.? +-> +.?assets\/)./) {
      print $4, $5, $10, $11, $12, $13;
    }
  }
}
END {
  print "END";
}

OK. You're making real progress.

You can simplify your life by moving the sudo into your script.
The cat is not needed and slows down your script (although not by a lot in this case); it just gives you an extra (unneeded) process and causes your script to read the entire contents of your input file twice (when only once is needed) and to write the entire contents of your input file (when it is not needed at all).

Assuming that the difference between the date stamps is only used for debugging (and isn't needed for further processing), we can get rid of the invocations of date inside the loop (as you suggested) and just perform string comparisons. To do that, we need to convert the month name into a month number and use the awk substr() function a few times to rearrange the text in field 4 into the format:
YYYYMMDDHH:MM:SS. We could have removed the colons from this, but it would take two additional calls to substr(), so it was simpler to add the colons to the date output when setting RefDate. So if you rename your script to something like monitor-rewrites and invoke it with something like:
Code:
./monitor-rewrites /var/log/httpd/rewrite.log "10 hours"

where monitor-rewrites contains:
Code:
#!/bin/ksh
IAm=${0##*/}
if [ $# -ne 2 ]
then	printf "Usage: %s Logfile TimeRange\n" "$IAm" >&2
	printf '\twhere "Logfile" is a pathname to the logfile to be\n' >&2
	printf '\tprocessed and "TimeRange" is something like "1 day" or\n' >&2
	printf '\t"4 hours" specifying which records from "Logfile" are\n' >&2
	printf '\tto be processed relative to the current time.\n' >&2
	exit 1
fi
Logfile="$1"
TimeRange="$2"

sudo awk -v RefDate="$(date -d "now - $TimeRange" '+%Y%m%d%H:%M:%S')" '
BEGIN {	print "BEGIN"
	b2m["Jan"] = "01"; b2m["Feb"] = "02"; b2m["Mar"] = "03"
	b2m["Apr"] = "04"; b2m["May"] = "05"; b2m["Jun"] = "06"
	b2m["Jul"] = "07"; b2m["Aug"] = "08"; b2m["Sep"] = "09"
	b2m["Oct"] = "10"; b2m["Nov"] = "11"; b2m["Dec"] = "12"
}
{	# Sample field 4 data:
	#	[26/Jun/2014:13:41:16
	#	[dd/abm/YYYY:HH:MM:SS
	#	000000000111111111122
	#	123456789012345678901
	# Convert to:
	#	YYYYmmddHH:MM:SS (The colons are left in to reduce the number of
	#	substr() calls needed to reformat the date and time.)
	LogDate=substr($4, 9, 4) b2m[substr($4, 5, 3)] substr($4, 2, 2) \
		substr($4, 14, 8)

	# Uncomment the following for testing
	# print "R:" RefDate " L:" LogDate " Line: " $0

	if(RefDate < LogDate) {
		if (/rewrite +.?(assets\/.+.? +-> +.?sites\/default\/files\/|sites\/default\/files\/.+.? +-> +.?assets\/)./) {
			print $4, $5, $10, $11, $12, $13
		}
	}
}
END {	print "END"
}' "$Logfile"

it should run a little bit faster since your awk script doesn't have to start up a shell to run date for every line you process in your file.
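The field 4 rearrangement can also be tried on its own. The following is a minimal sketch, assuming timestamps shaped like the sample above; the function name to_sortable is hypothetical and only exists for this illustration.

```shell
# Hypothetical helper: convert a timestamp field such as
# "[26/Jun/2014:13:41:16" into the sortable string "2014062613:41:16".
to_sortable() {
	printf '%s\n' "$1" | awk '
	BEGIN {
		# Map month abbreviations to two-digit month numbers.
		n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", name, " ")
		for (i = 1; i <= n; i++)
			b2m[name[i]] = sprintf("%02d", i)
	}
	{
		# Year, month number, day, then HH:MM:SS with its colons kept.
		print substr($1, 9, 4) b2m[substr($1, 5, 3)] substr($1, 2, 2) substr($1, 14, 8)
	}'
}

to_sortable "[26/Jun/2014:13:41:16"    # prints 2014062613:41:16
```

Because every part is fixed width and ordered from most to least significant, a plain string comparison of two such values orders them the same way their timestamps are ordered.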

As I said before, I can't fully test this since the date utility on my system doesn't support the -d option and nothing in the sample input you have provided matches the ERE you're using to select lines to print, but this should be close to what you want.

I hope this helps.
# 9  
Old 07-02-2014
I didn't realize that you responded, thanks! ...for whatever reason I didn't get the email.

I did something very similar to you because as you surmised it was slow with the shell calls. So, I did get rid of the date like you, well, not in the begin statement but in the loop where it matters.

However, I ended up with a shell call in the loop anyway because one more criterion came to light: I had to find lines from another file, and I couldn't find awk functions to process a 2nd file. I suspect this is the point where it would be smart to go to something more comprehensive such as Perl or, in my case, probably PHP.

But it's been a fun exercise and I want to see it through ...here's my working script so far ...practically speaking the shell call in the loop is okay now because the call is only made for the lines I want, which is a tiny fraction of the total. However, there is still one piece I am unsure about: the shell call in the loop uses "getline". I suspect that it's only returning one line even when there are more lines to read. I'm not sure how to get all the lines.
Code:
#!/bin/awk -f

BEGIN{
  #
  # LookBack should be something like "1 hour" or "1 day" and is set on the command line with
  # -v LookBack='1 hour'. Best to make it the same as the crontab frequency for calling
  # this script.
  #
        "date -d \"-"LookBack"\" \"+%s\""|getline RefDate;
  close("date -d \"-"LookBack"\" \"+%s\"");
  split("jan feb mar apr may jun jul aug sep oct nov dec", months, " ");
}
{
  s = tolower($4);
  gsub(/\//, ":", s);
  sub(/\[/, "", s);
  split(s, a, ":");
  for (i = 1; i <= 12; i++) {  # <= so that "dec" (index 12) is checked too
    if (months[i] == a[2]) {
      a[2] = sprintf("%02d", i);
      break;
    }
  }
  LogDate = mktime(a[3]" "a[2]" "a[1]" "a[4]" "a[5]" "a[6]);
  if (RefDate < LogDate) {
    if (/rewrite +.?(assets\/.+.? +-> +.?sites\/default\/files\/|sites\/default\/files\/.+.? +-> +.?assets\/)./) {
     if (! /injector|css\/css_/) {
        # $11 = file with path that is being rewritten to another directory
        fn = $11;
        gsub(/[']/, "", fn);
              "grep \""fn"\" /var/log/httpd/access_log"|getline access_log;
        close("grep \""fn"\" /var/log/httpd/access_log");
        print $4, $5, $10, $11, $12, $13, "\nLines from access_log with the file:", fn, "\n", access_log, "\n";
     }
    }
  }
}

# 10  
Old 07-02-2014
Quote:
Originally Posted by nybbles2bytes
I didn't realize that you responded, thanks! ...for whatever reason I didn't get the email.

I did something very similar to you because as you surmised it was slow with the shell calls. So, I did get rid of the date like you, well, not in the begin statement but in the loop where it matters.

However, I ended up with a shell call in the loop anyway because one more criterion came to light: I had to find lines from another file, and I couldn't find awk functions to process a 2nd file. I suspect this is the point where it would be smart to go to something more comprehensive such as Perl or, in my case, probably PHP.

... ... ...
Code:
#!/bin/awk -f

BEGIN{
 ... ... ...
        # $11 = file with path that is being rewritten to another directory
        fn = $11;
        gsub(/[']/, "", fn);
              "grep \""fn"\" /var/log/httpd/access_log"|getline access_log;
        close("grep \""fn"\" /var/log/httpd/access_log");
        print $4, $5, $10, $11, $12, $13, "\nLines from access_log with the file:", fn, "\n", access_log, "\n";
 ... ... ...
}

Yes, each call to awk's getline reads only one line. Note that if your filenames could contain any characters that are special in a BRE, you should use fgrep (or grep -F). And, since both fgrep and egrep are very easy to implement in awk, you don't need to use a shell to invoke grep to extract matching lines from another file. Here are ways to do both. First using awk to simulate grep -F without using a shell:
Code:
        # $11 = file with path that is being rewritten to another directory
        fn = $11;
        gsub(/[']/, "", fn);
	print $4,$5,$10,$11,$12,$13 "\nLines from access_log with the file:", fn
	while((getline line < "/var/log/httpd/access_log") > 0)
		if(index(line, fn))
			print line
	close("/var/log/httpd/access_log")

and second, using grep -F but printing all matched lines instead of just the 1st matched line:
Code:
        # $11 = file with path that is being rewritten to another directory
        fn = $11;
        gsub(/[']/, "", fn);
	print $4,$5,$10,$11,$12,$13 "\nLines from access_log with the file:", fn
	cmd = "grep -F \"" fn "\" /var/log/httpd/access_log"
	while((cmd | getline line) > 0)
		print line
	close(cmd)
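To see the difference between a single getline call and the loop, here is a small sketch. The temporary file, its contents, and the function name count_matches are made up for the demonstration and stand in for the access log.

```shell
# Hypothetical demo data standing in for the access log.
demo=$(mktemp)
printf '%s\n' 'GET /assets/a.png' 'GET /other' 'GET /assets/a.png?x=1' > "$demo"

# Count matching lines two ways: one getline call versus looping
# until getline stops returning a positive value.
count_matches() {
	awk -v file="$1" -v fn="$2" 'BEGIN {
		cmd = "grep -F \"" fn "\" \"" file "\""
		once = ((cmd | getline line) > 0)    # a single call reads one line
		close(cmd)                           # so the next read reruns grep
		all = 0
		while ((cmd | getline line) > 0)     # the loop reads every line
			all++
		close(cmd)
		print once, all
	}'
}

count_matches "$demo" "/assets/a.png"    # prints: 1 2
rm -f "$demo"
```

The close(cmd) between the two reads matters: without it, the second read would continue from where the first one stopped instead of starting grep again.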

# 11  
Old 07-02-2014
Awesome. I used your second example so this works as desired:
Code:
#!/bin/awk -f

BEGIN{
  #
  # LookBack should be something like "1 hour" or "1 day" and is set on the command line with
  # -v LookBack='1 hour'. Best to make it the same as the crontab frequency for calling
  # this script.
  #
        "date -d \"-"LookBack"\" \"+%s\""|getline RefDate;
  close("date -d \"-"LookBack"\" \"+%s\"");
  split("jan feb mar apr may jun jul aug sep oct nov dec", months, " ");
}
{
  s = tolower($4);
  gsub(/\//, ":", s);
  sub(/\[/, "", s);
  split(s, a, ":");
  for (i = 1; i <= 12; i++) {  # <= so that "dec" (index 12) is checked too
    if (months[i] == a[2]) {
      a[2] = sprintf("%02d", i);
      break;
    }
  }
  LogDate = mktime(a[3]" "a[2]" "a[1]" "a[4]" "a[5]" "a[6]);
  if (RefDate < LogDate) {
    if (/rewrite +.?(assets\/.+.? +-> +.?sites\/default\/files\/|sites\/default\/files\/.+.? +-> +.?assets\/)./) {
      if (! /injector|css\/css_/) {
        # $11 = file with path that is being rewritten to another directory
        fn = $11;
        gsub(/[']/, "", fn);
        print $4,$5,$10,$11,$12,$13 "\nLines from access_log with the file:", fn;
        cmd = "grep -F \"" fn "\" /var/log/httpd/access_log";
        while((cmd | getline line) > 0) {
          print line;
        }
        close(cmd);
        print "\n";
      }
    }
  }
}

# 12  
Old 07-02-2014
Quote:
Originally Posted by nybbles2bytes
Awesome. I used your second example so this works as desired:
... ... ...
I'm glad to hear it is working for you.

Did you try the first example? Unless /var/log/httpd/access_log is a large file, the first example should be faster.
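If the second file is small enough to hold in memory, a variation on the first example is to read it once and search the cached lines for each string, rather than rereading the file every time. This is only a sketch under that assumption and is not something from the scripts above; the function name lookup_cached and the output format are hypothetical.

```shell
# Hypothetical helper: $1 is the file to search, the remaining
# arguments are fixed strings to look for (one lookup per string).
lookup_cached() {
	file=$1
	shift
	printf '%s\n' "$@" | awk -v file="$file" '
	BEGIN {
		# Read the whole file once and keep its lines in an array.
		while ((getline line < file) > 0)
			cache[++n] = line
		close(file)
	}
	{
		# index() does a fixed-string search, like grep -F.
		for (i = 1; i <= n; i++)
			if (index(cache[i], $0))
				print $0 ": " cache[i]
	}'
}
```

This trades memory for speed, so it only makes sense while the cached file stays reasonably small.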
# 13  
Old 07-02-2014
I just chose the 2nd one because it looks cleaner, I can switch to the first but it really doesn't matter, I'm not dealing with speed issues at this point. They are both great for my purposes.

The only issue I am dealing with is that I saw that one of the paths had a file name with curly brackets --> { } <-- and now I am wondering if I should escape or double escape a number of characters in the variable fn.

Something like gsub(/([^a-zA-Z0-9\-])/, "\\1", fn) ? I haven't tested this I'm just guessing that this would be the general shape of a command to escape all characters that might interfere with the regex on the grep but I'm hoping you might have this handy somewhere as it can't be an uncommon problem to solve.
# 14  
Old 07-02-2014
Quote:
Originally Posted by nybbles2bytes
I just chose the 2nd one because it looks cleaner, I can switch to the first but it really doesn't matter, I'm not dealing with speed issues at this point. They are both great for my purposes.

The only issue I am dealing with is that I saw that one of the paths had a file name with curly brackets --> { } <-- and now I am wondering if I should escape or double escape a number of characters in the variable fn.

Something like gsub(/([^a-zA-Z0-9\-])/, "\\1", fn) ? I haven't tested this I'm just guessing that this would be the general shape of a command to escape all characters that might interfere with the regex on the grep but I'm hoping you might have this handy somewhere as it can't be an uncommon problem to solve.
No! Both of the methods I gave you treat fn as a fixed string, not a regular expression. Nothing special needs to be done in this case using the index() function.

The command I gave you to call grep -F wraps the fn string in double-quotes before giving it to the shell. This form will have a problem if fn contains a double-quote (") character, and the shell also still treats dollar signs, backquotes, and backslashes specially inside double-quotes, but characters like braces, brackets, parentheses, pipe symbols, and single-quotes should not be a problem.
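One way to sidestep the double-quote caveat is to wrap the string in single quotes instead and rewrite any embedded single quote. The following is a sketch of that idea, not something taken from the scripts above; the names shquote and shquote_demo are hypothetical.

```shell
# Hypothetical helper: print a form of the string in $1 that the shell
# will read back as exactly one literal word.
shquote_demo() {
	printf '%s\n' "$1" | awk '
	# Wrap str in single quotes. Each embedded single quote becomes
	# quote, backslash, quote, quote (\047 is the octal escape for a
	# single quote, used because this program is itself single-quoted).
	function shquote(str,    n, i, part, out) {
		n = split(str, part, "\047")
		out = part[1]
		for (i = 2; i <= n; i++)
			out = out "\047\\\047\047" part[i]
		return "\047" out "\047"
	}
	{ print shquote($0) }'
}

shquote_demo 'a{b}"c'    # prints 'a{b}"c'
```

Inside the awk script, the command string could then be built as "grep -F " shquote(fn) " /var/log/httpd/access_log", assuming shquote() is defined in that script.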
 