SkySmart,
I have yet to see a clear definition of what the submitter wants to be matched by the patterns given. For example, with the following input:
Should error match error|, error,, and error123, as well as error at the start or end of a line and error preceded and followed by whitespace characters? The following seems to do what is wanted if all of the above are supposed to match:
If you don't want to match 123error and error123, change both occurrences of :alpha: in the code above to :alnum:.
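The code block the reply refers to did not survive the thread export, so here is a sketch in the same spirit (the input lines are made up; the pattern shows the boundary idea the :alpha:/:alnum: note describes):

```shell
# Match "error" only when it is not adjacent to alphabetic characters;
# change both [:alpha:] classes to [:alnum:] to also reject 123error/error123.
printf 'error|\nxerror\n123error\n error \n' |
grep -E '(^|[^[:alpha:]])error([^[:alpha:]]|$)'
```

With this input, error|, 123error, and the whitespace-surrounded error match, while xerror is rejected because an alphabetic character precedes it.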
If the vast majority of your input lines contain one or more of your search patterns, this will probably run faster if you remove the code shown in red.
With the above sample input file, this script produces the output:
You also have yet to explain why your script was eliminating line 1 and lines 128501 and above from your input file. Until we know what is special about these lines and what is on them, we won't be able to help you make adjustments to these awk and egrep commands to solve your problem. I agree with Chubler_XL that egrep looks like a better solution than awk, but depending on what you're trying to match, the ERE operand may need to be significantly modified and the -w option removed before invoking egrep. Furthermore, if some lines do need to be treated specially, awk becomes more attractive.
This User Gave Thanks to Don Cragun For This Post:
Optimization of a shell/awk script to aggregate (sum) all the columns of a huge data file
File delimiter: "|"
Need the sum of all columns, with the column number: an aggregation (summation) for each column.
The file does not have a header.
Like below -
Column 1 : "Total
Column 2 : "Total
...
...... (2 Replies)
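A minimal awk sketch for the per-column totals described above (the input is simulated with printf here; in real use the pipe-delimited file name would replace the pipeline):

```shell
# Accumulate a running total per column of a "|"-delimited file with no
# header, then print "Column N : total" for each column at the end.
printf '1|2|3\n4|5|6\n' |
awk -F'|' '
    { for (i = 1; i <= NF; i++) sum[i] += $i; if (NF > nf) nf = NF }
    END { for (i = 1; i <= nf; i++) printf "Column %d : %s\n", i, sum[i] }
'
```

A single awk pass like this reads the file once, so it scales linearly even for very large inputs.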
Hi,
I need some help to optimize this piece of code:
sqlplus -S $DB_USER/$DB_PWD@$DB_INSTANCE @$PRODUCT_COLL/$SSA_NAME/bin/tools/sql/tablespace.sql | grep -i UNDO_001_COD3 | awk '{printf ";TBS_UNDO_001_COD3"$5"\n"}'
sqlplus -S $DB_USER/$DB_PWD@$DB_INSTANCE... (1 Reply)
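One common optimization for a pipeline like this is to fold the grep -i into the awk, saving a process per invocation. A sketch (the sqlplus output is simulated with printf, and the field positions are an assumption):

```shell
# grep -i UNDO_001_COD3 | awk '{printf ...}' collapsed into one awk:
# the tolower() comparison reproduces grep's case-insensitive match.
printf 'UNDO_001_COD3 a b c 42\nOTHER w x y 9\n' |
awk 'tolower($0) ~ /undo_001_cod3/ { printf ";TBS_UNDO_001_COD3%s\n", $5 }'
```

If several such lines run against the same sqlplus output, capturing that output once into a variable or temp file and running one awk over it saves repeated database round trips as well.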
hi guys ,
I have 10 scripts
suppose 1.sh, 2.sh, 3.sh, 4.sh ... 10.sh
each takes some time ( for instance 2 minutes to 40 minutes )
my server can run around 3-4 files at a time
suppose,
1.sh ,
2.sh ,
3.sh
are running currently; now, as soon as ANY ONE of them gets finished I... (4 Replies)
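One way to sketch this throttling in bash (the stand-in scripts are generated in a temp directory for the demo; `wait -n` needs bash 4.3+, with a fallback to plain `wait` for older shells):

```shell
#!/bin/bash
# Keep at most $max_jobs scripts running; start the next one as soon as
# any running one finishes. Ten trivial demo scripts are generated here.
dir=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8 9 10; do
    printf '#!/bin/sh\necho %s >> "%s/done.log"\n' "$i" "$dir" > "$dir/$i.sh"
    chmod +x "$dir/$i.sh"
done

max_jobs=3
for script in "$dir"/*.sh; do
    while [ "$(jobs -pr | wc -l)" -ge "$max_jobs" ]; do
        wait -n 2>/dev/null || wait   # block until a background job exits
    done
    "$script" &
done
wait                                  # let the last jobs finish
wc -l < "$dir/done.log"               # all ten scripts ran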
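placeholder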
Hello all,
Here is what my bash script does: it sums the numeric columns, saves the total in a new column, and outputs the line if the total >= a threshold value:
> cat getnon0file.sh
#!/bin/bash
this="getnon0file.sh"
USAGE=$this"
InFile="xyz.38"
Min="0.05"
#
awk '{sum=0; for(n=2; n<=NF; n++){sum+=$n};... (4 Replies)
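A self-contained sketch of the loop the snippet starts (the input is simulated with printf, and min=0.05 mirrors the Min variable above): total fields 2 through NF, append the total, and keep only rows at or above the threshold.

```shell
# Sum columns 2..NF per row; print the row plus its total when the
# total meets the threshold passed in as -v min=...
printf 'a 0.01 0.02\nb 0.10 0.20\n' |
awk -v min=0.05 '
    { sum = 0; for (n = 2; n <= NF; n++) sum += $n }
    sum >= min { print $0, sum }
'
```

Here the first row (total 0.03) is dropped and the second (total 0.3) is printed with its total appended.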
I have created a shell script with the awk code below for replacing special characters in an input file.
The source file has 6 million records. This script was able to handle 2 million records in 1 hour. This is very slow, and we need to optimise our processing.
Can any guru help me with optimization... (6 Replies)
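Without seeing the original awk, a frequent cause of this kind of slowness is one substitution call per special character. Doing all of them in a single gsub per line usually helps; the character class below (strip non-printable characters) is only an example, and the input is simulated with printf:

```shell
# One gsub call per line instead of many per-character substitutions;
# [^[:print:]] here removes tabs and other non-printable bytes.
printf 'ab\tc\nok\n' |
awk '{ gsub(/[^[:print:]]/, ""); print }'
```

When the cleanup is a pure character-for-character mapping or deletion, `tr` is usually faster still, since it works byte by byte with no regex engine involved.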
I have a process using the following series of sed commands that works pretty well.
sed -e 1,1d $file |sed 1i\\"EHLO Broadridge.com" |sed 2i\\"MAIL FROM:${eaddr}"|sed 3i\\"RCPT TO:${eaddr}"|sed 4i\\"DATA"|sed 5s/.FROM/FROM:/|sed 6s/.TO/TO:/|sed 7,7d|sed s/.ENDDATA/./|sed s/.ENDARRAY// >temp/$file... (1 Reply)
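Each sed in that pipeline is a separate process and a separate pass over the data. The usual fix is one sed invocation with multiple -e expressions; the two commands below are representative stand-ins, not the full SMTP-header pipeline above:

```shell
# Several sed commands combined into one process with repeated -e:
# delete the first line, then strip a prefix from what was line 2.
printf '1: drop me\n2: keep me\n' |
sed -e '1d' -e 's/^2: //'
```

The same consolidation applies to the real pipeline: all eleven edits can be chained as -e expressions (or put in a script file with sed -f), turning eleven processes and passes into one.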
Hello,
Do you have any tips on how to optimize the AWK that gets the lines in the log between these XML tags?
se2|6|<ns1:accountInfoRequest xmlns:ns1="http://www.123.com/123/
se2|6|etc2">
.... <some other tags>
se2|6|</ns1:acc
se2|6|ountInfoRequest>
The AWK I'm using to get this... (2 Replies)
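The tricky part of this log format is that the closing tag itself is split across lines ("</ns1:acc" then "ountInfoRequest>"), so a simple /start/,/end/ range won't see it. One sketch that handles this: strip the "se2|6|" prefix, accumulate the stripped text in a buffer, and stop when the reassembled buffer contains the closing tag (the sample input below is reconstructed from the snippet above):

```shell
# Print log lines from the opening <ns1:accountInfoRequest tag until the
# closing tag, even when the closing tag is split across lines.
printf 'se2|6|<ns1:accountInfoRequest xmlns:ns1="http://www.123.com/123/\nse2|6|etc2">\n.... <some other tags>\nse2|6|</ns1:acc\nse2|6|ountInfoRequest>\nse2|6|unrelated\n' |
awk '
    { line = $0; sub(/^[^|]*\|[^|]*\|/, "", line) }   # drop "se2|6|" prefix
    line ~ /<ns1:accountInfoRequest/ { inblk = 1; buf = "" }
    inblk { print; buf = buf line }                   # emit and reassemble
    inblk && buf ~ /<\/ns1:accountInfoRequest>/ { inblk = 0 }
'
```

This prints the five lines of the request block and stops before the trailing "unrelated" line.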
Hi,
I am writing a script in which, at some point, I need to get the process ID of a specific process and kill it...
I am getting the PID as follows...
ps -ef | grep $PKMS/scripts | grep -v grep | awk '{print $2 }'can we optimize it more further since my script already doing lot of other... (3 Replies)
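Where pgrep is available, the whole ps | grep | grep -v grep | awk pipeline collapses into one command: pgrep -f matches the pattern against the full command line. A sketch, with a background sleep standing in for the real $PKMS/scripts process:

```shell
# pgrep -f replaces ps -ef | grep pattern | grep -v grep | awk '{print $2}'.
sleep 5 &                                  # stand-in for the real process
target=$!
pid=$(pgrep -f 'sleep 5' | grep -x "$target")   # confirm it is our process
[ -n "$pid" ] && kill "$pid"
```

In the real script this becomes `pkill -f "$PKMS/scripts"` (or `pgrep -f "$PKMS/scripts"` if you need the PID first), with no grep -v grep needed because pgrep excludes itself.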