Number of matches and matched pattern(s) in awk
Post 302963255 by Don Cragun on Sunday 27th of December 2015 01:31:43 AM
Here is an alternative approach that will work with any standards-conforming version of awk. (Note that the standards say the behavior is unspecified if FS, or the ERE used in split(), is an empty string.)
Code:
awk '
BEGIN {	printf("String   #_of_occurrences_of__pattern:pattern...\n")
}
{	printf("%s", left = $0)
	while(match(left, /[[:alnum:]]+/)) {
		# Throw away leading non-digit, non-alpha characters.
		if(RSTART > 1)
			left = substr(left, RSTART)
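		if((num = left + 0) > 0) {
			# We have a string starting with a leading digit string.
			# left + 0 converts that leading number and ignores the
			# text after it; the num characters following the digits
			# are the pattern.
			p = substr(left, len = length(num) + 1, num)
			left = substr(left, len + num)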
			if(p in mcnt) {
				# We have seen this pattern before.
				mcnt[p]++
			} else {	# We have not seen this pattern before.
				mcnt[mplist[++nmp] = p] = 1
			}
		} else {
			# We have a single alphabetic character string.
			p = substr(left, 1, 1)
			left = substr(left, 2)
			if(p in scnt) {
				# We have seen this pattern before.
				scnt[p]++
			} else {	# We have not seen this pattern before.
				scnt[splist[++nsp] = p] = 1
			}
		}
	}
	# Print the results for this input line.
	# Print single character patterns.
	for(i = 1; i <= nsp; i++) {
		printf("   %d:%s", scnt[splist[i]], splist[i])
		delete scnt[splist[i]]
		delete splist[i]
	}
	# Print multiple character patterns.
	for(i = 1; i <= nmp; i++) {
		printf("   %d:%s", mcnt[mplist[i]], mplist[i])
		delete mcnt[mplist[i]]
		delete mplist[i]
	}
	print ""
	nmp = nsp = 0
}' file
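
The length-prefix step in the loop above can be tried in isolation. What follows is only a sketch, not part of the script: split_prefixed is a hypothetical shell function name, and it measures the digit run with match() and RLENGTH instead of length(num). That choice assumes a prefix with leading zeros (such as 05) should be consumed whole, which the original post does not say.

```shell
# Hypothetical helper: split one length-prefixed pattern off the front
# of a string and print "count:pattern:remainder".
split_prefixed() {
    printf '%s\n' "$1" | awk '{
        # Find the leading run of digits, if there is one.
        if (match($0, /^[0-9]+/)) {
            n = substr($0, 1, RLENGTH) + 0          # numeric value of the prefix
            pattern = substr($0, RLENGTH + 1, n)    # the next n characters
            rest = substr($0, RLENGTH + 1 + n)      # whatever is left over
            printf("%d:%s:%s\n", n, pattern, rest)
        }
    }'
}

split_prefixed '12ABCDEFGHIJKLMNABC'    # prints 12:ABCDEFGHIJKL:MNABC
```

The two approaches differ only on prefixes with leading zeros: with length(num), an input such as 05QWERT is treated as having a one-character digit string, so the extracted pattern starts at the 5.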

If file contains:
Code:
!@#$%2QW5QWERTAB$%^&*
!|@|#|$|%|2QW|5QWERT|A|B|$|%|^|&|*
!@#$%2QW5QWERTAB$%^&*2QW5QWERTABAB
12ABCDEFGHIJKLMNABC
12ABCDEFGHIJKLMNABC#12ABCDEFGHIJKLMNDEF
~!@#$%^&*()_+
Aa@52ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz#aA
AAAAAAAAAAAAAAAAAAAAAAAaaaaaabbbbbbAAAAAAAAAAAAAAAAAA
!@#$%3ABC$%DE$%4Fghi3ABC^&*D$%^&

it produces the output:
Code:
String   #_of_occurrences_of__pattern:pattern...
!@#$%2QW5QWERTAB$%^&*   1:A   1:B   1:QW   1:QWERT
!|@|#|$|%|2QW|5QWERT|A|B|$|%|^|&|*   1:A   1:B   1:QW   1:QWERT
!@#$%2QW5QWERTAB$%^&*2QW5QWERTABAB   3:A   3:B   2:QW   2:QWERT
12ABCDEFGHIJKLMNABC   1:M   1:N   1:A   1:B   1:C   1:ABCDEFGHIJKL
12ABCDEFGHIJKLMNABC#12ABCDEFGHIJKLMNDEF   2:M   2:N   1:A   1:B   1:C   1:D   1:E   1:F   2:ABCDEFGHIJKL
~!@#$%^&*()_+
Aa@52ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz#aA   2:A   2:a   1:ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
AAAAAAAAAAAAAAAAAAAAAAAaaaaaabbbbbbAAAAAAAAAAAAAAAAAA   41:A   6:a   6:b
!@#$%3ABC$%DE$%4Fghi3ABC^&*D$%^&   2:D   1:E   2:ABC   1:Fghi

If someone wants to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk. (I don't think nawk knows how to handle the character class expression [[:alnum:]].)
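
A quick way to find out whether a given awk understands the character class is to test it directly. This is a minimal sketch; supports_alnum is a hypothetical function name, and the check assumes that an awk without POSIX class support reads [[:alnum:]] as an ordinary bracket expression followed by a literal ], so the match fails.

```shell
# Prints "yes" if the awk named by $1 treats [[:alnum:]] as a POSIX
# character class, "no" if the test string fails to match.
supports_alnum() {
    echo 'a1' | "$1" '{ print (match($0, /^[[:alnum:]]+$/) ? "yes" : "no") }'
}

supports_alnum awk
```

On Solaris/SunOS the same function can be pointed at nawk or at /usr/xpg4/bin/awk to compare them.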

PS: Note that this script uses a consistent field separator of three <space> characters instead of a mixture of pipe symbols and semicolons.

Last edited by Don Cragun; 12-27-2015 at 03:18 AM. Reason: Add PS. & fix auto spell correct induced typo: s/album/alnum/