Search and Count Occurrences of Pattern in a File


 
# 1  
Old 06-03-2009

I need to search for and count the occurrences of a pattern in a file. The catch is that it's a pattern and not a word (it is not necessarily delimited by spaces). For example, if ABCD is the pattern I need to search for and count, it can appear in many forms, such as (ABCD, ABCD), XYZ.ABCD=100, XYZ.ABCD>=500, XYZ.ABCD = 200, etc.

I tried something like the script below, which does a word search and count (I got it from another post about counting occurrences of a word), but I'm not sure how to adapt it to a string that is not necessarily delimited.

Code:
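srchWord="$1"
srchFilename="$2"
ctr=0
# Read the file line by line; IFS= and -r keep each line intact.
while IFS= read -r strLine
do
    # Unquoted on purpose: the shell splits the line into whitespace-delimited words.
    for eachWord in $strLine
    do
        if [ "$eachWord" = "$srchWord" ]
        then
            ctr=`expr $ctr + 1`
        fi
    done
done < "$srchFilename"   # was "srchFilename" without the $, which read a file literally named srchFilename
printf "\n\ntheFile contains %s : %d times\n\n" "$srchWord" "$ctr"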

Example of file contents and two specific search scenarios I am trying to address
Code:
(
PABC_CUST_ACCT_DETL_CURR.ADB_STFC_BAL<=5000
OR
PABC_CUST_ACCT_DETL_CURR.ADB_STFC_BAL is NULL
)
lab_may09_params_tbl.AMF <= 59)
((PABC.CUST_ACCT_DETL_CURR.ADB_STFC_BAL <= 5000) OR PABC.CUST_ACCT_DETL_CURR.ADB_STFC_BAL IS NULL) 
(ADB_STFC_BAL=100
ADB_STFC_BAL)
PABC_CUST_ACCT_DETL_CURR.ADB_STFC_BAL^M
PABC_CUST_ACCT_DETL_CURR.ADB_STFC_BAL;
(ADB_STFC_BAL)^M

Scenarios I need to address:

1. Search by ADB_STFC_BAL
Expected Result : Count 9

2. Search by PABC_CUST_ACCT_DETL_CURR.ADB_STFC_BAL
Expected Result : Count 6

Any help with altering the above script to use the pattern, or any other approach to solving the problem (using awk or similar), is greatly appreciated. The files are pretty large and I need to do this for around 200 words.

Thanks in advance !
# 2  
Old 06-03-2009
something to start with:
Code:
nawk '{cnt+=gsub("ADB_STFC_BAL","&")}END {print cnt}' myFile.txt


Last edited by vgersh99; 06-03-2009 at 09:49 AM..
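
Note that gsub() treats its first argument as a regular expression, so a . in the pattern matches any character. A minimal sketch of a literal (non-regex) count, assuming an awk that supports -v; the function name count_literal is made up for illustration and is not from the thread:

```shell
#!/bin/sh
# count_literal PATTERN FILE   (hypothetical helper, not part of the thread)
# Counts every occurrence of PATTERN as a plain string, so "." only matches ".".
# Assumption: PATTERN contains no backslashes (awk -v would interpret them as escapes).
count_literal() {
    awk -v pat="$1" '
        pat != "" {
            line = $0
            # index() returns the 1-based position of pat in line, or 0 if absent
            while ((pos = index(line, pat)) > 0) {
                cnt++
                # continue searching after the end of this match
                line = substr(line, pos + length(pat))
            }
        }
        END { print cnt + 0 }   # cnt + 0 prints 0 when nothing matched
    ' "$2"
}
```

Usage would look like count_literal 'PABC_CUST_ACCT_DETL_CURR.ADB_STFC_BAL' myFile.txt. Overlapping matches are not counted, which is the same behavior as gsub().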
# 3  
Old 06-16-2009
Thanks vgersh99, it certainly helps. We don't have nawk on our server, so I tried it with awk and it works. I have a few additional questions, as I am not very familiar with awk.

I want to assign this output to a variable, say $COUNT. Also, assuming that my filename and pattern are stored in $i and $j respectively, how do I modify this same awk command? My intent is to pass the script a file list and a pattern list and loop through them.

In the above example $i=myfile2.txt and $j=ADB_STFC_BAL

Thanks in advance for your help
# 4  
Old 06-16-2009
Another approach, and lesson in expressions

I copied your file to a temp file on my system, called file26.

Code:
> tr " " "\n" <file26 | grep "ADB_STFC_BAL" | wc -l
9

> tr " " "\n" <file26 | grep "PABC_CUST_ACCT_DETL_CURR.ADB_STFC_BAL" | wc -l
4

> tr " " "\n" <file26 | grep 'PABC\.CUST_ACCT_DETL_CURR.ADB_STFC_BAL' | wc -l
2

Note that the first command matches your expected count of 9. Now take a look at the 2nd and 3rd commands. An unescaped . in grep matches any single character, so I had to write \. to match a literal . character.
For your input file, 4 and 2 are the correct match counts: four entries are spelled PABC_CUST (with an underscore) and two are spelled PABC.CUST (with a dot), which together give the 6 you expected.

By the way, I often use this trick of replacing each space character with a newline (the first tr command) so that each space-separated token lands on its own line and can be matched separately.
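
The escaping step can be automated instead of being typed by hand. The following is only a sketch: escape_bre and count_tokens are hypothetical helper names, and it assumes the patterns are meant as plain text and contain no newlines. Like the commands above, it counts space-separated tokens that contain the pattern, so a token such as ABCD,ABCD would be counted once.

```shell
#!/bin/sh
# escape_bre STRING   (hypothetical helper)
# Prefixes the basic-regex metacharacters ] [ \ . * ^ $ with a backslash,
# so grep treats STRING as literal text.
escape_bre() {
    printf '%s\n' "$1" | sed 's/[][\.*^$]/\\&/g'
}

# count_tokens PATTERN FILE   (hypothetical helper)
# Splits FILE on spaces, one token per line, then counts the lines
# that contain PATTERN literally.
count_tokens() {
    tr ' ' '\n' < "$2" | grep -c -e "$(escape_bre "$1")"
}
```

For example, count_tokens 'PABC.CUST_ACCT_DETL_CURR.ADB_STFC_BAL' file26 escapes both dots before searching. Where grep -F is available, it is a simpler way to get the same literal matching.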
# 5  
Old 06-16-2009
Code:
#!/bin/ksh

# cnt+0 makes awk print 0 (instead of an empty line) when nothing matches
COUNT=$(awk '{cnt+=gsub(pat,"&")}END {print cnt+0}' pat="${j}" "${i}")
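
To loop over a file list and a pattern list as described in post #3, something along these lines might do. It is a sketch only: count_all is a made-up name, and the two list files (one filename per line, one pattern per line) are assumptions, since the thread never shows their format. It keeps the thread's convention of $i for the filename and $j for the pattern.

```shell
#!/bin/sh
# count_all FILELIST PATTERNLIST   (hypothetical helper)
# FILELIST:    one filename per line    (assumed format)
# PATTERNLIST: one awk regex per line   (assumed format)
# Prints "file<TAB>pattern<TAB>count" for every file/pattern pair.
count_all() {
    while IFS= read -r i
    do
        [ -n "$i" ] || continue          # skip blank lines in the file list
        while IFS= read -r j
        do
            [ -n "$j" ] || continue      # skip blank lines in the pattern list
            COUNT=$(awk '{cnt+=gsub(pat,"&")} END {print cnt+0}' pat="$j" "$i")
            printf '%s\t%s\t%s\n' "$i" "$j" "$COUNT"
        done < "$2"
    done < "$1"
}
```

Called as count_all files.txt patterns.txt, this rereads each file once per pattern. With around 200 patterns and large files that may be slow, so a single awk pass that loads every pattern into an array could be worth trying instead.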

# 6  
Old 06-16-2009
Thanks again vgersh99. I was able to get this working, but I ran into an awk error on some files:

awk: Input line 06/15/2009 05:08:18. cannot be longer than 3,000 bytes.

Alternatively, I tried to accomplish this with a shell pipeline:

CNT=` cat $j | grep -i $i | sed 's/'$i'/&\n/g' | grep -wi $i | wc -l `

instead of

CNT=` awk -v pat=${i} '{cnt+=gsub(pat,"&")}END {print cnt}' $j `

That seemed to work on Linux, but it does not on HP-UX.

Any ideas what I need to change to get it working on HP-UX?
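
One likely culprit is the \n in the sed replacement: treating it as a newline is a GNU sed extension, whereas POSIX sed expects a backslash followed by an actual newline. Below is a sketch of a more portable variant. It has not been tested on HP-UX, the name count_occurrences is made up, and it assumes the pattern is a basic regular expression without / characters. It is case-sensitive, and it drops the -w from the original because -w would exclude matches inside longer tokens.

```shell
#!/bin/sh
# count_occurrences PATTERN FILE   (hypothetical helper)
# 1. grep keeps only the lines that contain PATTERN
# 2. sed inserts a real newline after every match (backslash + newline is the POSIX form)
# 3. grep -c counts the resulting lines that contain PATTERN, one per occurrence
count_occurrences() {
    grep -e "$1" "$2" | sed "s/$1/&\\
/g" | grep -c -e "$1"
}
```

A call such as CNT=`count_occurrences "$i" "$j"` would then replace the pipeline above, with $i as the pattern and $j as the file. Very long input lines still pass through sed, so a sed with its own line-length limit could fail in the same way awk did.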
# 7  
Old 06-17-2009
This might work for you.

Code:
 
srchword=$1
srchfile=$2
# -o prints each match on its own line (not available in every grep); -i ignores case
ctr=`grep -io -e "$srchword" "$srchfile" | wc -l`


Last edited by BubbaJoe; 06-17-2009 at 02:14 PM..
 