Pattern search and count

# 1  
10-26-2011

Hi all,
I need to search the database log and find out the most frequently used tables over a certain period of time.

The search pattern is database.table.

So I need to look for ABCD.* in the entire log and then get the top ten tables.

I thought of using awk to search for the pattern /ABCD./, write the matches to a file, and then count them to get the top ten.

But the query texts are so long that I am getting the following error:
Code:
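awk '
{
for (i = 1; i <= NF; i++)          # check every field on the line
  if ($i ~ /ABCD\./)               # \. matches a literal dot, not any character
    print $i                       # one match per output line (printf($i) misbehaves if the text contains %)
}
' pbdw.d.txt > out.log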


awk: Input line SELECT ACCT_NUM , ST cannot be longer than 3,000 bytes.
 The input line number is 204. The file is pbdw.d.txt.
 The source line number is 5.

I am looking for the easiest way to achieve this. Thanks in advance.
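The counting half of the plan described in this post (once out.log holds one database.table match per line) could look like the following. This is a minimal sketch: top_ten is a made-up helper name, and the sample contents written to out.log are invented for illustration.

```shell
#!/bin/sh
# Invented sample of what out.log might hold after the extraction step:
# one database.table name per line. Skip this line if out.log already holds real matches.
printf '%s\n' ABCD.ACCT ABCD.CUST ABCD.ACCT ABCD.ORDERS ABCD.ACCT ABCD.CUST > out.log

# top_ten is a hypothetical helper, not a standard command.
# sort groups identical names, uniq -c prefixes each with its count,
# sort -rn ranks by count (highest first), and head keeps the first ten lines.
top_ten() {
    sort "$1" | uniq -c | sort -rn | head -10
}

top_ten out.log
```

On the sample, ABCD.ACCT comes first with a count of 3. The width of the count column printed by uniq -c varies between systems, so compare on fields rather than exact spacing if you post-process it.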

# 2  
10-26-2011
If you're running on Sun/Solaris you might need to use nawk rather than awk. Older versions, including the default awk on Sun boxen (if I remember correctly), have smallish line-length restrictions.

You can let awk find your table names and count them with something like this:

Code:
awk -v RS=" " '
    {
        n=split( $0, a, "\n" );
        for( i = 1; i <= n; i++ )
            if( (m = split( a[i], b, "." )) == 2 && b[1] == "ABCD" )
                count[b[2]]++;     # count each occurrence of table name
    }
    END {
        for( c in count )   #print each table and count
            printf( "%5d %s\n", count[c], c );
    } ' input-file-name |sort  -k 1nr,1

Setting the record separator (RS) to a blank, and splitting each record on newlines, might get you around the long-line issue. Splitting each token and doing a direct comparison on the database name should be quicker than a regexp match, since your pattern is (I assume) a fixed string.
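To see what RS=" " plus the newline split actually does, here is a tiny invented input run through the same record-and-split logic. Each record ends at a space, so one record can contain a newline, and split on "\n" recovers the separate tokens. A sketch only; show_records is a hypothetical helper name and the sample text is made up.

```shell
#!/bin/sh
# show_records is a hypothetical helper: prints each RS=" " record,
# with the pieces produced by splitting it on newlines joined by |.
show_records() {
    awk -v RS=" " '{
        n = split($0, a, "\n")          # break the record at each embedded newline
        out = a[1]
        for (i = 2; i <= n; i++)        # join the pieces with | so they stay visible
            out = out "|" a[i]
        print NR ": " out
    }'
}

# Invented two-line input with no trailing newline: the line break falls
# between ABCD.ACCT and FROM, so those two tokens share one record.
printf 'SELECT ABCD.ACCT\nFROM ABCD.CUST' | show_records
```

This prints three records: `1: SELECT`, `2: ABCD.ACCT|FROM` and `3: ABCD.CUST`. Record 2 shows why the newline split is needed: without it, ABCD.ACCT and FROM would reach the comparison as a single string.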

# 3  
10-26-2011
It works fine for a couple of logs, but for a few I am still getting an error:

,'11 cannot be longer than 3,000 bytes.
The input line number is 1.13071e+06. The file is abc.txt.
The source line number is 1.
# 4  
10-26-2011
That's an odd error message. Can you post the whole script?
# 5  
10-26-2011
Code:
awk -v RS=" " '
    {
        n=split( $0, a, "\n" );
        for( i = 1; i <= n; i++ )
            if( (m = split( a[i], b, "." )) == 2 && b[1] == "PBDW" )
                count[b[2]]++;     # count each occurrence of table name
    }
    END {
        for( c in count )   #print each table and count
            printf( "%5d %s\n", count[c], c );
    } ' ABCD.D_P1.f.txt |sort  -k 1nr,1  > ABCD.D_P1.f.count.txt


# 6  
10-26-2011
The only thing that I can think of is that the input file contains a run of more than 3,000 characters with no spaces. The whole reason I suggested setting the record separator (RS) to a space was to break long records up so that awk wouldn't choke on them.
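If the culprit is a long run with no blanks (an SQL list such as a,b,c,... is a plausible candidate, though that is a guess about these logs), one option is to let tr, which works on characters rather than lines, break the text on commas and parentheses as well as blanks before awk sees it. This is a sketch under those assumptions: count_tables is a hypothetical helper, and the separator set and sample input are illustrations, not a fix tested against the original files.

```shell
#!/bin/sh
# count_tables is a hypothetical helper: $1 = database name, $2 = log file.
count_tables() {
    # Turn blanks, tabs, commas and parentheses into newlines (-s squeezes repeats),
    # so every token reaches awk on its own short line.
    tr -s ' \t,()' '\n\n\n\n\n' < "$2" |
    awk -F. -v db="$1" '
        NF == 2 && $1 == db { count[$2]++ }   # same exact db.table test as the earlier script
        END { for (t in count) printf("%5d %s\n", count[t], t) }
    ' |
    sort -k 1nr,1
}

# Invented sample: one comma-separated line with no blanks at all.
printf 'ABCD.ACCT,ABCD.CUST,(ABCD.ACCT),ABCD.ORDERS,ABCD.ACCT\n' > sample.log
count_tables ABCD sample.log
```

On the sample, ACCT is reported with a count of 3 and CUST and ORDERS with 1 each. Because tr has already put one token per line, awk can keep its default RS here. Any run that still contains none of the listed separators will stay long, so the separator set may need extending for real logs.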

I saw from your other post that fold didn't work. I'm surprised.

Not sure what else to suggest here.
# 7  
10-27-2011
Just post some source data and the expected output so people can understand more quickly what you want.
tip78
