Determining Word Frequency of Specific Terms

03-05-2009

Hello,
I need a Perl script that reads a .txt file containing lines like:

224.199.207.IN-ADDR.ARPA. IN NS NS1.internet.com.
4.200.162.207.in-addr.arpa. IN PTR beeriftw.internet.com.
arroyoeinternet.com. IN A 200.199.227.49

I want to count the lines containing these record types:
IN NS
IN PTR
IN A
IN CNAME

I would like output that looks like:

Total number of NS records =
Total number of PTR records =
Total number of A records =
Total number of CNAME records =

Thanks in advance

richsark

03-05-2009

Code:

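#!/usr/bin/perl
# cnt.pl - count NS, PTR, A and CNAME records
# (no -n switch: it would add a second read loop around this one
# and the first input line would never be counted)
use strict;
use warnings;

my ($ns, $ptr, $a_rec, $cname) = (0, 0, 0, 0);

while (<>) {
    # \s+ accepts tabs or repeated spaces; \b stops /IN A/ from also counting AAAA
    if (/\bIN\s+NS\b/)    { $ns++;    }
    if (/\bIN\s+PTR\b/)   { $ptr++;   }
    if (/\bIN\s+A\b/)     { $a_rec++; }
    if (/\bIN\s+CNAME\b/) { $cname++; }
}

print "NS    records = $ns\n";
print "PTR   records = $ptr\n";
print "A     records = $a_rec\n";
print "CNAME records = $cname\n";

usage: cnt.pl < logfile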

FWIW, this is not really a Perl kind of job; awk is probably better, IMO.

jim mcnamara

03-05-2009

Or:

Code:

perl -ane'
  $_{$F[2]}++;
  print map "Total number of $_ records:\t$_{$_}\n", 
    keys %_ if eof
  ' infile

With AWK:

(use nawk or /usr/xpg4/bin/awk on Solaris)

Code:

awk 'END {
  for (k in _)
    printf "Total number of %s:\t%d\n", k, _[k]
}
{ _[$3]++ }' infile
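
Both one-liners above tally whatever appears in the third field, and `for (k in ...)` returns the keys in no particular order. As a minimal sketch, assuming only the four record types from the original question matter and that the type is always the third whitespace-separated field, the variant below prints them in a fixed order and shows 0 for types that never occur. The function name `count_records` and the temporary sample file are made up for illustration.

```shell
#!/bin/sh
# count_records is a hypothetical helper name, not something from the thread.
# Assumption: the record type is the third whitespace-separated field.
count_records() {
  awk '
    { n[$3]++ }                       # tally every value seen in field 3
    END {
      # Report only the four requested types, always in this order.
      # %d prints 0 for a type that was never seen.
      split("NS PTR A CNAME", want, " ")
      for (i = 1; i <= 4; i++)
        printf "Total number of %s records = %d\n", want[i], n[want[i]]
    }' "$@"
}

# Sample data copied from the first post.
sample=$(mktemp)
cat > "$sample" <<'EOF'
224.199.207.IN-ADDR.ARPA. IN NS NS1.internet.com.
4.200.162.207.in-addr.arpa. IN PTR beeriftw.internet.com.
arroyoeinternet.com. IN A 200.199.227.49
EOF

count_records "$sample"
rm -f "$sample"
```

On the three sample lines this reports one record each for NS, PTR and A, and zero for CNAME.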

Last edited by radoulov; 03-05-2009 at 04:12 PM..

radoulov

03-05-2009

I have many DNS zone files that contain various record types.

Is it too much to ask to refine my request a little?

For example, I could have a file named

db.208.199.11.0

that contains the following:

Code:

224.199.207.IN-ADDR.ARPA. IN NS AIM1.internet.com.
4.200.162.207.in-addr.arpa. IN PTR beeriftw.internet.com.
arroyoeinternet.com. IN A 200.199.227.49

Then another file, db.explorer.com, would contain:

Code:

224.162.207.IN-ADDR.ARPA.       IN NS   pwedns1.internet.com.
224.162.207.IN-ADDR.ARPA.       IN NS   pmedns1.internet.com.
224.162.207.IN-ADDR.ARPA.       IN NS   phedns1.internet.com.
224.162.207.IN-ADDR.ARPA.       IN NS   auth100.ns.aut.net.

What I am asking for is a master input file that lists these file names, which your script would read to decide which files to count records in.

The output for each file named in my input file might look like:

Code:

db.208.199.11.0:
Total number of A records = 684
Total number of PTR records = 306
Total number of CNAME records = 58
Total number of NS records = 1352

db.explorer.com:
Total number of A records = 6
Total number of PTR records = 30
Total number of CNAME records = 88
Total number of NS records = 55

So rather than having the script look at a single .txt file, as in my original request, I would like it to reference a master input file.

Thanks in advance !

Last edited by radoulov; 03-06-2009 at 08:22 AM.. Reason: added code tags

richsark

03-06-2009

There is no need to create a master input file: AWK (or Perl, whichever you prefer) can process multiple input files. Assuming all your files reside in the same directory and all filenames begin with the string db:

Code:

awk 'END {
  print f ":"
  for (Z in z)
    printf "Total number of %s records = %d\n", \
    Z, z[Z]
  print RS
  }
FNR == 1 {
  if (f) {
    print f ":"
    for (Z in z)
      printf "Total number of %s records = %d\n", \
      Z, z[Z]
    print RS
    split("", z)  # empty the counts so the next file starts from zero
    }
  f = FILENAME
  }
{ z[$3]++ }' db*
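
If the file names really must come from a master list, as originally requested, one option is to read the names from that list and run a separate count for each. This is only a sketch under stated assumptions: one file name per line, no newlines inside names, and the record type in the third field. The helper name `report_from_list` and the list name `zones.list` are hypothetical.

```shell
#!/bin/sh
# report_from_list is a hypothetical helper; zones.list is an assumed file name.
report_from_list() {
  # $1: master file holding one zone file name per line
  while IFS= read -r zone; do
    [ -n "$zone" ] || continue        # skip blank lines in the list
    printf '%s:\n' "$zone"
    # A fresh awk per file, so counts never carry over between files.
    # sort makes the order of the report lines predictable.
    awk '{ n[$3]++ }
         END { for (k in n) printf "Total number of %s records = %d\n", k, n[k] }' "$zone" | sort
    echo
  done < "$1"
}

# Usage (assumed file name): report_from_list zones.list
```

Starting one awk process per file is slower than a single pass over `db*`, but it keeps the per-file totals independent without any bookkeeping.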

radoulov

03-06-2009

Hi, thanks for your reply. I ran your code, and the output looks like:

db.255.0.0.0:
Total number of SOA records = 17
Total number of records = 187
Total number of Serial records = 17
Total number of NS records = 17
Total number of Retry records = 17
Total number of OF records = 17
Total number of PTR records = 166
Total number of ; records = 17
Total number of Refresh records = 17
Total number of from: records = 17
Total number of FILE records = 68
Total number of Expire records = 17

It's printing a lot of extra entries, and I'm not sure what some of them mean, for example:
Total number of ; records = 17
Total number of Expire records = 17

Where is it getting those from, and can we tweak it?

richsark

03-06-2009

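Yes,
it seems that not all records have the same format. The script counts the third field of every line, so lines that are not plain resource records (SOA parameters, comments) are tallied too. Could you post a bigger sample of your data that includes records containing the offending patterns (Serial, Retry, Expire etc.)?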

Perhaps something like this will be sufficient:

Code:

awk 'END {
  print f ":"
  for (Z in z)
    printf "Total number of %s records = %d\n", \
    Z, z[Z]
  print RS
  }
FNR == 1 {
  if (f) {
    print f ":"
    for (Z in z)
      printf "Total number of %s records = %d\n", \
      Z, z[Z]
    print RS
    split("", z)  # empty the counts so the next file starts from zero
    }
  f = FILENAME
  }
$2 == "IN" { z[$3]++ }' db*
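
The `$2 == "IN"` test assumes the class is always the second field. Zone files may also put a TTL before the class or leave out the owner name, in which case IN sits in a different column. As a hedged sketch, not tested against the poster's real data, the variant below strips comments, looks for the first field equal to IN, and counts the field after it. The helper name `count_by_class` is hypothetical, and the comment stripping assumes no semicolon appears inside quoted record data.

```shell
#!/bin/sh
# count_by_class is a hypothetical helper name used only for this sketch.
count_by_class() {
  awk '
    { sub(/;.*/, "") }                # strip comments; assumes no semicolon in quoted data
    {
      # Find the class field wherever it sits and count the field after it.
      for (i = 1; i < NF; i++)
        if ($i == "IN") { n[$(i + 1)]++; break }
    }
    END {
      for (k in n) printf "Total number of %s records = %d\n", k, n[k]
    }' "$@" | sort
}

# Usage: count_by_class db.255.0.0.0
```

Called with several files it prints combined totals; call it once per file for per-file totals.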

radoulov
