Determining Word Frequency of Specific Terms


 
# 15  
Old 03-06-2009
Hi, I just noticed your comment on Specials. I found out that "SPECIALS" may contain MX records, TXT records or any other record type, so if we leave it alone, it won't care what type of record it is and will just count them, right?
# 16  
Old 03-06-2009
Now it counts only the PTR, MX, NS, CNAME and A records.
If you want it to count all types of special records containing the IN string, you should use this code (I fixed one more bug in the END block):

Code:
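awk 'END {
  # Report the totals for the last file
  print f ":"
  for (Z in z)
    printf "Total number of %s records = %d\n", Z, z[Z]
  if (sc)
    printf "Total number of Special records = %d\n", sc
  print RS
}
FNR == 1 {
  # A new file starts: report the previous file, then reset the counters
  if (f) {
    print f ":"
    for (Z in z)
      printf "Total number of %s records = %d\n", Z, z[Z]
    if (sc)
      printf "Total number of Special records = %d\n", sc
    print RS
    split(x, z)    # x is unset, so this empties the array z
    s = sc = 0
  }
  f = FILENAME
}
# Main section: count the listed record types by the third field
$3 ~ /^(PTR|MX|NS|CNAME|A)$/ && !s { z[$3]++ }
# SPECIALS section: count every record whose second field is IN
s && $2 == "IN" { sc++ }
# Lines after this marker belong to the SPECIALS section
/SPECIALS/ { s = 1 }' db*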

# 17  
Old 03-06-2009
Code:
#!/usr/bin/env python

import re

# Matches lines that begin with IN followed by the record type
p_types = re.compile(r'^IN\s+(\w+)')

type_counts = {}

with open(r'c:\temp\temp.txt', 'r') as log_lines:
    for line in log_lines:
        m = p_types.search(line)
        if m:
            # Take the captured type directly; sub() on the whole line
            # would leave the trailing newline in the dictionary key
            log_type = m.group(1)
            type_counts[log_type] = type_counts.get(log_type, 0) + 1

for log_type, count in type_counts.items():
    print("Total number of %s records: %d" % (log_type, count))

# 18  
Old 03-06-2009
OK, so from my previous observation, the count for this particular zone came out like so:

db.local.internet.com:
Total number of CNAME records = 23
Total number of A records = 444
Total number of NS records = 6
Total number of Special records = 162

The counts of A records and Special records for this zone look off to me.

The SPECIALS section has mixed record types such as MX, NS and A.

So perhaps the counts from SPECIALS should remain separate from the master count.

I also noticed that there are MX records under SPECIALS. Do you think they were also included in the master count for SPECIALS?

I guess it's hard to know from this script what is counted from what, unless you can think of a better way to report on this. I am just throwing ideas out.

What do you think?
# 19  
Old 03-06-2009
Let me clarify,
the last version of the code implements the following logic:

1. For every input file:

1.1. If the third field matches the following regular expression:

Code:
^(PTR|MX|NS|CNAME|A)$

This means that the third field exactly matches one of the following strings:

Code:
PTR or MX or NS or CNAME or A

1.2. AND we've not yet reached the SPECIALS section:

Code:
!s

1.3. We count each occurrence of the value of the third field (building the associative array z).

Code:
{ z[$3]++ }

2. Once we have reached the SPECIALS section (s is set) AND the second field is exactly the string IN, we count the record:

Code:
s && $2 == "IN" { sc++ }

2.1. The following code marks the beginning of the SPECIALS section; the flag s is reset at the beginning of every file:

Code:
/SPECIALS/ { s = 1 }
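
For readers more at home in Python, the steps above can be sketched roughly as follows. This is only an illustration of the same logic, not a replacement for the awk script: the function name `count_records` and the sample zone lines are made up for the example, and it assumes whitespace-separated fields with IN in the second field and the record type in the third.

```python
import re

# Same pattern as in step 1.1: the whole field must be one of these types
MAIN_TYPES = re.compile(r'^(PTR|MX|NS|CNAME|A)$')


def count_records(lines):
    """Return (main_counts, special_count) for the lines of one zone file.

    Hypothetical helper mirroring the awk rules; not part of the original script.
    """
    main_counts = {}      # plays the role of the awk array z
    special_count = 0     # plays the role of sc
    in_specials = False   # plays the role of the flag s

    for line in lines:
        fields = line.split()
        # Steps 1.1-1.3: before SPECIALS, count the listed types by field 3
        if not in_specials and len(fields) >= 3 and MAIN_TYPES.match(fields[2]):
            main_counts[fields[2]] = main_counts.get(fields[2], 0) + 1
        # Step 2: inside SPECIALS, count every record whose field 2 is IN
        if in_specials and len(fields) >= 2 and fields[1] == "IN":
            special_count += 1
        # Step 2.1: the marker line switches the flag on for the lines after it
        if "SPECIALS" in line:
            in_specials = True

    return main_counts, special_count


# Invented sample data, only to exercise the function
sample = [
    "www   IN A     10.0.0.1",
    "ftp   IN CNAME www",
    "ipv6  IN AAAA  ::1",
    "; SPECIALS",
    "mail  IN MX    10 mx1",
    "txt   IN TXT   hello",
]
main, special = count_records(sample)
print(main, special)
```

Note that the AAAA line is not counted in the main section, because the anchored pattern only accepts a field that is exactly A, while both records after the marker are counted as Special regardless of their type.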

Is the above logic clear and correct?
# 20  
Old 03-06-2009
Yes, it's clear now. I just wasn't sure. Thanks for the details, radoulov.

I guess I have deviated from my original request, since this has been learn-as-you-go for me.

If it's not too much to ask, could we add a per-type breakdown under the SPECIALS section, so we know what it contains? My vision looks like this:

db.xyz
Total number of CNAME records = 23
Total number of A records = 23
Total number of NS records = 6
-----------------------------------------------
Total number of Special records = 8 <<<<<<< total from below
Total number of A records in Special = 3
Total number of NS records in Special = 2
Total number of MX records in Special = 2
Total number of PTR records in Special = 1

Again, thanks for all your efforts !!
# 21  
Old 03-06-2009
Yes, try the following code:

Code:
awk 'END {
  # Report the totals for the last file
  print f ":"
  for (Z in z)
    printf "Total number of %s records = %d\n", Z, z[Z]
  if (sc) {
    print "-----------------------------------"
    printf "Total number of Special records = %d\n", sc
    for (S in sa)
      printf "Total number of %s records in Special = %d\n", S, sa[S]
  }
  print RS
}
FNR == 1 {
  # A new file starts: report the previous file, then reset the counters
  if (f) {
    print f ":"
    for (Z in z)
      printf "Total number of %s records = %d\n", Z, z[Z]
    if (sc) {
      print "-----------------------------------"
      printf "Total number of Special records = %d\n", sc
      for (S in sa)
        printf "Total number of %s records in Special = %d\n", S, sa[S]
    }
    print RS
    split(x, z)     # x is unset, so these calls empty the arrays
    split(x, sa)
    s = sc = 0
  }
  f = FILENAME
}
# Main section: count the listed record types by the third field
$3 ~ /^(PTR|MX|NS|CNAME|A)$/ && !s { z[$3]++ }
# SPECIALS section: count records with IN in field 2, in total and per type
s && $2 == "IN" { sc++; sa[$3]++ }
# Lines after this marker belong to the SPECIALS section
/SPECIALS/ { s = 1 }' db*

Be aware that it assumes that the SPECIALS section is always after the main section!
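
As a rough cross-check of the per-type bookkeeping, the same idea can be sketched in Python. The names `summarize` and `format_report` and the sample lines are hypothetical, the "in Special" wording follows the layout requested in post #20, and the sorted output order is a choice of this sketch (awk does not guarantee any order for `for (S in sa)`). Like the awk version, it assumes the SPECIALS section comes after the main section.

```python
def summarize(lines):
    """Count main-section types and a per-type breakdown of SPECIALS."""
    wanted = {"PTR", "MX", "NS", "CNAME", "A"}
    main_counts = {}     # like the awk array z
    special_counts = {}  # like the awk array sa
    in_specials = False  # like the awk flag s

    for line in lines:
        fields = line.split()
        if not in_specials and len(fields) >= 3 and fields[2] in wanted:
            main_counts[fields[2]] = main_counts.get(fields[2], 0) + 1
        # Simplification of this sketch: lines without a third field are skipped
        if in_specials and len(fields) >= 3 and fields[1] == "IN":
            special_counts[fields[2]] = special_counts.get(fields[2], 0) + 1
        if "SPECIALS" in line:
            in_specials = True
    return main_counts, special_counts


def format_report(name, main_counts, special_counts):
    """Build the report text for one file in the requested layout."""
    out = [name + ":"]
    for rtype in sorted(main_counts):
        out.append("Total number of %s records = %d" % (rtype, main_counts[rtype]))
    if special_counts:
        out.append("-" * 35)
        # The Special total is the sum of the per-type counts below it
        out.append("Total number of Special records = %d"
                   % sum(special_counts.values()))
        for rtype in sorted(special_counts):
            out.append("Total number of %s records in Special = %d"
                       % (rtype, special_counts[rtype]))
    return "\n".join(out)


# Invented sample data, only to exercise the functions
sample = [
    "www   IN A     10.0.0.1",
    "www2  IN A     10.0.0.2",
    "@     IN NS    ns1",
    "; SPECIALS",
    "@     IN MX    10 mx1",
    "ns1   IN A     10.0.0.53",
]
main_counts, special_counts = summarize(sample)
report = format_report("db.example", main_counts, special_counts)
print(report)
```

Keeping the Special total as the sum of the breakdown makes it easy to see that the two always agree, which is the point of the "total from below" note in the request.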