Post 302804987 by kraterions on Friday 10th of May 2013 12:11:27 PM
Shell scripting: frequency of specific word in a string and statistics

Hello friends, I need some big help from the UNIX collective intelligence:

I have a CSV file like this:

Code:
VALUE,TIMESTAMP,TEXT
1,Sun May 05 16:13:05 +0000 2013,"RT @gracecheree: Praying God sends me a really great man one day. Gotta trust in his timing. 
0,Sun May 05 16:13:05 +0000 2013,@sendi__ we're seeing that on 25th x,azzeslam,Azhar :),sendi__,,,,,
-1,Sun May 05 16:13:05 +0000 2013,still BN. in BaganSerai,Time_Lock,Azrif Asmi,,,,,,
0,Sun May 05 16:13:07 +0000 2013,Can't trust NO bitch!,_SoSoftWilliams,Kenya .. ‚τ°,,,,,,
0,Sun May 05 16:13:07 +0000 2013," me, i'll take some. όνΗ",_blasianBOMB,JohnnyRocket.,,,,,,
1,Sun May 05 16:13:07 +0000 2013,"she'll be okay,  dear @tweetsfrmleyka_",elisyax,-
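
Since Python is among the tools mentioned in this post, here is a minimal sketch of how rows like these could be reduced to (VALUE, TEXT) pairs. It assumes VALUE is always column 1 and TEXT is always column 3, that the extra trailing columns can be ignored, and that quoted fields are well-formed; the function name and the inline sample are made up for illustration.

```python
import csv
import io

def read_value_text(lines):
    """Yield (value, text) pairs from CSV lines shaped like the sample.

    Assumes VALUE is column 1 and TEXT is column 3; extra trailing
    columns are ignored and rows that do not parse are skipped.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the VALUE,TIMESTAMP,TEXT header
    for row in reader:
        if len(row) < 3:
            continue  # too short to hold a TEXT column
        try:
            value = int(row[0])
        except ValueError:
            continue  # VALUE is not an integer
        if value in (-1, 0, 1):
            yield value, row[2]

# Hypothetical, well-formed sample in the same layout as the post.
sample = io.StringIO(
    'VALUE,TIMESTAMP,TEXT\n'
    '1,Sun May 05 16:13:07 +0000 2013,"she\'ll be okay, dear",elisyax,-\n'
    '-1,Sun May 05 16:13:05 +0000 2013,still BN. in BaganSerai,Time_Lock\n'
)
pairs = list(read_value_text(sample))
print(pairs)
```

For a real file, the csv documentation recommends opening it with `newline=''` so that quoted fields containing line breaks are read correctly. A quote that is opened but never closed will still swallow the following lines, so such rows would need cleaning first.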

Now, to get some statistics, I'd like to extract specific words from the TEXT field together with the value from the first field (VALUE) of the same row, and then produce two CSV files:

CSV 1:

Words are searched in the TEXT field, case-insensitively.
List the 50 most frequent words in decreasing order of frequency, with their counts per VALUE, as follows:

OUTPUT CSV 1:

Code:
WORD,TOTFrequency,(1)Frequency,(0)Frequency,(-1)Frequency,1%Frequency,0%Frequency,-1%Frequency
that,456,150,13,258,10%,40%,50%
really,345,212,115,100,52%,33%,15%
great,245,111,65,23,15%,15%,60%
day,123,55,25,32,20%,20%,60%
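
One possible way to produce this first report, sketched in Python. It makes several assumptions: the input has already been reduced to (VALUE, TEXT) pairs, a "word" is a run of ASCII letters with optional inner apostrophes, and each percentage is the share of that word's own total falling in each VALUE class (the sample numbers above do not pin this down). The function names are invented for the sketch.

```python
import csv
import re
import sys
from collections import Counter, defaultdict

# Assumption: a word is a run of ASCII letters, apostrophes allowed inside.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")

def word_stats(pairs, top=50):
    """Return rows (word, total, n_pos, n_zero, n_neg), most frequent first."""
    counts = defaultdict(Counter)  # word -> Counter keyed by VALUE
    for value, text in pairs:
        for word in WORD_RE.findall(text.lower()):
            counts[word][value] += 1
    rows = [(w, sum(c.values()), c[1], c[0], c[-1]) for w, c in counts.items()]
    rows.sort(key=lambda r: (-r[1], r[0]))  # total descending, then alphabetical
    return rows[:top]

def write_report(rows, out=sys.stdout):
    """Write the rows as CSV with the header used in the post."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["WORD", "TOTFrequency", "(1)Frequency", "(0)Frequency",
                     "(-1)Frequency", "1%Frequency", "0%Frequency", "-1%Frequency"])
    for word, total, pos, zero, neg in rows:
        pct = [f"{round(100 * n / total)}%" for n in (pos, zero, neg)]
        writer.writerow([word, total, pos, zero, neg, *pct])

# Hypothetical (VALUE, TEXT) pairs standing in for the parsed file.
pairs = [(1, "Really great day"), (0, "that day"), (-1, "That is that"), (1, "great")]
rows = word_stats(pairs, top=3)
write_report(rows)
```

Lowercasing before tokenizing gives the case-insensitive match. Ties in total frequency are broken alphabetically so the output order is stable.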


CSV 2:

Words are searched in the TEXT field, case-insensitively, using an alternation where | means "or" (that|really|great|day|.......).
List those specific words in decreasing order of frequency, with their counts per VALUE, as follows:

OUTPUT CSV 2:

Code:
WORD,TOTFrequency,(1)Frequency,(0)Frequency,(-1)Frequency,1%Frequency,0%Frequency,-1%Frequency
that,456,150,13,258,10%,40%,50%
really,345,212,115,100,52%,33%,15%
great,245,111,65,23,15%,15%,60%
day,123,55,25,32,20%,20%,60%
.............
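
For this second report, a rough Python sketch that counts only a given word list by building the `that|really|great|day` alternation as a regular expression. It assumes the input is already a sequence of (VALUE, TEXT) pairs and that whole-word matches are wanted; the function name and sample pairs are hypothetical, and the percentage columns are left out to keep it short.

```python
import re
from collections import Counter

def listed_word_stats(pairs, words):
    """Count only the listed words, case-insensitively, per VALUE."""
    # Build the word1|word2|... alternation; \b keeps "day" from
    # matching inside "today".
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b",
                         re.IGNORECASE)
    counts = {w.lower(): Counter() for w in words}
    for value, text in pairs:
        for match in pattern.finditer(text):
            counts[match.group(1).lower()][value] += 1
    # Row layout: word, total, count for 1, count for 0, count for -1.
    rows = [(w, sum(c.values()), c[1], c[0], c[-1]) for w, c in counts.items()]
    return sorted(rows, key=lambda r: (-r[1], r[0]))

# Hypothetical (VALUE, TEXT) pairs standing in for the parsed file.
pairs = [(1, "Really great day today"), (0, "that day"), (-1, "THAT is that")]
rows = listed_word_stats(pairs, ["that", "really", "great", "day"])
for row in rows:
    print(",".join(map(str, row)))
```

Listed words that never occur are still reported, with a total of 0, so any percentage calculation added on top would need to guard against dividing by zero.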

Any tool is fine: shell, awk, Python, etc.

Many thanks in advance for your help.
Moderator's Comments:
Please use code tags when posting data and code samples!


---------- Post updated 05-10-13 at 11:11 AM ---------- Previous update was 05-09-13 at 11:29 AM ----------

I found this very handy script from radoulov, and I guess it could be a good starting point:

Code:
awk '
# Print the summary for the current file: counts per record type,
# then the counts gathered after the SPECIALS marker (if any).
function report(   Z, S) {
  print f ":"
  for (Z in z)
    printf "Total number of %s records = %d\n", Z, z[Z]
  if (sc) {
    print "-----------------------------------"
    printf "Total number of Special records = %d\n", sc
    for (S in sa)
      printf "Total number of %s records = %d\n", S, sa[S]
  }
  print RS   # blank separator between files
}
# First line of each file: report the previous file, then reset state.
FNR == 1 {
  if (f) {
    report()
    split(x, z)    # x is unset, so this empties the array
    split(x, sa)
    s = sc = 0
  }
  f = FILENAME
}
# Before the SPECIALS marker: count the standard record types in field 3.
$3 ~ /^(PTR|MX|NS|CNAME|A)$/ && !s { z[$3]++ }
# After the marker: count every "IN" record as special, grouped by type.
s && $2 == "IN" { sc++; sa[$3]++ }
/SPECIALS/ { s = 1 }
END { report() }' db*


Could someone help me adapt it?

Many thanks, friends!

Last edited by Scott; 05-10-2013 at 01:25 PM.. Reason: CODE tags, not ICODE tags, please.
 
