Shell scripting: frequency of specific word in a string and statistics
Post 302805607 by Skrynesaver on Friday 10th of May 2013 12:37:57 PM
I'd try a Perl solution to be honest:
Code:
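 perl -ne '
chomp;
# Split into at most three fields, so commas inside the text field are kept
@rec = split(/,/, $_, 3);
# Break the text at word boundaries, swallowing any whitespace that follows
@words = split /\b\s*/, $rec[2];
# Count purely alphanumeric tokens, case-insensitively
for (@words) { $counts{lc $_}++ if /^\w+$/ }
END {
  @wanted = qw(that really great day);
  # Highest count first; a word that never appeared counts as 0
  for (sort { ($counts{$b} // 0) <=> ($counts{$a} // 0) } @wanted) {
    printf "%s %d\n", $_, $counts{$_} // 0;
  }
} ' tmp/tmp.dat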

I'm heading out now, but you could extend the %counts data structure so that each word maps to a hash of counters, something like $counts{$word} = {total => total occurrences to date, appeared => incremented once for each record the word appears in, 0 => incremented when $rec[0]==0, ...}. That would allow you to produce the extended table you require.

Last edited by Skrynesaver; 05-10-2013 at 01:46 PM. Reason: added wanted array and how to approach the rest of the requirements
 
