Script to count word occurrences, but exclude some?
Post 302657019 by agama on Friday 15th of June 2012 10:16:38 PM
Given your requirements for black listing and finding capitalised leading letters, I'd probably have approached it this way:

Code:
awk '
    NR == FNR { blist[$1]; next; }          # first file only: read black list

    {
        for( i=1; i <= NF; i++ )
        {
            ignore = ignore_nxt;            # true when the previous word ended a sentence
            ignore_nxt = ( match(  $i, "[?.!]" ) && RSTART == length( $i ) );   # does this word end a sentence?

            gsub( "[:,%?<>&@!=+.()]", "", $(i) );       # trash punctuation not considered part of a word
            if( length( $(i) ) > 3 )
            {
                count[$(i)]++;
                fc = substr( $(i), 1, 1 );
                if( !ignore && fc >= "A" && fc <= "Z" )
                    cap++;
            }
        }
    }
    END {
        printf( "words starting with a capital: %d\n", cap ) >"/dev/fd/2";  # out to stderr so it does not go through sort
        for( x in count )
        {
            if( !( x in blist ) )
                print x, count[x];
        }
    }
' blacklist.file text-file | sort -k 2nr,2
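
To show what the NR == FNR test in the first rule is doing, here is a cut-down sketch that keeps only the black list handling and the counting. The function name count_words, the sample words and the temporary file names are made up for illustration, and mktemp -d is assumed to be available on your system.

```shell
#!/bin/sh
# count_words BLACKLIST TEXT
# Print "word count" for every word in TEXT that is not listed in BLACKLIST.
# (Hypothetical helper, not part of the script above.)
count_words() {
    awk '
        NR == FNR { blist[$1]; next }       # true only while reading the first file
        {
            for( i = 1; i <= NF; i++ )      # second file: tally every word
                count[$i]++
        }
        END {
            for( x in count )
                if( !( x in blist ) )       # skip black listed words on output
                    print x, count[x]
        }
    ' "$1" "$2" | sort -k 2nr,2 -k 1,1      # highest count first, ties by word
}

# Made-up sample data
tmp=$(mktemp -d)
printf 'this\nthat\n' > "$tmp/blacklist.file"
printf 'this word and that word and another word\n' > "$tmp/text-file"

count_words "$tmp/blacklist.file" "$tmp/text-file"
# prints:
#   word 3
#   and 2
#   another 1

rm -rf "$tmp"
```

One thing to watch for: if the black list file is empty, NR == FNR is also true for the text file, so every line of text would be swallowed as black list entries.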

The capitalisation is tricky. You can count all words that start with a capital letter, or ignore those that immediately follow a full stop (.), question mark (?) or exclamation mark (!). The code above does the latter, effectively counting proper names that appear in the middle of a sentence. If you comment out the statements that check for and set the ignore variables, it will count all words that start with a capital letter and are longer than 3 characters.
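
As a rough sketch of the difference between the two approaches, the snippet below pulls the capital counting out on its own and adds a switch. The function name count_caps, its mid/all argument and the sample sentence are all made up for this illustration; the awk statements themselves follow the code above.

```shell
#!/bin/sh
# count_caps MODE   (reads text on stdin)
#   mid : skip capitalised words that directly follow . ? or !
#   all : count every capitalised word longer than 3 characters
# (Hypothetical helper, not part of the original script.)
count_caps() {
    awk -v mode="$1" '
        {
            for( i = 1; i <= NF; i++ )
            {
                ignore = ( mode == "mid" ) && ignore_nxt;
                # a word ends a sentence when its first . ? or ! is its last character
                ignore_nxt = ( match( $i, "[?.!]" ) && RSTART == length( $i ) );

                gsub( "[:,%?<>&@!=+.()]", "", $i );     # strip punctuation
                fc = substr( $i, 1, 1 );
                if( length( $i ) > 3 && !ignore && fc >= "A" && fc <= "Z" )
                    cap++;
            }
        }
        END { print cap + 0 }       # + 0 so an empty count prints as 0
    '
}

sample='Yesterday Alice went home. Then Robert called Alice.'
printf '%s\n' "$sample" | count_caps mid    # prints 4 ("Then" is skipped)
printf '%s\n' "$sample" | count_caps all    # prints 5
```

Note that the very first word of the text ("Yesterday" here) is still counted in mid mode, because nothing precedes it to set the ignore flag.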

Might not be exactly what you want, but it should give you an idea of one method.