Script to count word occurrences, but exclude some? Post 302657143 by agama on Saturday 16th of June 2012 12:05:49 PM
I would take a slightly different approach. There's no need for the leading sed, and I would apply the black list to the output of the awk, assuming that output will be shorter than the output of an initial sed. I'd also strip punctuation and special characters so that something like "(word" is counted as "word", without the paren. I'd also check the length after removing the punctuation, so that "(and" is dropped if you want only words longer than 3 characters.

This can be smashed onto one line, but it's easier to read and comment when written with some structure:

Code:
awk '
    BEGIN { RS = "[" FS "\n]" }         # one word per record: split on blank or newline (a regex RS may require gnu awk and not work in older versions)
    {
        gsub( "[:,%?<>&@!=+.()]", "", $0 );     # ditch unwanted punctuation before looking at len
        if( length( $0 ) > 3 )                  # keep only words long enough
            count[$0]++;
    }

    END {
        for( x in count )                       # one "word count" line per word, in no particular order
            print x, count[x];
    }'  data-file | grep -v -f black-list | sort -k 2rn,2
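
If your awk doesn't accept a regular expression in RS, the same idea can be sketched by looping over the fields of each line instead. This is only a sketch, not a replacement for the above: the function name count_words is made up for illustration, and it moves the black list check inside awk as an exact word match, which is a different choice from the grep -v -f above (grep drops any output line that contains a black list entry anywhere in it). It assumes one word per line in black-list.

```shell
# Hypothetical helper, not part of the original post.
# usage: count_words DATA_FILE BLACK_LIST
count_words() {
    awk '
        # first file named on the command line is the black list: remember each word
        FILENAME == ARGV[1] { skip[$1] = 1; next }

        {
            for( i = 1; i <= NF; i++ ) {
                w = $i
                gsub( /[,:%?<>&@!=+.()]/, "", w )       # same punctuation set as above
                if( length( w ) > 3 && !(w in skip) )   # long enough and not black listed
                    count[w]++
            }
        }

        END {
            for( w in count )
                print w, count[w]
        }
    ' "$2" "$1" | sort -k 2,2nr -k 1,1      # highest count first, ties by word
}
```

Called as count_words data-file black-list. The FILENAME test is used rather than the usual NR == FNR so that an empty black list doesn't cause the data file to be read as the black list.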


Last edited by agama; 06-16-2012 at 01:06 PM. Reason: clarification
 
