I would take a slightly different approach. No need for the leading sed, and I would exclude the black list on the output of the awk assuming that will be a shorter list than the output of an initial sed. I'd also strip punctuation/special characters so that something like (word is counted as word without the paren. I'd also check the length after removing specials/punct so that (and is dropped if you want only words that have a length greater than 3.
This can be smashed onto one line, but it's easier to read and commented when written with some structure:
Code:
awk '
BEGIN { RS = "[" FS "\n]" } # break into records based on whitespace and newline (this may require gnu awk and not work in older versions)
{
gsub( "[:,%?<>&@!=+.()]", "", $(i) ); # ditch unwanted punctuation before looking at len
if( length( $0 ) > 3 ) # keep only words long enough
count[$0]++;
}
END {
for( x in count )
print x, count[x];
}' data-file | grep -v -f black-list |sort -k 2rn,2
Last edited by agama; 06-16-2012 at 01:06 PM..
Reason: clarification
Hi Unix-Experts,
I have a textfile with several occurrences of some string XXX. I'd like to count all the occurrences and number them in reverse order.
E.g. input: XXX bla XXX foo XXX
output: 3 bla 2 foo 1
I tried to achieve this with sed, but failed. Any suggestions?
Thanks in... (4 Replies)
I am a newbie in UNIX shell script and seeking help on this UNIX function. Please give me a hand. Thanks.
I have a large file. Named as 'MyFile'. It was tab-delmited. I am told to write a shell function that counts the number of occurrences of the ord “mysring” in the file 'MyFile'. (1 Reply)
Hello,
I have an output from GDB with many entries that looks like this
0x00007ffff7dece94 39 in dl-fini.c
0x00007ffff7dece97 39 in dl-fini.c
0x00007ffff7ab356c 50 in exit.c
0x00007ffff7aed9db in _IO_cleanup () at genops.c:1022
115 in dl-fini.c
0x00007ffff7decf7b in _dl_sort_fini (l=0x0,... (6 Replies)
Hi,
I need help to count the number of occurrences in $3 of file1.txt. I only know how to count by checking one by one and the code is like this:
awk '$3 ~ /aku hanya poyo/ {++c} END {print c}' FS="\t" file1.txt
But this is not wise to do as i have hundreds of different occurrences in that... (10 Replies)
I am in need of a basic format to
1. list all files in a directory
2. list the # of lines in each file
3. list the # of words in each file
If someone could give me a basic format i would appreicate it
***ALSO i can not use the FIND command*** (4 Replies)
I'm putting together a script that will the count the occurrences of words in text documents. It works fine so far, but I'd like to make a couple tweaks/additions:
1) I'm having a hard time displaying the array index number, tried freq which just spit 0's back at me
2) Is there any way to... (12 Replies)
input
amex-11 10 abc
amex-11 20 bcn
amed-12 1 abc
I tried something like this.
awk '{h++}; END { for(k in h) print k, h }' rm1
output
amex-11 1 10 abc
amex-11 1 20 bcn
amed-12 2 1 abc
Note: The second column represents the occurrences. amex-11 is first one and amed-12 is the... (5 Replies)
I am trying to figure out to find word count of each word from my file
sample file
hi how are you
hi are you ok
sample out put
hi 1
how 1
are 1
you 1
hi 1
are 1
you 1
ok 1
wc -l filename is not helping , i think we will have to split the lines and count and then print and also... (4 Replies)
Hi Friends ,
I am having one problem as stated file .
Having an input CSV file as shown in the code
U_TOP_LOGIC/U_HPB2/U_HBRIDGE2/i_core/i_paddr_reg_2_/Q,1,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1,0,0,0,0... (4 Replies)
Discussion started by: kshitij
4 Replies
LEARN ABOUT DEBIAN
sql::reservedwords::postgresql
SQL::ReservedWords::PostgreSQL(3pm) User Contributed Perl Documentation SQL::ReservedWords::PostgreSQL(3pm)NAME
SQL::ReservedWords::PostgreSQL - Reserved SQL words by PostgreSQL
SYNOPSIS
if ( SQL::ReservedWords::PostgreSQL->is_reserved( $word ) ) {
print "$word is a reserved PostgreSQL word!";
}
DESCRIPTION
Determine if words are reserved by PostgreSQL.
METHODS
is_reserved( $word )
Returns a boolean indicating if $word is reserved by either PostgreSQL 7.3, 7.4, 8.0 or 8.1.
is_reserved_by_postgresql7( $word )
Returns a boolean indicating if $word is reserved by either PostgreSQL 7.3 or 7.4.
is_reserved_by_postgresql8( $word )
Returns a boolean indicating if $word is reserved by either PostgreSQL 8.0 or 8.1.
reserved_by( $word )
Returns a list with PostgreSQL versions that reserves $word.
words
Returns a list with all reserved words.
EXPORTS
Nothing by default. Following subroutines can be exported:
is_reserved
is_reserved_by_postgresql7
is_reserved_by_postgresql8
reserved_by
words
SEE ALSO
SQL::ReservedWords
<http://www.postgresql.org/docs/manuals/>
AUTHOR
Christian Hansen "chansen@cpan.org"
COPYRIGHT
This program is free software, you can redistribute it and/or modify it under the same terms as Perl itself.
perl v5.8.8 2008-03-28 SQL::ReservedWords::PostgreSQL(3pm)