Script to count word occurrences, but exclude some? Post: 302658069

Posted by Cronk on Monday 18th of June 2012 07:22:05 PM in Shell Programming and Scripting

I tried to implement part of your suggestions but ended up with an empty results file. Here is what I am using:

Code:

time tr -cs "[:alpha:]'" "\n" < "$1" | grep -v -f blacklist.txt | sort | uniq -c | sort -rn >counts.txt

The only addition is the 'grep -v -f ...' that you suggested. I created the blacklist text file with one word per line, and it is in the same directory as the shell script. (I would expect grep to complain if it couldn't find the file.)

Thanks,
J

---------- Post updated at 03:16 PM ---------- Previous update was at 02:31 PM ----------

Ah, I think I found an answer. I'm not exactly sure what the difference is, but it appears to work.

(Remember, this is in a script and the $1 is the script parameter.)

Code:

time tr -cs "[:alpha:]'" "\n" < "$1" | grep -viFf blacklist.txt | sort | uniq -c | sort -rn >counts.txt

This also works (apparently it is the same as the above?):

Code:

time tr -cs "[:alpha:]'" "\n" < "$1" | fgrep -vif blacklist.txt | sort | uniq -c | sort -rn >counts.txt

Now, why is it that:

"-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched."

appears to work the way that the "-f" option SOUNDS like it would work?
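
For what it's worth, the two options do different jobs: -f FILE tells grep where to read its patterns from (one per line), while -F tells it to treat those patterns as literal strings rather than regular expressions. In -viFf blacklist.txt the final f is still -f, so both are in effect. A minimal sketch of the difference, using made-up file names and contents (patterns.txt and words.txt are assumptions, not the files from this thread):

```shell
# Hypothetical demo files in a temporary directory; not the poster's data.
demo=$(mktemp -d)
printf '%s\n' 'a.c' > "$demo/patterns.txt"
printf '%s\n' 'abc' 'a.c' 'xyz' > "$demo/words.txt"

# -f alone: each line of patterns.txt is a regular expression,
# so "a.c" matches both "abc" and "a.c", and -v removes both.
regex_out=$(grep -v -f "$demo/patterns.txt" "$demo/words.txt")

# -F together with -f: the same lines are taken literally,
# so only the line containing the literal text "a.c" is removed.
fixed_out=$(grep -v -F -f "$demo/patterns.txt" "$demo/words.txt")

echo "with -f only:"
echo "$regex_out"
echo "with -F and -f:"
echo "$fixed_out"

rm -rf "$demo"
```

With -f alone the pattern a.c is a regular expression and also removes abc; with -F added, only the literal a.c is removed.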

---------- Post updated at 04:22 PM ---------- Previous update was at 03:16 PM ----------

OK, I have figured out a bit more: how to match only words of 3 or more characters:

Code:

time tr -cs "[:alpha:]'" "\n" < "$1" | fgrep -vif blacklist.txt | egrep '\w{3,}' | sort | uniq -c | sort -rn >counts.txt
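
One caveat with the length filter above: '\w{3,}' keeps any line that contains a run of three or more word characters, so a word like "it's" is dropped even though it is four characters long, and \w is not part of POSIX extended regular expressions. A possible alternative, if the intent is to filter on the total length of the word, is to let awk test the line length. This is a sketch with a different rule than the original filter, and min_length_3 is a hypothetical helper name:

```shell
# Hypothetical helper: pass through only lines (words) whose total
# length is at least 3 characters, apostrophes included.
min_length_3() {
    awk 'length($0) >= 3'
}

# Example: "an" and "a" are dropped, "it's" and "the" are kept.
printf '%s\n' "it's" 'an' 'the' 'a' | min_length_3
```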

I think the only missing part is the capitalization issue, but I don't think that is pressing. And it still appears to run in under 0.03 seconds! Gotta love the shell sometimes.
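
The capitalization issue could be handled by folding everything to lower case before counting, for example with a second tr. The sketch below also adds -x to the blacklist filter so that only whole words are excluded; without it, grep -F matches substrings, so a blacklisted "the" would also remove "other". The function name count_words and the idea of passing the blacklist as a second argument are assumptions for illustration, and the length filter uses '.{3,}' (total length) rather than '\w{3,}':

```shell
# Hypothetical helper, not the original script.
# $1 = input text file, $2 = blacklist file (one word per line)
count_words() {
    tr -cs "[:alpha:]'" '\n' < "$1" |
        tr '[:upper:]' '[:lower:]' |   # fold case so "The" and "the" count together
        grep -vixFf "$2" |             # -x: exclude only whole-line (whole-word) matches
        grep -E '.{3,}' |              # keep words of 3 or more characters
        sort | uniq -c | sort -rn      # count and list most frequent first
}
```

Inside the script it could be called as count_words "$1" blacklist.txt >counts.txt.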

Thanks all for your suggestions.

-- J

UNIQ(1) 						    BSD General Commands Manual 						   UNIQ(1)

NAME

     uniq -- report or filter out repeated lines in a file

SYNOPSIS

     uniq [-c | -d | -u] [-i] [-f num] [-s chars] [input_file [output_file]]

DESCRIPTION

     The uniq utility reads the specified input_file comparing adjacent lines, and writes a copy of each unique input line to the output_file.  If
     input_file is a single dash ('-') or absent, the standard input is read.  If output_file is absent, standard output is used for output.  The
     second and succeeding copies of identical adjacent input lines are not written.  Repeated lines in the input will not be detected if they are
     not adjacent, so it may be necessary to sort the files first.

     The following options are available:

     -c      Precede each output line with the count of the number of times the line occurred in the input, followed by a single space.

     -d      Only output lines that are repeated in the input.

     -f num  Ignore the first num fields in each input line when doing comparisons.  A field is a string of non-blank characters separated from
	     adjacent fields by blanks.  Field numbers are one based, i.e., the first field is field one.

     -s chars
	     Ignore the first chars characters in each input line when doing comparisons.  If specified in conjunction with the -f option, the
	     first chars characters after the first num fields will be ignored.  Character numbers are one based, i.e., the first character is
	     character one.

     -u      Only output lines that are not repeated in the input.

     -i      Case insensitive comparison of lines.

ENVIRONMENT

     The LANG, LC_ALL, LC_COLLATE and LC_CTYPE environment variables affect the execution of uniq as described in environ(7).

EXIT STATUS

     The uniq utility exits 0 on success, and >0 if an error occurs.

COMPATIBILITY

     The historic +number and -number options have been deprecated but are still supported in this implementation.

SEE ALSO

     sort(1)

STANDARDS

     The uniq utility conforms to IEEE Std 1003.1-2001 (``POSIX.1'') as amended by Cor. 1-2002.

HISTORY

     A uniq command appeared in Version 3 AT&T UNIX.

BSD                                                              December 17, 2009                                                              BSD
