Full Discussion: Count unique words
Post 302991765 by wisecracker on Thursday 16th of February 2017 05:51:33 AM
(Apologies for any typos.)
RudiC has already given you a starter for this. Assume 'FN' holds the name of an Entertainment text file:
Code:
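FN='31_October_2012_Entertainment1094.txt'
TMP=${FN##*_}        # drop everything up to the last underscore: Entertainment1094.txt
TMP=${TMP%%[0-9]*}   # drop from the first digit onwards: Entertainment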

This leaves the result, Entertainment, in the TMP variable.
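
To try the same two expansions on more than one name, they can be wrapped in a small function. This is only a sketch: the function name category_of and the second filename are made up for illustration, and it assumes every name follows the same pattern, with the category after the last underscore and followed by digits.

```shell
# category_of is a hypothetical helper name, not something from this thread.
category_of() {
    tmp=${1##*_}         # strip everything up to and including the last underscore
    tmp=${tmp%%[0-9]*}   # strip from the first digit to the end of the string
    printf '%s\n' "$tmp"
}

category_of '31_October_2012_Entertainment1094.txt'
category_of '02_March_2013_Sports17.txt'   # made-up filename for illustration
```

A category that itself contains a digit or an underscore would be cut short by this approach, so check a few of your real filenames first.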

So your logic would need to count each file whose name contains 'Entertainment', and similarly for the other categories.

So what would your logic be to obtain your count(s) per category?
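
One possible shape for that logic, offered as a sketch rather than the answer: extract the category from every filename, then let sort and uniq -c do the counting. The scratch directory and the filenames in it are invented so the example can run on its own; in practice the loop would run over your real *.txt files.

```shell
# Scratch directory with made-up filenames so the sketch is self-contained.
workdir=$(mktemp -d)
touch "$workdir/31_October_2012_Entertainment1094.txt" \
      "$workdir/01_November_2012_Entertainment1095.txt" \
      "$workdir/02_March_2013_Sports17.txt"

counts=$(
    for FN in "$workdir"/*.txt; do
        TMP=${FN##*_}          # the last underscore is in the filename, so the path goes too
        TMP=${TMP%%[0-9]*}
        printf '%s\n' "$TMP"   # one category name per file
    done | sort | uniq -c      # uniq -c needs sorted input to group equal lines
)
printf '%s\n' "$counts"

rm -rf "$workdir"
```

Each output line is a count followed by a category name; the exact padding before the count varies between uniq implementations.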

You are here to learn how to do it for yourself, and the best way is to attempt something, no matter how bad your code looks. We are not here to ridicule your attempts but to correct your logic, so that you understand what is going on and can do it again if need be.
If it is JUST the filenames you want then this will _perhaps_ help:
Code:
ls *.txt > /your/path/to/filenames

This creates a single text file containing ONLY your thousands of filenames.
grep is your friend here.
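
As a rough illustration of what grep can do with such a list, the sketch below counts the lines that mention one category. The temporary file stands in for /your/path/to/filenames, and the three names written into it are invented for the example.

```shell
list=$(mktemp)   # stands in for /your/path/to/filenames
printf '%s\n' \
    '31_October_2012_Entertainment1094.txt' \
    '01_November_2012_Entertainment1095.txt' \
    '02_March_2013_Sports17.txt' > "$list"

# -c makes grep print the number of matching lines instead of the lines themselves.
entertainment_count=$(grep -c 'Entertainment' "$list")
echo "Entertainment: $entertainment_count"

rm -f "$list"
```

Because grep -c counts matching lines and the list holds one filename per line, the result is the number of files in that category.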

However if you intend to read EACH individual file to count these words also, then this is a totally different _animal_.

Last edited by wisecracker; 02-16-2017 at 07:12 AM. Reason: Added the 'grep' line.
 
