UNIX for Beginners Questions & Answers

If you're not sure where to post a Unix or Linux question, post it here. All unix and Linux beginners welcome in this forum!

Count unique words



    #1  
02-15-2017
imranrasheedamu
Registered User
 
Join Date: Jun 2016
Last Activity: 9 June 2018, 11:07 PM EDT
Posts: 14
Thanks: 11
Thanked 0 Times in 0 Posts

Dear all,

I would like to know how to list and count the unique words in thousands of text files.

Please help me out.
Thanks in advance.
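One common way to list every distinct word across many files, and count how often each appears, is a `tr | sort | uniq -c` pipeline. The sketch below works only under stated assumptions: a "word" is a run of ASCII letters (digits, hyphens and apostrophes act as separators), case is ignored, and the files are `*.txt` in a single directory. It uses `find ... -exec awk 1 {} +` instead of `cat *.txt`, so thousands of filenames are passed in batches and cannot exceed the command-line length limit. The function names `word_freq` and `unique_word_count` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. Assumption: a word is a run of ASCII letters,
# compared case-insensitively.

# word_freq DIR: print "count word" for each distinct word in DIR/*.txt,
# most frequent first.
word_freq() {
    # "awk 1" copies each file like cat, but always ends the last line with a
    # newline, so the last word of one file cannot merge with the next file.
    find "$1" -maxdepth 1 -type f -name '*.txt' -exec awk 1 {} + |
        tr -cs 'A-Za-z' '\n' |   # each run of non-letters becomes one newline
        tr 'A-Z' 'a-z' |         # fold to lower case
        grep -v '^$' |           # drop the empty line a leading separator leaves
        sort | uniq -c | sort -rn
}

# unique_word_count DIR: number of distinct words (one line per word above).
unique_word_count() {
    word_freq "$1" | wc -l | tr -d ' '
}
```

For example, `word_freq . | head` would show the ten most frequent words in the current directory's `.txt` files.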
    #2  
02-15-2017
joeyg, Forum Staff
Moderator
 
Join Date: Dec 2007
Last Activity: 20 July 2018, 12:23 PM EDT
Location: Within two miles of a Dunkin donuts.
Posts: 2,486
Thanks: 145
Thanked 209 Times in 184 Posts
What have you tried?

Also, given how vague this request is, it may be homework or classwork. If it is, specific rules apply to schoolwork.
The Following 2 Users Say Thank You to joeyg For This Useful Post:
imranrasheedamu (02-16-2017), rbatte1 (02-15-2017)
    #3  
02-16-2017
imranrasheedamu
Registered User
 
Join Date: Jun 2016
Last Activity: 9 June 2018, 11:07 PM EDT
Posts: 14
Thanks: 11
Thanked 0 Times in 0 Posts
Dear joeyg,

I have thousands of text files with names like:


Code:
3_March_2013_Front19.txt
10_May_2014_Page326.txt
5_October_2013_Sports36.txt
27_September_2010_Health314.txt
19_December_2012_Page316.txt
31_October_2012_Entertainment1094.txt
15_April_2013_Front14.txt
1_March_2013_Science&Technology33.txt
6_March_2012_MuslimWorld2.txt
19_October_2012_MuslimWorld4.txt
7_February_2012_International312.txt
23_August_2012_Front8.txt
24_July_2012_National22.txt
25_September_2012_Front20.txt
3_October_2014_Page35.txt

So, I would like to count the total number of words and the number of unique words across all files, grouped by the fourth field of the filename.

e.g.

Code:
if(filename==National)
count total and unique words

if(filename==International)
count total and unique words

if(filename==Health)
count total and unique words

and so on...
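The per-category counts asked for above could be sketched as follows. The assumptions are not part of the original question: the category is the last `_`-separated field of the filename (the fourth one in these names) with `.txt` and the trailing digits removed, so `24_July_2012_National22.txt` belongs to `National`; a word is a run of ASCII letters compared case-insensitively; and category names contain no spaces. The function name `category_counts` is hypothetical. Each file is turned into one `category word` line per word, and a single `awk` program then tallies the total and the distinct words per category.

```shell
#!/usr/bin/env bash
# category_counts DIR: print "category total_words unique_words" per category.
# Hypothetical helper; assumes words are runs of ASCII letters, case ignored.
category_counts() {
    local f category
    for f in "$1"/*.txt; do
        [ -f "$f" ] || continue            # skip the unexpanded pattern if nothing matched
        category=${f##*/}                  # 24_July_2012_National22.txt
        category=${category##*_}           # National22.txt
        category=${category%.txt}          # National22
        category=${category%%[0-9]*}       # National
        # One "category word" line per word in this file.
        tr -cs 'A-Za-z' '\n' < "$f" | tr 'A-Z' 'a-z' |
            awk -v c="$category" 'NF { print c, $1 }'
    done |
    awk '{ total[$1]++; if (!seen[$0]++) uniq[$1]++ }
         END { for (c in total) print c, total[c], uniq[c] }' |
    sort
}
```

For example, `category_counts /path/to/articles` would print one line per category (Front, Page, National and so on) with its total and unique word counts.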

Please help me


Moderator's Comments:
Please use CODE tags as required by forum rules!

Last edited by RudiC; 02-16-2017 at 03:14 AM. Reason: Added CODE tags.
    #4  
02-16-2017
RudiC, Forum Staff
Moderator
 
Join Date: Jul 2012
Last Activity: 21 July 2018, 12:24 PM EDT
Location: Aachen, Germany
Posts: 13,082
Thanks: 452
Thanked 4,017 Times in 3,693 Posts
How about something along these lines?
Code:
for FN in *.txt
do
    TMP=${FN##*_}         # drop everything up to the last "_": Page326.txt
    TMP=${TMP%%[0-9]*}    # drop everything from the first digit on: Page
    echo "if(filename==$TMP)"
    echo "count total and unique words"
    echo
done

Output:
if(filename==Page)
count total and unique words

if(filename==Front)
count total and unique words

.
.
.

Please note that your pseudo code is nowhere near any real code that would do what you seem to describe.
The Following User Says Thank You to RudiC For This Useful Post:
imranrasheedamu (02-16-2017)
    #5  
02-16-2017
imranrasheedamu
Registered User
 
Join Date: Jun 2016
Last Activity: 9 June 2018, 11:07 PM EDT
Posts: 14
Thanks: 11
Thanked 0 Times in 0 Posts
Actually, sir, I would like to count the words in all files whose filenames contain National, Page, International, Health, Entertainment, etc.
    #6  
02-16-2017
RudiC, Forum Staff
Moderator
 
Join Date: Jul 2012
Last Activity: 21 July 2018, 12:24 PM EDT
Location: Aachen, Germany
Posts: 13,082
Thanks: 452
Thanked 4,017 Times in 3,693 Posts
So - how would you do that? And how would you handle the results?
    #7  
02-16-2017
wisecracker
Registered User
 
Join Date: Jan 2013
Last Activity: 21 July 2018, 5:17 PM EDT
Location: Loughborough
Posts: 1,300
Thanks: 388
Thanked 353 Times in 278 Posts
(Apologies for any typos.)
RudiC has already given you a starting point. Assume 'FN' holds the name of an Entertainment text file:
Code:
FN='31_October_2012_Entertainment1094.txt'
TMP=${FN##*_}        # remove everything up to the last "_": Entertainment1094.txt
TMP=${TMP%%[0-9]*}   # remove everything from the first digit on: Entertainment

This leaves the value Entertainment in the TMP variable.

So your logic would require a count for each file whose name contains 'Entertainment'.
Similarly for the others.

So what would your logic be to obtain your count(s) per category?
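If "count(s) per category" means how many files fall into each category, one possible sketch reuses the two expansions above and tallies the results in an associative array. It assumes bash 4 or later (for `local -A`); the function name `files_per_category` is hypothetical.

```shell
#!/usr/bin/env bash
# files_per_category DIR: print "category number_of_files", sorted by name.
# Hypothetical helper; requires bash 4+ for associative arrays.
files_per_category() {
    local -A count=()
    local fn tmp
    for fn in "$1"/*.txt; do
        [ -e "$fn" ] || continue       # nothing matched: skip the literal pattern
        tmp=${fn##*/}                  # drop the directory part
        tmp=${tmp##*_}                 # e.g. Entertainment1094.txt
        tmp=${tmp%%[0-9]*}             # e.g. Entertainment
        count[$tmp]=$(( ${count[$tmp]:-0} + 1 ))
    done
    for tmp in "${!count[@]}"; do
        printf '%s %d\n' "$tmp" "${count[$tmp]}"
    done | sort
}
```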

You are here to learn how to do it for yourself, and the best way is to attempt something, no matter how bad your code looks. We are not here to ridicule your attempts but to correct your logic, so that you understand what is going on and can do it again if need be.
If it is JUST the filenames you want, then this will _perhaps_ help:
Code:
ls *.txt > /your/path/to/filenames

This creates a single text file containing ONLY your thousands of filenames. grep is your friend here.
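To illustrate the grep hint: assuming the list file holds one filename per line (which is what `ls` writes when its output goes to a file), `grep -c` can count the names in one category. Anchoring the pattern on the `_` before the category name and on `.txt` at the end keeps `National` from also matching `International`. The function name `count_names` is hypothetical, and the category is assumed to contain no regex metacharacters.

```shell
#!/usr/bin/env bash
# count_names LISTFILE CATEGORY: count the lines in LISTFILE of the form
# ..._CATEGORY<digits>.txt. Hypothetical helper.
count_names() {
    # The regex becomes e.g. _National[0-9]*\.txt$ ; the leading "_" stops
    # National from matching International.
    grep -c -- "_$2[0-9]*\.txt\$" "$1"
}
```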

However, if you also intend to read EACH individual file to count its words, then this is a totally different _animal_.
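As one possible sketch of reading each file, assuming a filename list like the one produced by the `ls` command above and that the files themselves live in a directory DIR, a `while read` loop can run `wc -w` on each file of one category and keep a running total. Note that `wc -w` counts whitespace-separated tokens, so punctuation stays attached to words. The function name `per_file_words` is hypothetical.

```shell
#!/usr/bin/env bash
# per_file_words DIR LISTFILE CATEGORY: print "words filename" for every file
# of CATEGORY listed in LISTFILE, then a "total" line. Hypothetical helper.
per_file_words() {
    local dir=$1 list=$2 category=$3 name n total=0
    while IFS= read -r name; do
        case $name in
            *_"$category"[0-9]*.txt) ;;   # keep only names of this category
            *) continue ;;
        esac
        [ -f "$dir/$name" ] || continue   # skip names that no longer exist
        n=$(wc -w < "$dir/$name")
        total=$(( total + n ))
        printf '%d %s\n' "$n" "$name"
    done < "$list"
    printf '%d total\n' "$total"
}
```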

Last edited by wisecracker; 02-16-2017 at 06:12 AM. Reason: Added the 'grep' line.
All times are GMT -4.

Unix & Linux Forums Content Copyright©1993-2018. All Rights Reserved.