Counting number of files that contain words stored in another file

01-28-2011

Hi All,

I have written a script for this, but it does not do the job. My requirement is as follows:

1. I have two kinds of files, each with a different extension: a set of *.dat files (6000 unique DAT files, all in one directory) and a set of *.dic files (6000 unique DIC files, all in the same directory as the DAT files).

2. The files contain only words, one per line. For example:
1.dat contains something like this:

Code:

computer
red
apple
orange

1.dic looks like this:

Code:

computer
apple
red
blue

3. Every DAT file has a corresponding DIC file: 1.dat and 1.dic, 2.dat and 2.dic, ..., 6000.dat and 6000.dic.

4. I want to read every word from each DIC file, search for it in all DAT files, count the number of DAT files that contain it, and store the result in a FIL file. A DAT file must be counted only once, even if the word appears in it several times. For example:
1.dic contains 10 words. I read each word from 1.dic line by line and count how many DAT files contain it, then write the counts to 1.fil, one per line. Similarly, I read each word from 2.dic line by line, search for it in all DAT files, and write the counts to 2.fil. My 2.fil should look something like this:

Code:

i.e. the word on the first line of 2.dic appears in 2 of the DAT files (each DAT file is counted only once, even if it contains the word several times). The same has to be done for all 6000 DIC files.

What I have done so far:

Code:

for DAT in *.dat
do
for DIC in *.dic
do
while read word
CNT=$(basename "$DAT" .dat).fil
DIC=$(basename "$DAT" .dat).dic
grep -il "$word" | find . | wc -l $DIC $DAT > $FIL
done
done
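
For reference, the requirement in point 4 could be sketched in plain shell roughly as follows. This is only a minimal sketch under some assumptions, not a tested solution for the real data: the function name count_dat_files is made up for illustration, matching is case-insensitive (as the -i in the attempt above suggests), and a word only counts when it fills a whole line of a DAT file (grep -x), which is assumed because every file holds one word per line.

```shell
#!/bin/sh
# count_dat_files DIR (hypothetical helper name)
# For every N.dic in DIR, write N.fil with one number per line:
# the number of *.dat files in DIR that contain the word on that line.
count_dat_files() {
    dir=$1
    for dic in "$dir"/*.dic; do
        [ -e "$dic" ] || continue          # no .dic files at all
        fil="${dic%.dic}.fil"              # 1.dic -> 1.fil
        : > "$fil"                         # start with an empty result file
        while IFS= read -r word || [ -n "$word" ]; do
            # -l prints each matching file name once, so a file is counted
            # once no matter how often the word occurs in it.
            # -i ignore case, -x whole-line match (assumption), -F fixed string.
            count=$(grep -lixF -e "$word" "$dir"/*.dat 2>/dev/null | wc -l)
            # $(( )) strips the padding some wc implementations add.
            printf '%s\n' "$((count))" >> "$fil"
        done < "$dic"
    done
}
```

It would be called as count_dat_files . from the directory that holds the DAT and DIC files. Since it runs grep once per word over all DAT files, it will probably be slow with 6000 files of each kind; it is meant to show the counting logic rather than to be fast.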

shoaibjameel123

01-28-2011

Bumping up posts or double posting is not permitted in these forums.

Please read the rules, which you agreed to when you registered, if you have not already done so.

You may receive an infraction for this. If so, don't worry, just try to follow the rules more carefully. The infraction will expire in the near future.

Thank You.

The UNIX and Linux Forums.

Please continue here:

https://www.unix.com/shell-programmin...ther-file.html

Franklin52
