Counting number of files that contain words stored in another file Post: 302491735

Post 302491735 by shoaibjameel123 on Friday 28th of January 2011 06:43:37 AM

Hi All,

I have written a script for this, but it does not do what I need. My requirement is as follows:

1. I have two kinds of files with different extensions: *.dat files (6000 unique DAT files, all in one directory) and *.dic files (6000 unique DIC files, in the same directory as the DAT files).

2. The files contain only words, one word per line. For example,
1.dat contains something like this:

Code:

computer
red
apple
orange

1.dic looks like this:

Code:

computer
apple
red
blue

3. Every DAT file has a corresponding DIC file: 1.dat and 1.dic, 2.dat and 2.dic, ..., 6000.dat and 6000.dic.

4. For every word in a DIC file, I want to count how many DAT files contain that word and store the counts in a FIL file. A DAT file must be counted only once, even if the word appears several times in it. For example:
1.dic contains 10 words. I read each word from 1.dic line by line, count how many DAT files contain it, and write the counts to 1.fil, one per line. Similarly, I read every word in 2.dic line by line, search for it in all DAT files, and write the counts to 2.fil. My 2.fil should look something like this:

Code:

i.e. the word on the first line of 2.dic appears in 2 of the DAT files (a DAT file counts only once, even if it contains that word several times). The same has to be done for all 6000 DIC files.
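The count described above (how many DAT files contain a word, each file counted at most once) can be computed for every word at once. A minimal sketch, assuming one word per line and exact, case-sensitive matching: `sort -u` drops repeats inside a single DAT file, so once all files are merged, `uniq -c` counts how many files each word came from. The function name `df_table` is hypothetical, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper (assumes one word per line, case-sensitive matching):
# print "COUNT WORD" for every word in the *.dat files of the current
# directory, where COUNT is the number of DAT files containing WORD.
df_table() {
    for f in *.dat; do
        sort -u -- "$f"    # each word at most once per DAT file
    done | sort | uniq -c  # identical lines now come from different files
}
```

Running `df_table > df.txt` once produces a table that can be looked up for each DIC word, instead of searching all 6000 DAT files again for every word.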

What I have done so far:

Code:

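for DIC in *.dic
do
    FIL=$(basename "$DIC" .dic).fil    # 1.dic -> 1.fil
    : > "$FIL"                         # start each FIL file empty
    while IFS= read -r word
    do
        # -l prints each matching DAT file once, however often the word occurs;
        # -x -F match the whole line literally; -i ignores case as before
        grep -l -i -x -F -- "$word" *.dat | wc -l >> "$FIL"
    done < "$DIC"
done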
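A loop like this starts one grep over all 6000 DAT files for every word of every DIC file, which can take a long time. A hedged alternative sketch, assuming one word per line and case-insensitive matching (like the `-i` in the original grep), reads every file exactly once with awk: it first builds a per-word count of DAT files, then writes one count per DIC word to the matching FIL file. The function name `write_fil_files` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: for every N.dic in the current directory, write N.fil
# with one line per word of N.dic (same order) giving the number of *.dat
# files that contain that word at least once. Matching is case-insensitive.
write_fil_files() {
    awk '
        FNR == 1 { split("", seen) }          # new file: forget its seen words
        FILENAME ~ /\.dat$/ {
            w = tolower($0)
            if (!(w in seen)) { seen[w] = 1; df[w]++ }  # count each DAT file once
            next
        }
        {                                     # a DIC file: look each word up
            out = FILENAME
            sub(/\.dic$/, ".fil", out)
            if (out != prev) { if (prev != "") close(prev); prev = out }
            w = tolower($0)
            print ((w in df) ? df[w] : 0) > out
        }
    ' *.dat *.dic                             # all DAT files are read before any DIC file
}
```

Run `write_fil_files` from the directory holding the files; existing .fil files are overwritten. The design choice is to trade grep's repeated scans for a single pass plus an in-memory table with one entry per distinct word.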

shoaibjameel123