Post 302550601 by shoaibjameel123 on Friday 26th of August 2011 09:22:35 AM
Finding compound words from a set of files from another set of files

Hi All,

I am completely stuck here.

I have a set of files (named A.txt, B.txt, and so on through L.txt) that contain words like these:

Code:
computer
random access memory
computer networking
mouse
terminal
windows

All the files from A.txt to L.txt have the same format, i.e. one complete word or compound word per line.

I have another set of text files (named 1.dat, 2.dat, and so on through n.dat, where n is an integer) that contain complete texts like:

Code:
use descriptive titles when posting for example do not post questions with subjects like help me urgent or doubt post subjects like execution problems with cron or help with backup shell script.

As you can see, in the complete texts:
1. Everything is in lowercase.
2. There is no punctuation and there are no full stops, only alphabetic text.
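
Given these two formats, one way to test whether a term (single or compound) occurs in a text as whole words is to pad both the text and the term with a space and look for a plain substring. This is only a minimal sketch: it assumes the words in the text are separated by single spaces, and `contains_phrase` is a hypothetical helper name.

```shell
#!/bin/sh

# contains_phrase TEXT PHRASE   (hypothetical helper name)
# Succeeds if PHRASE occurs in TEXT as a run of whole words.
# Assumes the words in TEXT are separated by single spaces.
contains_phrase() {
    # The quoted part of the pattern is matched literally. The padding
    # spaces stop "computer" from matching inside "computers".
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

if contains_phrase "i bought more random access memory" "random access memory"; then
    echo "found"
fi
```

The padding is what separates this from a bare substring search, which would also report "computer" for a text that only contains "computers".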


Task:

1. Open 1.dat
2. Open A.txt, B.txt, C.txt and so on through L.txt
3. Find every word from A.txt (both single words and compound words like random access memory) that occurs in 1.dat, and record each matching word's position in A.txt, i.e. its line number.
4. Then open B.txt, find its words that occur in 1.dat together with their line numbers in B.txt, and do the same for every file through L.txt.
5. Then create 1.num and write to it the average of the line numbers of the matching words from all the TXT files.
6. Then open 2.dat, go through A.txt to L.txt again, and create 2.num. Keep going until all DAT files have been read and all corresponding NUM files have been created. In the end I'll have as many NUM files as DAT files, each containing a single number: the average of the matching line numbers.
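
The steps above could be sketched with awk roughly as follows. This is only one possible reading of the task, under assumptions I am adding: all files sit in one directory, the average is pooled over every matching term from all TXT files, each term counts once no matter how often it occurs in the text, and 0 is written when nothing matches. The function names `average_line_numbers` and `write_num_files` are hypothetical.

```shell
#!/bin/sh

# average_line_numbers DAT TXT...   (hypothetical helper name)
# Prints the average line number of every term in the TXT files that
# occurs in DAT as a whole word or phrase.
average_line_numbers() {
    awk '
        # First argument (the DAT file): join its words into one string.
        FILENAME == ARGV[1] {
            if (NF) { $1 = $1; text = text " " $0 }
            next
        }
        # Remaining arguments (the word lists): FNR is the line number
        # of the current term within its own file.
        NF {
            $1 = $1    # normalise spacing inside compound terms
            if (index(text " ", " " $0 " ")) { sum += FNR; n++ }
        }
        END {
            # Assumption: print 0 when no term matched at all.
            if (n) { printf "%.2f\n", sum / n } else { print 0 }
        }
    ' "$@"
}

# write_num_files DIR   (hypothetical helper name)
# Creates N.num next to every N.dat in DIR, using A.txt .. L.txt in DIR.
write_num_files() {
    for dat in "$1"/*.dat; do
        [ -e "$dat" ] || continue    # the directory has no DAT files
        average_line_numbers "$dat" "$1"/[A-L].txt > "${dat%.dat}.num"
    done
}
```

Calling `write_num_files .` in the directory that holds the files would then produce 1.num, 2.num and so on. Padding both the text and each term with spaces keeps a term like computer from matching inside computers.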

I have written something that works well for matching both single and compound words, but I am not able to loop it over the files and find the average of the line numbers. I have used Perl, but sed or awk will also do, as I only care about the output.

Code:
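#!/usr/bin/perl
use strict;
use warnings;

print "Enter a file name: ";
chomp(my $file = <STDIN>);
print "\nSearching file: ";
if (-e $file)
{
    print "File found\n";

    # Read the file once in Perl instead of shelling out to wc and grep,
    # so odd characters in the file name or pattern cannot break the command.
    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
    chomp(my @lines = <$fh>);
    close($fh);

    print "Total number of lines in the file = ", scalar(@lines), "\n";

    print "Enter the pattern to search: ";
    chomp(my $pattern = <STDIN>);
    print "\n";
    # Keep every line that contains the pattern (\Q...\E matches it literally)
    my @matches = grep { /\Q$pattern\E/ } @lines;
    print "Here are the results ...\n", join("\n", @matches), "\n";
}
else
{
    print "File not found\n";
}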
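
The task also needs each term's position in its word list, which the script above does not report. As a hedged sketch, grep can give that directly: -F treats the term as a fixed string, -x requires the whole line to match, and -n prefixes the line number. `term_line_number` is a hypothetical helper name.

```shell
#!/bin/sh

# term_line_number TERM LIST   (hypothetical helper name)
# Prints the line number(s) at which TERM makes up an entire line of LIST.
term_line_number() {
    # grep -n prints "NUMBER:LINE"; cut keeps only the number.
    grep -n -x -F -e "$1" "$2" | cut -d: -f1
}
```

With the sample word list shown earlier saved as A.txt, `term_line_number "random access memory" A.txt` prints 2. Because of -x, the term computer is reported only for line 1 and not for the line holding computer networking.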

I am using Linux with BASH.
 
