Extracting words from file


 
# 1  
Old 07-09-2011

I have a file from which I need to extract words of different lengths into different files: for example, 2-letter words into file2, 3-letter words into file3, and so on.

I wrote one using grep in a shell script:
Code:
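# One full pass over the input per word length (1 to 6 letters)
for (( i=1; i<7; i++))
do
  # Double quotes so that $i is expanded inside the {n} repetition count
  egrep -o "\<\(?[a-zA-Z]{$i}\)?\>" "$1" | sort -u -f | tr 'A-Z' 'a-z' >file$i
done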

But it is too slow. Any better ideas? Thanks in advance.

Last edited by Scott; 07-09-2011 at 07:59 PM.. Reason: Added code tags
# 2  
Old 07-09-2011
Hi,

Try the following Perl program:
Code:
$ cat script.pl
use warnings;
use strict;

@ARGV == 1 or die "Usage: perl $0 <input-file>\n";

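my %word_length;
my %repeated_word;      # shared across all lines, so each word is stored only once

while ( <> ) {
        chomp;
        my @words = split /[^[:alpha:]]+/;
        for my $word ( @words ) {
                next unless length $word;       # skip the empty leading field split can return
                push @{ $word_length{ length $word } }, $word unless $repeated_word{ $word }++;
        }
}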

for my $length ( keys %word_length ) {
        my $outfile = "file" . $length;
        open my $fh, ">", $outfile or do {
                warn "Cannot open $outfile: $!\n";
                next;
        };
        for my $word ( @{ $word_length{ $length } } ) {
                printf $fh "%s\n", $word;
        }

        close $fh or warn "Cannot close $outfile: $!\n";
}
$ cat infile
This is an example to 
test if 
my perl program works
as expected.
$ perl script.pl
Usage: perl script.pl <input-file>
$ perl script.pl infile
$ ls -1 file*
file2
file4
file5
file7
file8

Regards,
Birei
# 3  
Old 07-09-2011
Using nawk:

Code:
#!/usr/bin/awk -f
BEGIN { FS="[^A-Za-z]" }
{
        for (i=1;i<=NF;i++)
                if ((len = length($i)) < 7 && len >= 1)
                        a[tolower($i)]++
}
END {
        for (e in a)
                print e >> ("file" length(e) ".txt")    # parentheses keep the file name portable across awks
}

Code:
mute@goflex:~/test$ ./extract.awk infile
mute@goflex:~/test$ grep -H -E ? file?.txt
file2.txt:my
file2.txt:to
file2.txt:an
file2.txt:as
file2.txt:if
file2.txt:is
file4.txt:this
file4.txt:perl
file4.txt:test
file5.txt:works

# 4  
Old 07-10-2011
Code:
awk '{for (i=1;i<=NF;i++) {if ($i~/^[a-zA-Z]+$/) print tolower($i) > ("file" length($i))}}' infile

# 5  
Old 07-11-2011
A pure AWK or Perl solution is probably the most efficient approach, but here's one that makes do without:
Code:
#!/bin/sh

tr -cs A-Za-z '[\n*]' < "$1" | sort -uf | tr A-Z a-z |
while read w; do
        [ -n "$w" ] && [ ${#w} -lt 7 ] && printf '%s\n' "$w" >> file${#w}
done

Regards,
Alister
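
The replies above share one idea: read the input once rather than once per word length. A minimal sketch that combines them follows, assuming the 1 to 6 letter limit from the loop in the original question. The function name `split_by_length` and the optional output-directory argument are hypothetical and do not come from any post in this thread.

```shell
#!/bin/sh
# Hypothetical helper: a single pass over the input that lower-cases every
# word and writes each distinct word once to file<length>.
split_by_length() {
    # $1 = input file, $2 = output directory (assumed default: current directory)
    tr -cs A-Za-z '[\n*]' < "$1" |      # one word per line, non-letters squeezed away
    awk -v dir="${2:-.}" '
        length($0) >= 1 && length($0) <= 6 {
            w = tolower($0)
            if (!seen[w]++)                         # first time this word is seen
                print w > (dir "/file" length(w))   # parenthesised target is portable
        }'
}
```

Calling `split_by_length infile` would then leave file2, file4 and file5 in the current directory for the sample input shown earlier. Deduplicating inside awk avoids the separate sort, at the cost of holding every distinct word in memory, and the words come out in first-seen order rather than sorted.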