Full Discussion: search from a list of words
Top Forums Shell Programming and Scripting search from a list of words Post 302543414 by agama on Sunday 31st of July 2011 04:00:13 PM
Yes, making a pass across your 12,000-record data file for each entry in the list isn't very efficient. The first thing I'll point out is that your for loop iterates over the file name itself, not over the contents of the list. You'd need something like this:

Code:
#!/bin/bash
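while IFS= read -r i
do
    # double quotes let the shell expand $i; single quotes would search for the literal text $i
    grep -wi -- "$i" list2.txt >> "$i.txt"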
done <list1.txt

This reads the contents of list1.txt placing each line into the variable i. Still not efficient, but I wanted to point out the problem with your code.

Using awk, you can make one pass across each file. That is far more efficient in terms of the number of I/O operations, though not as efficient as writing a programme to do the same thing in C.

Code:
#!/usr/bin/env ksh

# assume list1 list2 are placed on the command line
awk -v list="$1" '
    BEGIN {
        while( (getline<list) > 0 )   # load all target words from first list
            targets[$1] = 1;
        close( list );
    }

    {
        for( i = 1; i <= NF; i++ )  # examine each token to see if it is a target
        {
            if( targets[$(i)] )   # if this token in the input is in the target list, save the line
            {
                printf( "%s\n", $0 ) >> ($(i) ".txt");   # parentheses keep the concatenated file name unambiguous
                close( $(i) ".txt" );    # prevent problems if process limit for number of open files is small
                break;      # remove if line can have multiple targets
            }
            else
              delete  targets[$(i)];    # prevent an entry for every word 
        }
    }
' "$2"
exit

You could make this more efficient by tracking the most recently used files, letting awk keep some number (say 100) open and closing the rest. The programme would then execute far fewer opens and closes on the output files. You'd probably not have any issue keeping 212 of them open, but if your target list grows, or your system has smallish quotas on open files, you could run into problems, which is why I suggested closing the file after each write.

Another, and easier, way would be to write a single intermediate file with records of the form <filename> <text>. Once the initial processing is finished, the intermediate file could be sorted and a single pass made to write each separate file. This has the advantage of opening and closing each output file just once, and thus avoids the efficiency problems in my example above.
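
A rough sketch of the intermediate-file idea follows. It is only an illustration under a few assumptions: the split_demo directory, the sample words and lines, and the tagged.tmp name are made up, and sort -s (a stable sort, available in GNU and BSD sort) is used so that lines for the same word keep their original order.

```shell
#!/bin/bash
# Sketch only: the directory, file names and sample data are made up for the demo.
demo=split_demo
mkdir -p "$demo"
printf '%s\n' apple pear > "$demo/words.txt"
printf '%s\n' 'red apple pie' 'no match here' 'green pear tart' 'apple again' > "$demo/data.txt"

# Stage 1: one pass over the data, writing "<word> <line>" records to a single file.
awk -v list="$demo/words.txt" '
    BEGIN {
        while ((getline < list) > 0)   # load the target words
            targets[$1] = 1
        close(list)
    }
    {
        for (i = 1; i <= NF; i++)
            if ($i in targets) {       # tag the line with the first target it contains
                print $i, $0
                break
            }
    }
' "$demo/data.txt" > "$demo/tagged.tmp"

# Stage 2: sort by the tag so all lines for one word are adjacent, then
# write each output file in one go; each file is opened and closed once.
sort -s -k1,1 "$demo/tagged.tmp" | awk -v dir="$demo" '
    {
        out = dir "/" $1 ".txt"
        if (out != prev) {             # tag changed: finish the previous file
            if (prev != "")
                close(prev)
            prev = out
        }
        sub(/^[^ ]+ /, "")             # strip the leading tag, leaving the original line
        print > out                    # ">" truncates once on open, so reruns do not duplicate lines
    }
'
```

With the sample data this should leave split_demo/apple.txt holding the two apple lines and split_demo/pear.txt holding the pear line. Note that stage 2 overwrites rather than appends; switch to >> if you want the appending behaviour of my original example.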

The need for the delete stems from the way awk creates an entry in the hash as soon as an element is referenced in the test (that is, when targets[foo] does not yet exist). Without the delete, the hash will eventually contain an entry for every word in the list2.txt file rather than just the ones from the first list. These extra entries all have an empty (zero) value, so the programme still works, but the memory usage is unnecessarily large. The delete statement prevents awk from keeping entries in the target hash that have a zero value, but it adds to the execution time.
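
One possible alternative, not what my script above does, is awk's in operator: testing (word in targets) checks for membership without creating an element, so no delete should be needed. The sketch below is a minimal demo of that behaviour; the in_demo directory and sample data are made up, and it only counts matches rather than writing the per-word files.

```shell
#!/bin/bash
# Sketch only: directory, file names and sample data are made up for the demo.
mkdir -p in_demo
printf '%s\n' apple pear > in_demo/words.txt
printf '%s\n' 'red apple pie' 'no match here' 'green pear tart' > in_demo/data.txt

awk -v list=in_demo/words.txt '
    BEGIN {
        while ((getline < list) > 0)   # load the target words
            targets[$1] = 1
        close(list)
    }
    {
        for (i = 1; i <= NF; i++)
            if ($i in targets) {       # membership test; does not add $i to targets
                hits++
                break
            }
    }
    END {
        n = 0
        for (w in targets)             # count what is actually stored in the hash
            n++
        printf "matched lines: %d, entries in targets: %d\n", hits, n
    }
' in_demo/data.txt | tee in_demo/summary.txt
# prints: matched lines: 2, entries in targets: 2
```

The hash still holds only the two words from the list after the whole data file has been scanned, which is the effect the delete was there to achieve.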

Last edited by agama; 07-31-2011 at 05:03 PM.. Reason: additional thought about output
 

Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.