Hello,
I'm trying to write a bash script that will search for words from one list that may be found in another list. Once the record is found, it will create a new text file for each word.
For example, list1.txt contains the following:
List2.txt contains
Since Dog and Cat are found in both files, two files will be created. The first file (Dog) will be a .txt file containing
The second file will be called Cat.txt and will have
Here's what I have so far. I'm stuck and I'm not quite sure how to proceed
I'm dealing with VERY large files where list1.txt contains 213 entries while list2.txt contains 12,000 entries. I think I'm on the right track, but my method seems like it would also take a VERY long time since it's a FOR LOOP for each iteration (yikes!)
Any help would be greatly appreciated.
Last edited by radoulov; 07-31-2011 at 06:50 PM..
Reason: Code tags!
Yes, making a pass across your 12,000 record data file for each entry in the list isn't very efficient. First thing I'll point out is that your for loop will not be listing the contents of the list, but the file name. You'd need something like this:
This reads the contents of list1.txt placing each line into the variable i. Still not efficient, but I wanted to point out the problem with your code.
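A minimal sketch of the difference, using made-up sample data in a scratch directory (the words and the `count` variable are assumptions for illustration only):

```shell
#!/bin/bash
# Hypothetical sample data in a scratch directory (not the poster's real list).
cd "$(mktemp -d)" || exit 1
printf '%s\n' Dog Cat Fish > list1.txt

# A for loop over the bare name runs once, with i set to the file name itself.
for i in list1.txt; do
    echo "for sees: $i"
done

# Redirecting the file into a while/read loop puts one line at a time into i.
count=0
while IFS= read -r i; do
    echo "read sees: $i"
    count=$((count + 1))
done < list1.txt
echo "$count entries read"
```

`IFS=` and `-r` keep read from trimming whitespace or interpreting backslashes in the entries.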
Using awk, you can make one pass across each file. Way more efficient in terms of number of i/o operations, but not as efficient as writing a programme to do the same thing in C.
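A minimal sketch of what a one-pass awk approach could look like, assuming list1.txt holds one word per line, list2.txt holds whitespace-separated words, and each matching line should be appended to <word>.txt. The sample data and the names `targets` and `out` are hypothetical:

```shell
#!/bin/bash
# Hypothetical sample data in a scratch directory (not the poster's real files).
cd "$(mktemp -d)" || exit 1
printf '%s\n' Dog Cat Fish > list1.txt
printf '%s\n' 'Dog brown' 'Cat black' 'Bird blue' 'Dog white' > list2.txt

awk '
    # First file (list1.txt): remember each target word.
    NR == FNR { targets[$1] = 1; next }

    # Second file (list2.txt): check every word on the line.
    {
        for (i = 1; i <= NF; i++) {
            if (targets[$i]) {
                out = $i ".txt"
                print >> out        # append the whole line to <word>.txt
                close(out)          # keep the number of open files small
            } else {
                delete targets[$i]  # drop the empty entry the test just created
            }
        }
    }
' list1.txt list2.txt
```

Because the output is appended with >>, running it a second time in the same directory would add the matches again, so clear out old .txt files first.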
You could make this more efficient by tracking most recently used files and allowing awk to keep some number (100) open and closing the rest. The programme would be executing far fewer opens/closes on the output files. You'd probably not have any issue keeping 213 of them open, but if your target list grows, or your system has smallish quotas on open files, you could have issues, which is why I suggested closing the file after each write. Another, and easier, way would be to write a single output file of the form <filename> <text> as an intermediate file. Once the initial processing is finished, the intermediate file could be sorted and a single pass made to write each separate file. This has the advantage of opening/closing each output file just once and thus avoids the efficiency problems in my example above.
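The intermediate-file idea could be sketched roughly like this. The sample data, the file name intermediate.tmp, and the variable names are all assumptions for illustration; `sort -s` (stable sort, to keep the original line order within each word) is available in GNU and BSD sort but is not required by POSIX:

```shell
#!/bin/bash
# Hypothetical sample data in a scratch directory (not the poster's real files).
cd "$(mktemp -d)" || exit 1
printf '%s\n' Dog Cat Fish > list1.txt
printf '%s\n' 'Dog brown' 'Cat black' 'Bird blue' 'Dog white' > list2.txt

# Pass 1: write "<filename> <text>" records to a single intermediate file.
awk '
    NR == FNR { targets[$1] = 1; next }
    {
        for (i = 1; i <= NF; i++)
            if ($i in targets)
                print $i ".txt", $0
    }
' list1.txt list2.txt > intermediate.tmp

# Pass 2: sort by file name, then open and close each output file exactly once.
sort -s -k1,1 intermediate.tmp | awk '
    {
        out = $1
        if (out != prev) {
            if (prev != "") close(prev)
            prev = out
        }
        sub(/^[^ ]+ /, "")   # strip the leading file name field
        print > out
    }
'
```

Since the records arrive grouped by file name, a file is never reopened after it has been closed, so the truncating > redirection is safe here.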
The need for the delete stems from some awk implementations which create an entry in the hash when the test is made (when targets[foo] does not exist). Without the delete, the hash will eventually contain an entry for every word in the list2.txt file rather than just the ones from the first list. These extra entries all have the value 0, so the programme works, but the memory usage is unnecessarily large. The delete statement prevents awk from keeping entries in the target hash that have a zero value, but it adds to the execution time.
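As a side note, awk's `in` operator tests membership without creating an entry, which is one way to avoid needing the delete at all. A small sketch of the difference, with hypothetical keys and a hypothetical `result` variable:

```shell
#!/bin/bash
result=$(awk 'BEGIN {
    targets["Dog"] = 1

    # Referencing a missing element creates it with an empty value.
    if (targets["Bird"]) print "unexpected match"
    n = 0; for (k in targets) n++
    print "after plain reference:", n

    delete targets["Bird"]

    # The in operator only asks whether the key exists; nothing is created.
    if ("Bird" in targets) print "unexpected match"
    n = 0; for (k in targets) n++
    print "after in test:", n
}')
printf '%s\n' "$result"
```

This prints 2 entries after the plain reference and 1 after the `in` test.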
Last edited by agama; 07-31-2011 at 06:03 PM..
Reason: additional thought about output
Thanks for the reply. I understand that my method is inefficient, but I was wondering why the following won't work. Do I have a syntax error somewhere? When I run the following code, I get the error "syntax error near unexpected token 'done'".
PHP Code:
#!/bin/bash
while read word; do
    grep -w "$word" list2.txt
done < list1.txt >> "$word".txt
cat "$word".txt
When I run the command
PHP Code:
grep -w SAMPLE_TEXT list2.txt
it gives me the desired output.
Last edited by jl487; 07-31-2011 at 08:11 PM..
Reason: additional info
You're on the right track. The redirection to $word.txt needs to happen inside of the loop. Yes, you can redirect the output of the loop to a file, but that output file is opened once by the shell at the start of the loop. When the loop starts $word is empty, so the target is just .txt and everything would be appended to that one file instead of one file per word. This is the small change that will get you going:
Further, your cat command will only have the last word from list1 to work on unless you put it into a loop too:
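Putting both changes together, a minimal sketch might look like this. The sample data is made up, and the `|| true` is an addition of mine rather than something from the thread:

```shell
#!/bin/bash
# Hypothetical sample data in a scratch directory (not the poster's real files).
cd "$(mktemp -d)" || exit 1
printf '%s\n' Dog Cat Fish > list1.txt
printf '%s\n' 'Dog brown' 'Cat black' 'Bird blue' 'Dog white' > list2.txt

while IFS= read -r word; do
    # The redirection is now evaluated on every iteration, so $word is set.
    # grep exits non-zero when a word has no match; "|| true" keeps the loop
    # going even if the script is run with "set -e".
    grep -w -- "$word" list2.txt >> "$word.txt" || true
done < list1.txt

# Show each result file, again one word at a time.
while IFS= read -r word; do
    cat "$word.txt"
done < list1.txt
```

Note that >> creates the file even when grep finds nothing, so a word with no match (Fish in this sample) ends up with an empty .txt file.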