many strings against 5million lines


 
# 1  
Old 01-19-2006
I have 1200 or so strings (some have spaces, some don't) that I need to match against a 5-million-line text file. I wanted to do:

for line in `cat strings.txt`; do grep $line 5mill.txt;done

But obviously that treats each space-separated word as a different $line. So I have been trying to use read, but I'm having a tough time getting it to do what I want. I have been tempted to go to Perl or PHP for this, but my limited skills are what they are.

Another thing I should point out is that I only need one match per string, even though there will probably be many for each string.

I'm pretty sure that a shell script might not be what I want here, but any advice would be much appreciated.
# 2  
Old 01-19-2006
If the 5 million lines can be read into memory, use this Ruby program.

Code:
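# Search strings, one per line, with the trailing newline removed.
strings = IO.readlines( ARGV[0] ).map{|x| x.chomp}
# Every line of the big file, held in memory.
lines = IO.readlines( ARGV[1] )

strings.each{|str|
  lines.each{|line|
    # Plain substring test; print only the first matching line per string.
    if line.index( str )
      puts line
      break
    end
  }
}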

Run it with
Code:
ruby match.rb strings.txt 5mill.txt
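
If the file does not fit in memory, a single pass over it is one alternative. The following is only a sketch, not the program above: it runs awk from a shell function (first_match_one_pass is a hypothetical name), keeps just the search strings in memory, and stops reading once every string has been matched. It assumes the two arguments are different paths, and it prints matches in file order rather than in the order of strings.txt.

```shell
# Hypothetical helper: print the first line matching each fixed string,
# reading the big file only once.
first_match_one_pass() {
    awk '
        # First file: remember each non-empty search string.
        FILENAME == ARGV[1] { if ($0 != "") str[++n] = $0; next }
        # Second file: test the line against every string not yet matched.
        {
            for (i = 1; i <= n; i++) {
                # index() is a plain substring test, not a regex match.
                if (!seen[i] && index($0, str[i])) {
                    print
                    seen[i] = 1
                    # Stop early once every string has its first match.
                    if (++hits == n) exit
                }
            }
        }
    ' "$1" "$2"
}
```

Run it with

```shell
first_match_one_pass strings.txt 5mill.txt
```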


Last edited by futurelet; 01-20-2006 at 02:50 AM..
# 3  
Old 01-19-2006
Quote:
Originally Posted by r0sc0
for line in `cat strings.txt`; do grep $line 5mill.txt;done
Code:
while read line; do
    grep "$line" 5mill.txt
done < strings.txt

Code:
grep -f strings.txt 5mill.txt

or (depending on OS)

fgrep -f strings.txt 5mill.txt
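
Since only one match per string is wanted, the read loop can also stop at the first hit. A minimal sketch, assuming a grep that supports -m (GNU and BSD grep do; POSIX does not require it); first_match is a hypothetical name:

```shell
# Hypothetical helper: print the first line of a data file matching each fixed string.
first_match() {
    strings_file=$1
    data_file=$2
    # IFS= and -r keep leading/trailing spaces and backslashes in each string intact.
    # The extra test handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip empty lines: an empty pattern would match every line.
        [ -n "$line" ] || continue
        # -F: fixed string, not a regex; -m 1: stop after the first match;
        # --: the string may start with a dash.
        # "|| true": a string with no match is not treated as an error here.
        grep -F -m 1 -- "$line" "$data_file" || true
    done < "$strings_file"
}
```

This still starts one grep per string, so up to 1200 scans of the big file, but each scan ends as soon as its string is found.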
