grep -f file1 file2


 
# 1  
Old 01-26-2011

Hi,
I started to learn bash a week ago. I need to filter the strings from the last column of "file2" that match a column from another file, "file1".

file1:

chr10100036394-100038350AK077761
chr10100041065-100046547AK032226
chr10100041065-100046547AK016270
chr10100041065-100046547AK078231
...


file2:

chr10100036394-100038350 BC082587 100.00 30 0 0 1 30 868 897 5e-09 60.0 chr10100036394-100038350BC082587
chr10100036394-100038350 AK160693 100.00 30 0 0 1 30 901 930 5e-09 60.0 chr10100036394-100038350AK160693
chr10100036394-100038350 AK082894 100.00 30 0 0 1 30 871 900 5e-09 60.0 chr10100036394-100038350AK082894
chr10100036394-100038350 AK077761 100.00 30 0 0 1 30 913 942 5e-09 60.0 chr10100036394-100038350AK077761
chr10100039992-100040948 AK078231 100.00 30 0 0 1 30 551 580 5e-09 60.0 chr10100039992-100040948AK078231
chr10100039992-100040948 AK077761 100.00 30 0 0 1 30 545 574 5e-09 60.0 chr10100039992-100040948AK077761
chr10100039992-100040948 AK036647 100.00 30 0 0 1 30 533 562 5e-09 60.0 chr10100039992-100040948AK036647
chr10100039992-100040948 AK032226 100.00 30 0 0 1 30 506 535 5e-09 60.0 chr10100039992-100040948AK032226
chr10100039992-100040948 AK016270 100.00 30 0 0 1 30 382 411 5e-09 60.0 chr10100039992-100040948AK016270
chr10100039992-100040948 AK015251 100.00 30 0 0 1 30 499 528 5e-09 60.0 chr10100039992-100040948AK015251
chr10100041065-100044896 AK043358 100.00 30 0 0 1 30 3118 3147 5e-09 60.0 chr10100041065-100044896AK043358
chr10100041065-100046547 BC082587 100.00 30 0 0 1 30 383 412 5e-09 60.0 chr10100041065-100046547BC082587
chr10100041065-100046547 AK160693 100.00 30 0 0 1 30 416 445 5e-09 60.0 chr10100041065-100046547AK160693
chr10100041065-100046547 AK082894 100.00 30 0 0 1 30 386 415 5e-09 60.0 chr10100041065-100046547AK082894
chr10100041065-100046547 AK078231 100.00 30 0 0 1 30 434 463 5e-09 60.0 chr10100041065-100046547AK078231
chr10100041065-100046547 AK077761 100.00 30 0 0 1 30 428 457 5e-09 60.0 chr10100041065-100046547AK077761
chr10100041065-100046547 AK036647 100.00 30 0 0 1 30 416 445 5e-09 60.0 chr10100041065-100046547AK036647
chr10100041065-100046547 AK032226 100.00 30 0 0 1 30 389 418 5e-09 60.0 chr10100041065-100046547AK032226
chr10100041065-100046547 AK016270 100.00 30 0 0 1 30 265 294 5e-09 60.0 chr10100041065-100046547AK016270
chr10100041065-100046547 AK015251 100.00 30 0 0 1 30 382 411 5e-09 60.0 chr10100041065-100046547AK015251
...

I tried to use
Code:
grep -f file1 file2

but the process is killed.
I think it's because both files are too large (file1 has more than 10e6 lines).

I tried the same command with a smaller file1 (10 lines) and it worked fine.

Can anybody tell me another way to do this, or tell me what I'm doing wrong?

Thanks!
# 2  
Old 01-26-2011
'file2' can be any length, but grep actually has to load all of file1 into memory at once! On a 32-bit computer, this may not be possible even if you have gigs of memory, due to 32-bit address limitations.

If the file's too large to fit in memory, you may have to process it in batches.

Code:
split -l 50000 file1 split

...will produce splitaa, splitab, splitac, ... files of a few megs each, which you can process like

Code:
for FILE in split*
do
         grep -f "$FILE" file2
done > allmatches
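
A minimal, self-contained sketch of the batching idea above, run on made-up sample data in a scratch directory. The `chunk_` prefix, the `keyA`... sample keys, and the chunk size of 2 are assumptions for illustration only. The `-F` flag (treat patterns as literal strings) is an addition that is not in the post above; it should be safe here as long as the keys contain no text meant to be read as a regular expression.

```shell
# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-ins for the real files.
printf '%s\n' keyA keyB keyC keyD > file1
printf '%s\n' 'x 1 keyB' 'x 2 keyC' 'x 3 keyZ' > file2

# Two patterns per chunk, only so this tiny sample yields several chunks.
split -l 2 file1 chunk_

for chunk in chunk_*
do
    # -F: match each pattern as a fixed string rather than a regex.
    grep -F -f "$chunk" file2
done > allmatches

cat allmatches
```

Note that file2 is read once per chunk, and the matches come out grouped by chunk rather than in file2's original order.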

# 3  
Old 01-26-2011
Code:
awk 'NR==FNR{a[$1];next} $NF in a ' file1 file2
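
For readers new to awk, here is the same one-liner spread over several lines with comments, run on a few rows copied from the first post. The scratch directory, the `keys` array name, and the `matches` output file are only for illustration.

```shell
# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > file1 <<'EOF'
chr10100036394-100038350AK077761
chr10100041065-100046547AK032226
EOF

cat > file2 <<'EOF'
chr10100036394-100038350 BC082587 100.00 30 0 0 1 30 868 897 5e-09 60.0 chr10100036394-100038350BC082587
chr10100036394-100038350 AK077761 100.00 30 0 0 1 30 913 942 5e-09 60.0 chr10100036394-100038350AK077761
chr10100041065-100046547 AK032226 100.00 30 0 0 1 30 389 418 5e-09 60.0 chr10100041065-100046547AK032226
EOF

awk '
    # NR == FNR is true only while the first file (file1) is being read:
    # store each key as an array index, then move on to the next line.
    NR == FNR { keys[$1]; next }
    # Remaining lines come from file2: print those whose last field
    # is one of the stored keys.
    $NF in keys
' file1 file2 > matches

cat matches
```

Unlike `grep -f`, this compares the whole last field against each key exactly, so a key that is merely a substring of another field does not count as a match.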

# 4  
Old 01-26-2011
If there's not enough memory or address space for grep to store the whole file at once, how will awk? It might try for a while but I think it'll break down halfway through...
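
If memory really is the limit, one option nobody in the thread proposed is `sort` plus `join`, since neither has to hold a whole file in memory. This is only a sketch under assumptions: the sample keys, the `.sorted` and `.keyed` file names, and the `matches` output are made up, and the result comes out in key order rather than in file2's order.

```shell
# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-ins for the real files.
printf '%s\n' keyB keyA > file1
printf '%s\n' 'x 1 keyB' 'x 2 keyZ' 'x 3 keyA' > file2

tab=$(printf '\t')

# Sort the keys; -u drops duplicate keys so no match is printed twice.
LC_ALL=C sort -u file1 > file1.sorted

# Prefix every file2 line with its last field plus a tab, then sort on it.
awk '{ print $NF "\t" $0 }' file2 |
    LC_ALL=C sort -t "$tab" -k1,1 > file2.keyed

# Keep lines whose key appears in both files, then strip the key prefix.
LC_ALL=C join -t "$tab" file1.sorted file2.keyed | cut -f2- > matches

cat matches
```

`LC_ALL=C` is set on both `sort` and `join` so that they agree on the ordering of the keys.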
# 5  
Old 01-27-2011
How many lines are in file1?
How many lines are in file2?
What does "filter the strings" mean?

Do you have a Database Engine and disc space etc. to load the data into a database?
# 6  
Old 01-28-2011
Corona: I have an Intel Core i5, which is 64-bit.

I knew there was a way through awk.

awk 'NR==FNR{a[$1];next} $NF in a ' file1 file2
worked nicely!

Thanks !!!!
# 7  
Old 01-28-2011
Quote:
Originally Posted by geparada88
Corona: I have an Intel Core i5, which is 64-bit.
More importantly though: Is your OS 64-bit?
Quote:
I knew there was a way through awk.

awk 'NR==FNR{a[$1];next} $NF in a ' file1 file2
worked nicely!

Thanks !!!!
...apparently, it is.