Remove words from file2 that don't exist in file1

# 1  
07-12-2012

Hi

I have two lists of words, file1 and file2. I want to compare both lists and remove from file2 all the words that don't exist in file1.

How can I do this?

Many thanks
# 2  
07-12-2012
Code:
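# Assumes one word per line. First pass (file1): remember every word.
# Second pass (file2): print only the words that were seen in file1.
awk 'FILENAME=="file1" { arr[$0]++ }
     FILENAME=="file2" { if ($0 in arr) print $0 }' file1 file2 > tmp.tmp
# be SURE you got what you wanted before doing the mv command
mv tmp.tmp file2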
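
A minimal sketch of the same two-pass idea that does not depend on the literal names file1 and file2 (the FILENAME tests only fire when the files are called exactly that). It assumes POSIX sh and awk; the helper name `keep_known` is hypothetical. Like the code above, it assumes one word per line and compares whole lines.

```shell
#!/bin/sh
# keep_known FILE1 FILE2 (hypothetical helper name): print the lines of FILE2
# that also appear as whole lines in FILE1, keeping FILE2's original order.
keep_known() {
    # NR == FNR is true only while awk reads its first file argument, so the
    # first block stores every line of FILE1 as an array key and skips ahead.
    # For FILE2, the bare pattern `$0 in seen` prints each line found in seen.
    awk 'NR == FNR { seen[$0]; next }  $0 in seen' "$1" "$2"
}

# Usage: replace file2 only if awk succeeded.
# keep_known file1 file2 > tmp.tmp && mv tmp.tmp file2
```

One known quirk of the NR == FNR idiom: if file1 is empty, awk treats file2 as the "first" file and prints nothing, which happens to be the correct answer here since no word of file2 exists in file1.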

# 3  
07-12-2012
Code:
# cat f1
a f g h i
j k l
# cat f2
o p q r
g z x
n b i
# comm -12 <(xargs -n1 <f1 | sort) <(xargs -n1 <f2 | sort)
g
i
#

... but ok this solution may not be the most optimized one ...
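
The sample files here hold several words per line, and comm returns a sorted list of common words rather than an edited file2. If the aim is literally to delete the unknown words from file2 while keeping its lines and word order, a single awk process can do it. A sketch, assuming a "word" is any whitespace-separated field; `filter_words` is a hypothetical name, and dropping lines that end up empty is a design choice you may not want.

```shell
#!/bin/sh
# filter_words FILE1 FILE2 (hypothetical helper name): for each line of FILE2,
# keep only the whitespace-separated words that occur anywhere in FILE1,
# in their original order. Lines left with no words are dropped.
filter_words() {
    awk '
        # First file: record every field of every line as a known word.
        NR == FNR { for (i = 1; i <= NF; i++) known[$i]; next }
        # Second file: rebuild the line from the known words only.
        {
            out = ""; sep = ""
            for (i = 1; i <= NF; i++)
                if ($i in known) { out = out sep $i; sep = " " }
            if (out != "") print out
        }' "$1" "$2"
}
```

With the f1 and f2 shown above, `filter_words f1 f2` keeps `g` from the second line and `i` from the third, and drops the first line entirely.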
# 4  
07-12-2012
Quote:
Originally Posted by ctsgnb
Code:
# comm -12 <(xargs -n1 <f1 | sort) <(xargs -n1 <f2 | sort)

... but ok this solution may not be the most optimized one ...
You're correct about it not being optimal. With -n1, xargs will fork/exec echo once per word in each file. That's not a big deal for small files, but it would be an expensive solution if the dataset were large.

Regards,
Alister
# 5  
07-13-2012
Ok ok

... a little better with tr:

Code:
# time comm -12 <(xargs -n1 <f1 | sort) <(xargs -n1 <f2 | sort)
g
i

real    0m0.022s
user    0m0.000s
sys     0m0.050s
# time comm -12 <(tr ' ' '\n' <f1 | sort) <(tr ' ' '\n' <f2 | sort)
g
i

real    0m0.009s
user    0m0.000s
sys     0m0.010s
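
`tr ' ' '\n'` splits on single spaces only: two spaces in a row yield an empty line, and tabs are not split at all. Below is a sketch of a sturdier tokenizer, still a handful of processes per file rather than one per word. It assumes a POSIX shell with mktemp available and uses temporary files because `<(...)` process substitution is a bash/ksh/zsh feature; `common_words` is a hypothetical name. Unlike the pipelines above, `sort -u` also collapses duplicate words.

```shell
#!/bin/sh
# common_words FILE1 FILE2 (hypothetical helper name): print each word found
# in both files once, sorted. Any run of spaces, tabs or newlines separates
# words. Temporary files stand in for bash-only <(...) process substitution.
common_words() {
    t1=$(mktemp) || return 1
    t2=$(mktemp) || { rm -f "$t1"; return 1; }
    # tr maps every whitespace character to a newline and -s squeezes runs of
    # them into one; grep . drops the empty line left by leading whitespace.
    tr -s '[:space:]' '\n' < "$1" | grep . | sort -u > "$t1"
    tr -s '[:space:]' '\n' < "$2" | grep . | sort -u > "$t2"
    comm -12 "$t1" "$t2"    # -1/-2 hide words unique to FILE1/FILE2
    rm -f "$t1" "$t2"
}
```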

If we can assume each list already has a single column (one word per line, as Jim's code assumes), the tr step can be removed.

And if the lists are already sorted, we can then also remove the sorting step ...
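
With both assumptions in place (one word per line, already sorted in the locale comm will use), comm alone does the job. A small illustration; the scratch directory and the list contents are made up.

```shell
#!/bin/sh
# Scratch directory and file names are illustrative only.
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/list1"   # one word per line, sorted
printf '%s\n' banana cherry date  > "$dir/list2"   # one word per line, sorted
# comm does no sorting itself; it walks both files in step, so unsorted
# input can give wrong results. -1 and -2 hide the lines unique to each file.
comm -12 "$dir/list1" "$dir/list2"    # prints banana and cherry
```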
10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk to search field2 in file2 using range of fields file1 and using match to another field in file1

I am trying to use awk to find all the $2 values in file2, which is ~30MB and tab-delimited, that are between $2 and $3 in file1, which is ~2GB and tab-delimited. I have just found out that I need to use $1 and $2 and $3 from file1 and $1 and $2 of file2 must match $1 of file1 and be in the range... (6 Replies)
Discussion started by: cmccabe

2. UNIX for Dummies Questions & Answers

Compare file1 and file2, print matching lines in same order as file1

I want to print only the lines in file2 that match file1, in the same order as they appear in file 1 file1 file2 desired output: I'm getting the lines to match awk 'FNR==NR {a++}; FNR!=NR && a' file1 file2 but they are in sorted order, which is not what I want: Can anyone... (4 Replies)
Discussion started by: pathunkathunk

3. Shell Programming and Scripting

Remove rows from file2 if it exists in file1

I have 2 files, file1 and file2. file1 has some keys and file2 has keys plus some other data. I want to remove the lines from file2 if the key for that line exists in file1. file1: key1 key2 file2: key1,moredata key2,moredata key3,moredata Required output: key3,moredata Thanks EDIT:... (6 Replies)
Discussion started by: chacko193

4. Shell Programming and Scripting

If file1 and file2 exist then

Hi, I would like a little help writing an if statement. What I have so far is: #!/bin/bash FILE1=path/to/file1 FILE2=path/to/file2 echo ${FILE1} ${FILE2} if ] then echo file1 and file2 not found else echo FILE ok fi (6 Replies)
Discussion started by: techy1

5. Shell Programming and Scripting

look for line from FILE1 at FILE2

Hi guys! I'm trying to write something to find each line of file1 in file2; if the line is found return YES, if not found return NO. The result can be written to a new file. Can you please help me out? FILE1 INPUT: WATER CAR SNAKE (in reality this file has about 600 lines each with a... (2 Replies)
Discussion started by: demmel

6. UNIX for Dummies Questions & Answers

if matching strings in file1 and file2, add column from file1 to file2

I have very limited coding skills but I'm wondering if someone could help me with this. There are many threads about matching strings in two files, but I have no idea how to add a column from one file to another based on a matching string. I'm looking to match column1 in file1 to the number... (3 Replies)
Discussion started by: pathunkathunk

7. Shell Programming and Scripting

Remove lines in file1 with values from file2

Hello, I have two data files: file1 12345 aa bbb cccc 98765 qq www uuuu 76543 pp rrr bbbbb 34567 nn ccc sssss 87654 qq ppp rrrrr file2 98765 34567 I need to remove the lines from file1 if the first field contains a value that appears in file2: output 12345 aa bbb cccc 76543 pp... (2 Replies)
Discussion started by: palex

8. Shell Programming and Scripting

grep -f file1 file2

What does this command do? fileA is a subset of fileB. Now, I need to find the lines in fileB that are not in fileA, i.e. fileB - fileA. diff fileA fileB gives the output, but the format is no good... I just need the contents alone, not the line numbers etc. (7 Replies)
Discussion started by: vijay_0209

9. Shell Programming and Scripting

I want records in file2 that do not exist in file1

I have two files, file1 and file2. Now I want the records in file2 that do not exist in file1. How do I grep this? e.g.: file1 08941 08944 08945 08946 08947 file2 08942 08944 5 08942 08945 5 08942 08946 4 08942 08947 6 08942 08952 4 08942 08963 5 08942 ... (3 Replies)
Discussion started by: suresh3566

10. Shell Programming and Scripting

match value from file1 in file2

Hi, I have two files (file1, file2). I want to take the value (in column1) and search for it in file2; if they match, print the value from file2. This is what I have so far. awk 'FILENAME=="file1"{ arr=$1 } FILENAME=="file2" {print $0} ' file1 file2 (2 Replies)
Discussion started by: myguess21