Removing Lines based on matching first column


 
# 1  
Old 07-18-2011

I have a file1 that looks like this:

File 1
a b
b c
c e
d e

and a file 2 that looks like this:

File 2
b
c
e
e

Note that file2 is the right-hand column of file1. I want to remove any line from file1 whose first column matches an entry in file2. In this case the desired output would be:

File 3
a b
d e

I think awk would be best, unless someone knows how to make grep look only at specific columns.

Thanks in advance!
# 2  
Old 07-18-2011
Try:
Code:
awk 'NR==FNR{a[$1]=1;next}!($1 in a)' file2 file1
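
For anyone unfamiliar with the idiom, here is an expanded, commented sketch of the same one-liner. The sample data is copied from post #1; the array name `seen` is only an illustrative rename of `a`, and writing the result to file3 is an assumption about how you want to store it.

```shell
# Recreate the sample data from the first post.
printf 'a b\nb c\nc e\nd e\n' > file1
printf 'b\nc\ne\ne\n' > file2

# file2 is listed first, so it is read first. NR counts records across all
# input files, FNR restarts at 1 for each file, so NR == FNR is true only
# while awk is still reading the first file argument.
awk '
NR == FNR {        # first file (file2): remember each key
    seen[$1] = 1
    next           # skip the filter rule below for these lines
}
!($1 in seen)      # second file (file1): print lines whose column 1 is not a key
' file2 file1 > file3

cat file3
```

One caveat with this idiom: if the first file is empty, NR == FNR also holds while reading the second file, so nothing is printed.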

# 3  
Old 07-18-2011
Yes! That works! Thank you so much. Would you mind explaining the syntax? I'm not very good at awk and would like to understand it for future reference.

---------- Post updated at 11:46 AM ---------- Previous update was at 11:02 AM ----------

For some reason it's not working on my actual files, which look like this:

File 1
GEMS_CAM_101_1_a GEMS_CAM_102_1_a
GEMS_CAM_102_1_a GEMS_CAM_103_1_a
GEMS_CAM_103_1_a GEMS_CAM_104_1_a

File 2
GEMS_CAM_102_1_a
GEMS_CAM_103_1_a
GEMS_CAM_104_1_a

output should be:

File 3
GEMS_CAM_101_1_a GEMS_CAM_102_1_a

Any ideas what's going wrong?
# 4  
Old 07-18-2011
It is working for me...
Code:
[root@rhel ~]# cat f1
GEMS_CAM_101_1_a GEMS_CAM_102_1_a
GEMS_CAM_102_1_a GEMS_CAM_103_1_a
GEMS_CAM_103_1_a GEMS_CAM_104_1_a
[root@rhel ~]# cat f2
GEMS_CAM_102_1_a
GEMS_CAM_103_1_a
GEMS_CAM_104_1_a
[root@rhel ~]# awk 'NR==FNR{a[$1]=1;next}!($1 in a)' f2 f1
GEMS_CAM_101_1_a GEMS_CAM_102_1_a

# 5  
Old 07-18-2011
Ahh! I'm so frustrated. I literally do the exact same thing as you and my output file looks exactly like file1. Is there any other possible explanation for this, or am I missing something? I've checked this for like 15 minutes, haha.
# 6  
Old 07-18-2011
Show output of
Code:
cat -Te file1

and
Code:
cat -Te file2

with the use of code tags please.
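
One common cause of this symptom is DOS-style (CRLF) line endings in file2, which `cat -Te` would show as `^M$` at the end of each line. Nothing in this thread confirms that this is the problem here, so treat the following as a sketch under that assumption. The file names file2_crlf, out_broken and out_fixed are hypothetical.

```shell
# Hypothetical reproduction: file2 saved with CRLF line endings.
printf 'a b\nb c\nc e\nd e\n' > file1
printf 'b\r\nc\r\ne\r\ne\r\n' > file2_crlf

# Each key is now "b\r", "c\r", ... so no first column of file1 matches
# and every line of file1 is printed unchanged.
awk 'NR==FNR{a[$1]=1;next}!($1 in a)' file2_crlf file1 > out_broken

# Strip a trailing carriage return from every input line before the
# original two rules run; awk re-splits the fields after sub() edits $0.
awk '{sub(/\r$/,"")} NR==FNR{a[$1]=1;next}!($1 in a)' file2_crlf file1 > out_fixed

cat out_fixed
```

If the `cat -Te` output shows no `^M`, the cause is something else and this fix will not help.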
# 7  
Old 07-18-2011
Ya, it's weird: it works fine on my computer at home and it works on the fake examples I make up at work, but it won't work on the real files at work. My Linux machine doesn't have internet access, so I can't really show you, but it's the strangest thing. Anyway, your code is definitely good; it's something wrong with my system, I guess. Either way, thank you for all your help!
 