awk function to remove lines that contain contents of another file


 
# 1  
Old 09-19-2017

Hi,

I'd be grateful for your help with the following. I have a file (file.txt) with 10 columns and about half a million lines, which in simplified form looks like this:

Code:
ID     Col1    Col2  Col3....
a        4         2       8
b        5         6       1
c        8         4       1
d        3         5       9
e        8         5       2

I'd like to remove all the lines where, say, "b" and "d" appear in the first (ID) column. The output that I want is:

Code:
ID     Col1    Col2  Col3....
a        4         2       8
c        8         4       1
e        8         5       2

In reality, there are about 100,000 lines that I want to remove.
I therefore have a reference file (referencefile.txt) that lists all the IDs that I want removed from file.txt. In this example, the reference file would simply contain "b" and "d" on successive lines.

I am using grep at the moment, and while it works, it is proving painfully slow.

Code:
grep -v -f referencefile.txt file.txt

Is there a way of using awk (or anything else for that matter) to speed up the process?

Many thanks.

AB
# 2  
Old 09-19-2017
This can need a lot of memory, depending on how much you have in reference.txt.
This is simple awk; it can be rewritten more compactly, but the result is harder to read for non-awkers.
We have posters who do that, which is okay as long as you can follow what they show you.
Code:
# code assumes that the reference.txt file has field #1 from inputfile

awk ' FILENAME=="reference.txt" {! arr[$0]++; next}  # create an array of values 
         # !($1 in arr): print only lines whose first field is not in the array
         FILENAME=="inputfile" { if(!($1 in arr)) {print $0}; next} ' reference.txt inputfile > outputfile
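A note on the membership test: awk gives unary ! higher precedence than in, so `! $1 in arr` is evaluated as `(! $1) in arr`, while the parenthesized form `!($1 in arr)` is the one that checks whether $1 is absent from the array. A minimal sketch of the difference, using a made-up one-line input and a one-key array (neither is from the thread):

```shell
# Hypothetical input line "a"; the array holds only the key "b".
result=$(echo 'a' | awk 'BEGIN { arr["b"] }
{
    unparenthesized = (! $1 in arr)   # parsed as (!$1) in arr, i.e. "0" in arr
    parenthesized   = !($1 in arr)    # true when $1 is not a key of arr
    print unparenthesized, parenthesized
}')
echo "$result"   # prints: 0 1
```

The first value is 0 even though "a" is not in the array, so an if built on the unparenthesized form would not print that line.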

This User Gave Thanks to jim mcnamara For This Post:
# 3  
Old 09-19-2017
Thanks Jim - that works. Much appreciated.

A.B.
# 4  
Old 09-19-2017
It would be interesting to know what performance gain you see. Can you time both approaches and post the results?
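For anyone who wants to try such a comparison, here is a rough sketch. Everything in it apart from the two commands being timed is an assumption: the data is synthetic (zero-padded IDs, so that one ID is never a substring of another), the elapsed helper is hypothetical, and it only reports whole seconds. Interactively, prefixing each command with time gives finer resolution.

```shell
# Sketch only: the data below is synthetic, not the poster's real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 10000 rows with IDs id00001..id10000, and a reference list of every fifth ID.
awk 'BEGIN { for (i = 1; i <= 10000; i++) printf "id%05d %d %d %d\n", i, i % 7, i % 11, i % 13 }' > file.txt
awk 'BEGIN { for (i = 5; i <= 10000; i += 5) printf "id%05d\n", i }' > referencefile.txt

# Hypothetical helper: run a command, report whole seconds elapsed on stderr.
elapsed() {
    start=$(date +%s)
    "$@"
    end=$(date +%s)
    echo "$((end - start))s: $1" >&2
}

# The grep command from post #1.
elapsed grep -v -f referencefile.txt file.txt > out_grep.txt

# The awk approach, keyed on the first field.
elapsed awk 'FILENAME=="referencefile.txt" {arr[$1]; next} !($1 in arr)' referencefile.txt file.txt > out_awk.txt

# Both outputs should be identical; if so, print the number of rows kept.
cmp out_grep.txt out_awk.txt && wc -l < out_awk.txt   # 8000
```

On data this small both commands are likely to finish within a second; the row counts would need to be raised toward the sizes in post #1 to see a meaningful difference.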
# 5  
Old 09-20-2017
I do not understand the ! and ++ in {! arr[$0]++; next}
Replace it with {arr[$1]; next}. Not storing a value in the array saves some memory! Using $1 rather than $0 strips surrounding spaces, which makes sense if there is invisible trailing space (and embedded spaces wouldn't work anyway when later comparing with $1). The next jumps to the next cycle, so there is no need to check the FILENAME again. {print $0} is the default action if there is just a condition.
Code:
awk ' FILENAME=="reference.txt" {arr[$1]; next}  # create an array without values 
        !($1 in arr)' reference.txt inputfile > outputfile
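Applied to the simplified example from post #1, the same idea can be sketched as below. The file names file.txt and referencefile.txt are the ones from the original post; the temporary directory, the single-space sample rows, and the NR==FNR test (a common alternative to comparing FILENAME, swapped in here so the program does not depend on the file names) are assumptions for illustration.

```shell
# Sample data mirroring the simplified example from post #1.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'ID Col1 Col2 Col3' 'a 4 2 8' 'b 5 6 1' 'c 8 4 1' 'd 3 5 9' 'e 8 5 2' > file.txt
printf '%s\n' 'b' 'd' > referencefile.txt

# NR==FNR holds only while awk reads the first file on the command line,
# so the first rule collects the IDs as array keys (no values stored).
# The second rule has no action, so matching lines are printed unchanged.
awk 'NR==FNR {ids[$1]; next} !($1 in ids)' referencefile.txt file.txt > output.txt

cat output.txt
# ID Col1 Col2 Col3
# a 4 2 8
# c 8 4 1
# e 8 5 2
```

One caveat with NR==FNR: if the reference file is empty, the condition is also true for every line of the second file, so nothing is printed. The FILENAME comparison does not have that problem.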
