awk remove/grab lines from file with pattern from other file


 
# 1  
Old 02-10-2016

Sorry for the weird title, but I have the following problem.

We have several files with between 10000 and about 500000 lines each. From these files we want to remove every line that contains a pattern listed in another file (around 20000 lines, all EAN codes). We also want the removed lines written to a separate file, so we can check whether lines get removed that shouldn't be (this has nothing to do with the matching itself).

pattern file:
Code:
0018208944262
4016432428011
7290006780829
4021121468858
5025232434084
4021121338540
4021121435638

main file:
Code:
0018208944262;A 562381;VNA750E1;50;4999.14;Nikon
4242004181811;A 582194;B55CR22N0;2;939.46;Neff
4242004181439;A 582193;B45CS24N0;1;895.04;Neff
4716123314882;A 552806;NH-L9A;0;39.90;Noctua
4716123314875;A 548120;NH-L9I;1;39.01;Noctua

With the two files above I should get one file that contains only "0018208944262;A 562381;VNA750E1;50;4999.14;Nikon" and one file that contains the rest.

I tried the following awk code:
Code:
awk -F ';' 'NR==FNR {id[$1]; next} $1 in id' filter.csv main.csv

but it does not remove the lines or put them in another file. I also tried grep, but that only works when the filter file has around 100 lines or so.

Does anyone know how I can get the two results described above?
# 2  
Old 02-10-2016
Hello SDohmen,

Could you please try the following and let me know if it helps?
Code:
awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print}' pattern_Input_file FS=";" main_Input_file

Output will be as follows.
Code:
0018208944262;A 562381;VNA750E1;50;4999.14;Nikon

Thanks,
R. Singh
# 3  
Old 02-10-2016
How about
Code:
awk -F ';' 'NR==FNR {id[$1]; next} $1 in id {print > "Positive"; next} {print > "Negative"}' file1 file2
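
For reference, here is a minimal sketch of that two-way split run against a shortened version of the sample data from post #1. It is best run in an empty scratch directory, since it creates filter.csv, main.csv, Positive and Negative there. The file names filter.csv and main.csv are taken from the first post; the shortened sample lists are an assumption made to keep the example small.

```shell
# Shortened sample data from post #1 (pattern file and main file).
printf '%s\n' 0018208944262 4016432428011 7290006780829 > filter.csv
printf '%s\n' \
  '0018208944262;A 562381;VNA750E1;50;4999.14;Nikon' \
  '4242004181811;A 582194;B55CR22N0;2;939.46;Neff' \
  '4716123314882;A 552806;NH-L9A;0;39.90;Noctua' > main.csv

# First file: remember every EAN as an array key.
# Second file: route each line by its first ';'-separated field.
awk -F ';' '
  NR == FNR { id[$1]; next }               # still reading filter.csv
  $1 in id  { print > "Positive"; next }   # EAN is in the filter list
            { print > "Negative" }         # everything else
' filter.csv main.csv

cat Positive   # the Nikon line
cat Negative   # the Neff and Noctua lines
```

Note that awk only creates Positive or Negative when at least one line is actually printed to it, so an output file can be missing if nothing falls on that side.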

# 4  
Old 02-10-2016
Quote:
Originally Posted by RavinderSingh13
Hello SDohmen,

Could you please try the following and let me know if it helps?
Code:
awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print}' pattern_Input_file FS=";" main_Input_file

Output will be as follows.
Code:
0018208944262;A 562381;VNA750E1;50;4999.14;Nikon

Thanks,
R. Singh
Hi,

Thank you for the quick reply, but it does not seem to work. The pattern lines are still in the output and not in a separate file.



Quote:
Originally Posted by RudiC
How about
Code:
awk -F ';' 'NR==FNR {id[$1]; next}  $1 in id {print > "Positive"; next} {print > "Negative"}' file1  file2

This seems to create no file at all. I get no output.

Last edited by SDohmen; 02-10-2016 at 09:26 AM.. Reason: adding answer.
# 5  
Old 02-10-2016
Quote:
Originally Posted by SDohmen
Hi,

Thank you for the quick reply, but it does not seem to work. The pattern lines are still in the output and not in a separate file.
Hello SDohmen,

You could redirect the output to an Output_file as follows.
Code:
awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print}' pattern_file FS=";" main_file  >  Output_file

EDIT: In case you need output files for both the matches and the non-matches, the following may help.
Code:
 awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print >> "Output_match_found_file"} !($1 in A){print >> "Output_match_NOT_found_file"}' pattern_file FS=";" main_file
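
One thing to keep in mind with `>>` inside awk: it appends to whatever is already in the file, so running the command a second time adds the same lines again. With `>`, awk truncates the file the first time it opens it in a run and then keeps appending for the rest of that run. The sketch below shows the difference; the file names (app_match, trunc_match and so on) are made up for this demonstration, and it should be run in an empty scratch directory.

```shell
# Start clean so the line counts below are reproducible.
rm -f app_match app_rest trunc_match trunc_rest

printf '%s\n' 0018208944262 > pattern_file
printf '%s\n' \
  '0018208944262;A 562381;VNA750E1;50;4999.14;Nikon' \
  '4242004181811;A 582194;B55CR22N0;2;939.46;Neff' > main_file

# Variant using >> (appends across runs).
split_append() {
  awk 'FNR==NR{A[$1];next} ($1 in A){print >> "app_match"; next} {print >> "app_rest"}' \
      pattern_file FS=';' main_file
}

# Variant using > (truncated once per awk run).
split_truncate() {
  awk 'FNR==NR{A[$1];next} ($1 in A){print > "trunc_match"; next} {print > "trunc_rest"}' \
      pattern_file FS=';' main_file
}

# Run each variant twice, as would happen when the job is repeated.
split_append;   split_append
split_truncate; split_truncate

# app_match now holds the Nikon line twice, trunc_match only once.
wc -l app_match trunc_match
```

If the `>>` form is kept, removing the old output files before each run avoids the duplicates.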

Thanks,
R. Singh

Last edited by RavinderSingh13; 02-10-2016 at 09:29 AM.. Reason: Added one more solution for having 2 Output_file for matching and non-matching fields.
# 6  
Old 02-10-2016
Quote:
Originally Posted by RavinderSingh13
Hello SDohmen,

You could redirect the output to an Output_file as follows.
Code:
awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print}' pattern_file FS=";" main_file  >  Output_file

Thanks,
R. Singh
I know I can redirect the output, but it still did not remove the lines from the main file. Sorry if that was unclear before.


Omg, you are fast with editing. I tested the revised code and it seems to work fine. I now have 2 files with different output in each. I will test it some more with other files to be sure, but it seems to work. Thank you again for the extremely fast help.
# 7  
Old 02-10-2016
Quote:
Originally Posted by SDohmen
I know I can redirect the output, but it still did not remove the lines from the main file. Sorry if that was unclear before.

Omg, you are fast with editing. I tested the revised code and it seems to work fine. I now have 2 files with different output in each. I will test it some more with other files to be sure, but it seems to work. Thank you again for the extremely fast help.
Hello SDohmen,

Here is exactly what you may be looking for.
Code:
cat script.ksh
awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print >> "Output_match_found_file"} !($1 in A){print >> "Output_match_NOT_found_file"}' pattern_file FS=";" main_file
if [[ $? == 0 ]]
then
        mv  main_file main_file_Original
        mv  Output_match_NOT_found_file main_file
else
        echo "Please check there seems to be an issue with awk command."
fi

The above code creates a backup of main_file named main_file_Original and removes the matching lines from main_file (the non-matching lines become the new main_file). Let me know if this helps you.
EDIT: You could also try the following. Here, if the awk statement fails, then of course the second statement, which moves the non-matching output over the input file, will not be executed.
Code:
 awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print >> "Output_match_found_file"} !($1 in A){print >> "Output_match_NOT_found_file"}' pattern_file FS=";" match_file && mv Output_match_NOT_found_file match_file
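
A somewhat more defensive variant of the replace step might look like the sketch below. The reasoning is that awk normally exits with status 0 even when nothing matched, so the exit status alone says little about the result; comparing line counts before overwriting main_file is an extra check. The names removed_lines and kept_lines are made up for this example, the sample data is shortened from post #1, and it should be run in an empty scratch directory.

```shell
printf '%s\n' 0018208944262 4016432428011 > pattern_file
printf '%s\n' \
  '0018208944262;A 562381;VNA750E1;50;4999.14;Nikon' \
  '4242004181811;A 582194;B55CR22N0;2;939.46;Neff' \
  '4716123314875;A 548120;NH-L9I;1;39.01;Noctua' > main_file

# Create both outputs up front so they exist even if one side gets no lines.
: > removed_lines
: > kept_lines

awk 'FNR==NR{A[$1];next} ($1 in A){print > "removed_lines"; next} {print > "kept_lines"}' \
    pattern_file FS=';' main_file || exit 1

# Count lines in the input and in both outputs.
total=$(grep -c '' main_file)
removed=$(grep -c '' removed_lines)
kept=$(grep -c '' kept_lines)

# Replace main_file only when every input line ended up in exactly one output.
if [ "$((removed + kept))" -eq "$total" ]; then
    cp main_file main_file_Original   # keep a backup of the original
    mv kept_lines main_file           # main_file now lacks the matched lines
else
    echo "Line counts do not add up; main_file left untouched." >&2
fi
```

Afterwards removed_lines can be inspected to check whether anything was removed that should not have been.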

Thanks,
R. Singh

Last edited by RavinderSingh13; 02-10-2016 at 09:55 AM.. Reason: Added one more solution on same now.