awk remove/grab lines from file with pattern from other file

02-10-2016

Sorry for the weird title, but I have the following problem.

We have several files with between 10,000 and about 500,000 lines each. From these files we want to remove every line that contains a pattern listed in another file (around 20,000 lines, all EAN codes). We also want the removed lines written to a separate file so we can check whether lines are being removed that should not be (this has nothing to do with the matching itself).

pattern file:

Code:

0018208944262
4016432428011
7290006780829
4021121468858
5025232434084
4021121338540
4021121435638

main file

Code:

0018208944262;A 562381;VNA750E1;50;4999.14;Nikon
4242004181811;A 582194;B55CR22N0;2;939.46;Neff
4242004181439;A 582193;B45CS24N0;1;895.04;Neff
4716123314882;A 552806;NH-L9A;0;39.90;Noctua
4716123314875;A 548120;NH-L9I;1;39.01;Noctua

With both of the above, I should get one file that contains only "0018208944262;A 562381;VNA750E1;50;4999.14;Nikon" and one file that contains the rest.

I tried the following awk code:

Code:

awk -F ';' 'NR==FNR {id[$1]; next} $1 in id' filter.csv main.csv

It does not remove the line or put it in another file. I also tried grep, but that only works when the filter file has around 100 or so lines.

Does anyone know how I can get the two results described above?

SDohmen

02-10-2016

Hello SDohmen,

Could you please try the following and let me know if it helps?

Code:

awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print}' pattern_Input_file FS=";" main_Input_file

Output will be as follows.

Code:

0018208944262;A 562381;VNA750E1;50;4999.14;Nikon

Thanks,
R. Singh

RavinderSingh13
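
A minimal sketch of why the position of FS=";" matters in the command above, assuming a POSIX awk: an assignment placed among the file operands takes effect only when awk reaches it, so the pattern file is still split on whitespace and only the main file is split on semicolons. The scratch directory and the names pattern.txt and main.csv are made up for illustration; the data lines come from the first post.

```shell
#!/bin/sh
# Work in a scratch directory so no real file is touched (hypothetical names).
cd "$(mktemp -d)" || exit 1

printf '%s\n' 0018208944262 4016432428011 > pattern.txt
printf '%s\n' \
  '0018208944262;A 562381;VNA750E1;50;4999.14;Nikon' \
  '4242004181811;A 582194;B55CR22N0;2;939.46;Neff' > main.csv

# FS=";" is applied only when awk reaches it in the argument list,
# that is, after pattern.txt has been read and before main.csv is opened.
matches=$(awk 'FNR==NR {A[$1]; next} ($1 in A)' pattern.txt FS=";" main.csv)
printf '%s\n' "$matches"
```

Note that this only prints the matching lines to standard output; it neither changes main.csv nor writes the other lines anywhere.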

02-10-2016

How about

Code:

awk -F ';' 'NR==FNR {id[$1]; next} $1 in id {print > "Positive"; next} {print > "Negative"}' file1 file2

RudiC
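
The one-liner above can be tried on the sample data from the first post. This is only a sketch in a scratch directory; filter.csv and main.csv are the names the asker used, and Positive and Negative are the output names from the command. The command itself prints nothing to the terminal, because both print statements are redirected to files in the current directory.

```shell
#!/bin/sh
# Scratch directory, so the Positive and Negative files land somewhere harmless.
cd "$(mktemp -d)" || exit 1

printf '%s\n' 0018208944262 4016432428011 7290006780829 > filter.csv
printf '%s\n' \
  '0018208944262;A 562381;VNA750E1;50;4999.14;Nikon' \
  '4242004181811;A 582194;B55CR22N0;2;939.46;Neff' \
  '4716123314882;A 552806;NH-L9A;0;39.90;Noctua' > main.csv

awk -F ';' '
  NR==FNR  { id[$1]; next }                # first file: remember every EAN
  $1 in id { print > "Positive"; next }    # main file: EAN is listed
           { print > "Negative" }          # main file: EAN is not listed
' filter.csv main.csv

# Show what ended up in each file.
echo "Positive:"; cat Positive
echo "Negative:"; cat Negative
```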

02-10-2016

Quote:

Originally Posted by RavinderSingh13

Hello SDohmen,

Could you please try the following and let me know if it helps?

Code:

awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print}' pattern_Input_file FS=";" main_Input_file

Output will be as follows.

Code:

0018208944262;A 562381;VNA750E1;50;4999.14;Nikon

Thanks,
R. Singh

Hi,

Thank you for the quick reply, but it does not seem to work. The pattern lines are still in the output and not in a separate file.

Quote:

Originally Posted by RudiC

How about

Code:

awk -F ';' 'NR==FNR {id[$1]; next}  $1 in id {print > "Positive"; next} {print > "Negative"}' file1  file2

This seems to create no file at all. I get no output.

Last edited by SDohmen; 02-10-2016 at 09:26 AM.. Reason: adding answer.

SDohmen

02-10-2016

Quote:

Originally Posted by SDohmen

Hi,

Thank you for the quick reply, but it does not seem to work. The pattern lines are still in the output and not in a separate file.

Hello SDohmen,

You could redirect the output to an Output_file as follows.

Code:

awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print}' pattern_file FS=";" main_file  >  Output_file

EDIT: In case you need an output file for both the matches and the non-matches, the following may help.

Code:

 awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print >> "Output_match_found_file"} !($1 in A){print >> "Output_match_NOT_found_file"}' pattern_file FS=";" main_file

Thanks,
R. Singh

Last edited by RavinderSingh13; 02-10-2016 at 09:29 AM.. Reason: Added one more solution for having 2 Output_file for matching and non-matching fields.

RavinderSingh13
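
One detail worth checking with the two-file command above: output written with >> is appended to whatever the file already holds, so running the command a second time doubles the contents of both output files. Inside awk, > truncates a file only when it is first opened in a run and keeps adding lines for the rest of that run, so it may be the better fit for repeatable runs. The sketch below compares the two; the output names append_match, append_rest, fresh_match and fresh_rest are hypothetical.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

printf '%s\n' 0018208944262 > pattern_file
printf '%s\n' \
  '0018208944262;A 562381;VNA750E1;50;4999.14;Nikon' \
  '4242004181811;A 582194;B55CR22N0;2;939.46;Neff' > main_file

# Variant with >> : every run adds to the existing output files.
split_append() {
  awk 'FNR==NR{A[$1];next} ($1 in A){print >> "append_match"; next} {print >> "append_rest"}' \
    pattern_file FS=";" main_file
}

# Variant with > : every run starts with empty output files.
split_truncate() {
  awk 'FNR==NR{A[$1];next} ($1 in A){print > "fresh_match"; next} {print > "fresh_rest"}' \
    pattern_file FS=";" main_file
}

# Run each variant twice and compare the line counts.
split_append;   split_append
split_truncate; split_truncate

append_lines=$(wc -l < append_match | tr -d ' ')
fresh_lines=$(wc -l < fresh_match | tr -d ' ')
echo "append_match: $append_lines line(s), fresh_match: $fresh_lines line(s)"
```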

02-10-2016

Quote:

Originally Posted by RavinderSingh13

Hello SDohmen,

You could redirect the output to an Output_file as follows.

Code:

awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print}' pattern_file FS=";" main_file  >  Output_file

Thanks,
R. Singh

I know I can redirect the output, but it still did not remove the lines from the main file. Sorry if that was unclear before.

Omg, you are fast with editing. I tested the revised code and it seems to work fine. I now have two files with different output in each. I will test it some more with other files to be sure, but it seems to work. Thank you again for the extremely fast help.

SDohmen

02-10-2016

Quote:

Originally Posted by SDohmen

I know I can redirect the output, but it still did not remove the lines from the main file. Sorry if that was unclear before.

Omg, you are fast with editing.

Hello SDohmen,

Here is exactly what you may be looking for.

Code:

cat script.ksh
awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print >> "Output_match_found_file"} !($1 in A){print >> "Output_match_NOT_found_file"}' pattern_file FS=";" main_file
if [[ $? == 0 ]]
then
        mv  main_file main_file_Original
        mv  Output_match_NOT_found_file main_file
else
        echo "Please check there seems to be an issue with awk command."
fi

The above code creates a backup of main_file named main_file_Original and removes the matching lines from main_file, so that only the non-matching lines remain in it. Let me know if this helps you.
EDIT: You could also try the following. Here, if the awk command fails, the second command (the mv that replaces the input file) is of course not executed.

Code:

 awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print >> "Output_match_found_file"} !($1 in A){print >> "Output_match_NOT_found_file"}' pattern_file FS=";" match_file && mv Output_match_NOT_found_file match_file

Thanks,
R. Singh

Last edited by RavinderSingh13; 02-10-2016 at 09:55 AM.. Reason: Added one more solution on same now.

RavinderSingh13
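
A hedged variation on the replace-in-place idea above, not the poster's own script: if every line of the main file matches, awk never creates the non-matching output file and the mv step fails. Creating both output files empty beforehand avoids that, and using > inside awk keeps repeated runs from accumulating lines. The names removed_lines and kept_lines are hypothetical; pattern_file, main_file and main_file_Original follow the script above.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

printf '%s\n' 0018208944262 > pattern_file
printf '%s\n' \
  '0018208944262;A 562381;VNA750E1;50;4999.14;Nikon' \
  '4242004181811;A 582194;B55CR22N0;2;939.46;Neff' > main_file

# Make sure both output files exist even if awk never writes to one of them.
: > removed_lines
: > kept_lines

if awk 'FNR==NR{A[$1];next} ($1 in A){print > "removed_lines"; next} {print > "kept_lines"}' \
     pattern_file FS=";" main_file
then
  cp main_file main_file_Original   # keep a backup of the untouched input
  mv kept_lines main_file           # main_file now holds only non-matching lines
else
  echo "awk failed; main_file left untouched" >&2
fi
```

The removed lines stay available in removed_lines for the manual check mentioned in the first post.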
