awk remove/grab lines from file with pattern from other file


 
# 8  
Old 02-10-2016
Quote:
Originally Posted by RavinderSingh13
Hello SDohmen,

Here is exactly what you may be looking for.
Code:
cat script.ksh
awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print >> "Output_match_found_file"} !($1 in A){print >> "Output_match_NOT_found_file"}' pattern_file FS=";" main_file
if [[ $? == 0 ]]
then
        mv  main_file main_file_Original
        mv  Output_match_NOT_found_file main_file
else
        echo "Please check there seems to be an issue with awk command."
fi

The code above creates a backup of main_file named main_file_Original and then replaces main_file with the lines that did not match, so the matching lines are removed from it. Let me know if this helps you.


Thanks,
R. Singh
I just tested the code as follows (adapted to my environment):
Code:
awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print >> $PAD/raw4.csv} !($1 in A){print >> $PAD/removed.csv}' $VOEG/niet_gebruiken_ean.csv FS=";" $PAD/raw3.csv

but it does not seem to create the files. The error I am getting is as follows:

awk: 1: unexpected character '.'
awk: 1: unexpected character '.'
awk: cannot open /srv/prijslijst/lev/raw4.csv (No such file or directory)

I tried replacing the $PAD variable with the directory itself, but the output does not change. From the shell itself it works fine.
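
A possible explanation for the `unexpected character '.'` messages, sketched under the assumption that the program text reaches awk unchanged: the shell does not expand `$PAD` inside single quotes, so awk itself sees `$PAD` and reads it as a field reference. `PAD` is an unset awk variable (numerically 0), which makes `$PAD` the same as `$0`, and the rest of the path is then parsed as awk operators. The value of `PAD` and the input line below are made up for illustration.

```shell
# Hypothetical value; inside the single-quoted program awk never sees it.
PAD=/tmp/example

# awk evaluates $PAD as $(PAD), and an unset PAD counts as 0,
# so this prints $0, the whole input line.
result=$(printf 'a;b;c\n' | awk '{ print $PAD }')
echo "$result"
```

The output is the input line itself, not the shell value of `PAD`.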
# 9  
Old 02-10-2016
Hello SDohmen,

Shell variables are not expanded inside a single-quoted awk program, so their values have to be handed to awk differently. Could you please try the following and let me know if it helps?
Code:
 awk -vpad="$PAD" -vvoeg="$VOEG" 'FNR==NR{A[$1]=$1;next} ($1 in A){print >> pad"/raw4.csv"} !($1 in A){print >> pad"/removed.csv"}'  $VOEG"/niet_gebruiken_ean.csv" FS=";"  $PAD"/raw3.csv"

Thanks,
R. Singh
# 10  
Old 02-10-2016
Quote:
Originally Posted by SDohmen
.
.
.
This seems to create no file at all. I get no output.
Did you look for two files called "Positive" and "Negative" in your pwd?
# 11  
Old 02-11-2016
I think I figured it out. The code above did not create the files either, so I googled a bit and found that the changed code below works.

Code:
awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print >> "'"$PAD/raw4.csv"'"} !($1 in A){print >> "'"$PAD/removed.csv"'"}' $VOEG/niet_gebruiken_ean.csv FS=";" $PAD/raw3.csv

Thank you for helping with the code.
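
An alternative that avoids splicing shell text into the awk program is to pass the directory with `-v` and to parenthesize the redirection target, since an unparenthesized concatenation after `>>` is not parsed the same way by every awk. The following is only a minimal sketch under assumptions: the scratch directory and the sample EAN rows are invented, and only the file names come from the thread.

```shell
# Work in a scratch directory so the sketch does not touch real data (assumption).
PAD=$(mktemp -d)
VOEG=$PAD

# Invented sample data: one EAN to filter on, two semicolon-separated rows.
printf '111\n' > "$VOEG/niet_gebruiken_ean.csv"
printf '111;listed\n222;unlisted\n' > "$PAD/raw3.csv"

# -v hands the shell value to awk; the parentheses make the concatenated
# file name an unambiguous redirection target.
awk -v pad="$PAD" '
FNR == NR  { A[$1]; next }                         # first file: remember EANs
($1 in A)  { print >> (pad "/raw4.csv"); next }    # EAN is in the filter file
           { print >> (pad "/removed.csv") }       # EAN is not in the filter file
' "$VOEG/niet_gebruiken_ean.csv" FS=";" "$PAD/raw3.csv"

cat "$PAD/raw4.csv" "$PAD/removed.csv"
```

Note that `-v` interprets backslash escape sequences in the assigned value, so this form suits ordinary paths without backslashes.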


Quote:
Originally Posted by RudiC
Did you look for two files called "Positive" and "Negative" in your pwd?
I overlooked those two because I thought they were text inline with the files. Later on I noticed that it had created two files named Positive and Negative. Sorry for overlooking that.

---------- Post updated 11-02-16 at 09:47 AM ---------- Previous update was 10-02-16 at 03:16 PM ----------

Here I am again. I tested the code with one of the things I needed to filter and it works just fine.

Now I am struggling with the other filter, though. As before, I need to split the lines of one file into two others. The difference here is that the filter file contains words, some of which include spaces, like below:

Code:
All In   One PC       
Asus PC       
Bandsaege       
CI Module       
Desktop PC

The other problem is that these words can be anywhere in the line. I tried using the awk line from before and running it on each column separately, but it seems to look for each word separately, which creates false negatives. Any idea how I can best solve this?

I know I can do this with grep, but it is awfully slow and gives problems with too many lines.

---------- Post updated at 09:57 AM ---------- Previous update was at 09:47 AM ----------

Sorry, I forgot to add some samples.

Main file
Code:
37760   Haushalt & Küche > SodaStream & Wassermaxx > Sirup      SodaStream Orange Sirup 500ml   3.1682242990654 EUR     7.00    SodaStream      7290002793335   1020103490      >10     4.99    0.699   0       Haushalt & Kueche > SodaStream & Wassermaxx > Sirup
37761   Haushalt & Küche > SodaStream & Wassermaxx > Sirup      SodaStream Zitrone-Limette Sirup 500ml  2.5046728971963 EUR     7.00    SodaStream      7290002793328   1020110490      0       4.99    0.600   0       Haushalt & Kueche > SodaStream & Wassermaxx > Sirup
37762   Haushalt & Küche > SodaStream & Wassermaxx > Sirup      SodaStream Apfel-Mix Sirup 500ml        3.5046728971963 EUR     7.00    SodaStream      7290002793229   1020108491      3       4.99    0.600   0       Haushalt & Kueche > SodaStream & Wassermaxx > Sirup
37765   Haushalt & Küche > SodaStream & Wassermaxx > Sirup      SodaStream Isotonic Sirup 375ml 3.7289719626168 EUR     7.00    SodaStream      7290010498574   5140013 >10     4.99    0.400   0       Haushalt & Kueche > SodaStream & Wassermaxx > Sirup
37773   Haushalt & Küche > Elektro Kleingeräte > Eierkocher     Gastroback 42801 Design Eierkocher Silber       33      EUR     19.00   Gastroback      4016432428011   0580178 1       0       1.000   0       Haushalt & Kueche > Elektro Kleingeraete > Eierkocher
54164           Logitech R400 Wireless Presenter        29.327731092437 EUR     19.00   Logitech        5099206018129   910-001357      2       4.99    0.210   0
68132   Computer & Zubehör > Eingabegeräte > Mäuse      Logitech MK710 Wireless Desktop 64.621848739496 EUR     19.00   Logitech        5099206020948   920-002420      10      4.99    1.390   0       Computer & Zubehoer > Eingabegeraete > Maeuse

The main file differs from case to case, but mainly in the delimiter, which I should be able to handle.


Filter file
Code:
4K Fernseher
Acer Aspire
Acer PC
Acer Veriton
All In One PC
Asus PC
Bandsaege
Wireless Desktop

With the samples above, it should filter out the last line of the main file (and put it in a separate file) because of the words "Wireless Desktop", but not the line above it, which only contains "Wireless".

I hope this makes it easier.
# 12  
Old 02-11-2016
Not sure how to discriminate this new problem from the other. Does it come on top or in place? Some decent samples could help.
# 13  
Old 02-11-2016
Quote:
Originally Posted by RudiC
Not sure how to discriminate this new problem from the other. Does it come on top or in place? Some decent samples could help.
I am not 100% sure what you mean, but I added sample data.

In total I have two filters, which run at different stages of the same file.

The word filter runs first, because that is when I still have all the data to search in. After that the file is cut and processed into a nearly finished file, which is then checked against the EAN filter to remove the last lines that aren't needed.

Technically it would be fine if both were in one filter file (with a loop at the beginning), but I guess that would make the code really complicated.
# 14  
Old 02-11-2016
This might solve your new problem:
Code:
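awk -F ';' '
NR == FNR { id[$0]; next }      # first file: remember each filter phrase
{
    for (SP in id)              # second file: test every phrase against the whole line
        if ($0 ~ SP) { print > "Positive"; next }
    print > "Negative"          # no phrase matched
}
' file1 file2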
cf *ive
Negative:
37760   Haushalt & Küche > SodaStream & Wassermaxx > Sirup      SodaStream Orange Sirup 500ml   3.1682242990654 EUR     7.00    SodaStream      7290002793335   1020103490      >10     4.99    0.699   0       Haushalt & Kueche > SodaStream & Wassermaxx > Sirup
37761   Haushalt & Küche > SodaStream & Wassermaxx > Sirup      SodaStream Zitrone-Limette Sirup 500ml  2.5046728971963 EUR     7.00    SodaStream      7290002793328   1020110490      0       4.99    0.600   0       Haushalt & Kueche > SodaStream & Wassermaxx > Sirup
37762   Haushalt & Küche > SodaStream & Wassermaxx > Sirup      SodaStream Apfel-Mix Sirup 500ml        3.5046728971963 EUR     7.00    SodaStream      7290002793229   1020108491      3       4.99    0.600   0       Haushalt & Kueche > SodaStream & Wassermaxx > Sirup
37765   Haushalt & Küche > SodaStream & Wassermaxx > Sirup      SodaStream Isotonic Sirup 375ml 3.7289719626168 EUR     7.00    SodaStream      7290010498574   5140013 >10     4.99    0.400   0       Haushalt & Kueche > SodaStream & Wassermaxx > Sirup
37773   Haushalt & Küche > Elektro Kleingeräte > Eierkocher     Gastroback 42801 Design Eierkocher Silber       33      EUR     19.00   Gastroback      4016432428011   0580178 1       0       1.000   0       Haushalt & Kueche > Elektro Kleingeraete > Eierkocher
54164           Logitech R400 Wireless Presenter        29.327731092437 EUR     19.00   Logitech        5099206018129   910-001357      2       4.99    0.210   0
Positive:
68132   Computer & Zubehör > Eingabegeräte > Mäuse      Logitech MK710 Wireless Desktop 64.621848739496 EUR     19.00   Logitech        5099206020948   920-002420      10      4.99    1.390   0       Computer & Zubehoer > Eingabegeraete > Maeuse
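
Because `$0 ~ SP` treats each filter phrase as a regular expression, and the first filter sample in this thread shows trailing blanks after some phrases, a fixed-string variant may be more robust. The sketch below is a variation under assumptions, not RudiC's code: it trims surrounding whitespace from each phrase and uses `index()` for a literal substring test. The scratch directory, file names and sample rows are made up.

```shell
work=$(mktemp -d)

# Invented filter file; note the trailing blanks on the first phrase.
printf 'Wireless Desktop   \nAsus PC\n' > "$work/filter.txt"
# Invented main file: only the second row contains a complete phrase.
printf '54164\tLogitech R400 Wireless Presenter\n68132\tLogitech MK710 Wireless Desktop\n' > "$work/main.txt"

awk -v dir="$work" '
NR == FNR {                                # first file: collect the phrases
    gsub(/^[ \t]+|[ \t]+$/, "")            # strip surrounding whitespace
    if ($0 != "") id[$0]                   # skip empty lines
    next
}
{
    for (p in id)                          # literal substring test, no regex
        if (index($0, p)) { print > (dir "/Positive"); next }
    print > (dir "/Negative")              # no phrase found in this line
}
' "$work/filter.txt" "$work/main.txt"

head "$work/Positive" "$work/Negative"
```

With `index()`, characters such as `.` or `+` in a phrase are matched literally, which seems closer to what a word list is meant to express.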
