awk remove/grab lines from file with pattern from other file


 
# 22  
Old 02-12-2016
How about posting the result of your script applied to your samples?
# 23  
Old 02-15-2016
Sorry for the late reply, but it was already the weekend.

I tried your code, but I think I am missing something or doing something wrong.

Code:
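# Note: VOEG is not set in this excerpt; it is assumed to be defined earlier in the full script.
PAD='/srv/prijslijst/test'

awk -F"\t" '
# Count input files: FNR restarts at 1 for each new file.
FNR == 1        {FC++
                }
# File 1: word filter, one word per line.
FC == 1         {FILTWORD[$0]
                 next
                }
# File 2: EAN filter, one EAN per line.
FC == 2         {FILTEAN[$0]
                 next
                }

# File 3 (price list): skip the header line and lines where field 6 is 7.
FNR == 1 ||
$6 == 7         {next
                }

# Lines matching any filter word go to "removed_woord"; strip ">" from the rest.
                {for (SP in FILTWORD) if ($0 ~ SP)      {print > "removed_woord"
                                                         next
                                                        }
                 gsub (/>/, "")
                }

# Drop lines with an empty or zero field 4, or a "-" in field 9 (the supplier number).
!$4  ||
$9 ~ "-"        {next
                }

# Lines whose EAN (field 8) is in the EAN filter go to "removed_EAN".
$8 in FILTEAN   {print $8, $1, $9, $10, $4+0, $7  > "removed_EAN"
                 next
                }

# Everything else is written to "clean".
                {print $8, $1, $9, $10, $4+0, $7, "CMT"  > "clean"
                }
' OFS=";" OFMT="%.2f" "$VOEG/filters/woord_COM.csv" "$VOEG/filters/niet_gebruiken_ean.csv" "$PAD/preisliste.csv"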

When I run the script above, I can see that it downloads the file (not shown above), but after that nothing happens. I am guessing that is because you mentioned file2 and file3 in your version, but technically I don't have those anymore because, from what I can see, everything is in this script.

[EDIT]
Duh, it helps if I actually add both filter files.

Let's compare both runs of the scripts:


Original.csv: 6393 lines (this is the downloaded file, before any script touches it)

Original script:
clean.csv: 6393 lines
removed_woord.csv: 1774 lines
removed_EAN.csv: 959 lines


New version:
clean: 4382 lines
removed_woord.csv: 1774 lines
removed_EAN.csv: 584 lines

It is a bit strange that there is so much difference, but I haven't checked any details yet. I will post those as soon as I have time.

Last edited by SDohmen; 02-15-2016 at 11:31 AM.. Reason: Duh i forgot the 2 filter files.
# 24  
Old 02-15-2016
file2 is the Word filter file, file3 the EAN filter file.

---------- Post updated at 16:30 ---------- Previous update was at 16:29 ----------

Again, please allow us to test the scriptlet by posting the desired result.
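
The order of the file arguments matters because the script tells the files apart with the `FNR == 1 {FC++}` counter. Below is a minimal sketch of that technique. The file names and sample contents are hypothetical, and `;` is used as the input separator only to keep the sample readable.

```shell
#!/bin/sh
# Hypothetical sample files, created only for this illustration.
tmp=$(mktemp -d)
printf 'Kabel\n' > "$tmp/words.txt"             # 1st argument: word filter
printf '4021121468858\n' > "$tmp/eans.txt"      # 2nd argument: EAN filter
printf '%s\n' 'ean;name' \
    '4016432428011;Toaster' \
    '4021121468858;Receiver' \
    '7290006780829;USB Kabel' > "$tmp/data.csv" # 3rd argument: data with header

result=$(awk -F';' '
FNR == 1 {FC++}                  # FNR restarts at 1 for each file, so FC counts files
FC == 1  {WORD[$0]; next}        # first file: remember the filter words
FC == 2  {EAN[$0]; next}         # second file: remember the filter EANs
FNR == 1 {next}                  # third file: skip the header line
         {for (w in WORD) if ($0 ~ w) {print "word:" $1; next}}
$1 in EAN {print "ean:" $1; next}
         {print "clean:" $1}
' "$tmp/words.txt" "$tmp/eans.txt" "$tmp/data.csv")

printf '%s\n' "$result"
rm -rf "$tmp"
```

If the two filter files are swapped on the command line, the EANs are loaded as words and the words as EANs. Note also that an empty filter file never triggers `FNR == 1`, so the counter would then be off by one.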
# 25  
Old 02-15-2016
Quote:
Originally Posted by RudiC
file2 is the Word filter file, file3 the EAN filter file.

---------- Post updated at 16:30 ---------- Previous update was at 16:29 ----------

Again, please allow us to test the scriptlet by posting the desired result.

I just updated the script as soon as I noticed it. The test results are in the same post.

The output should look like the following:

Code:
4016432428011;37773;0580178;1;29.33;Gastroback;CMT 
7290006780829;37816;1041191490;0;7.25;SodaStream;CMT 
4021121468858;38616;20110019;4;75.76;Kathrein;CMT 
5025232434084;38909;WES035K503;5;7.98;Panasonic;CMT 
4021121338540;39101;274425;5;5.96;Kathrein;CMT

The output it gives is fine. The problem is that it seems to remove quite a lot of lines. I will check what kind of lines it removed and post those to see how it could be solved.
# 26  
Old 02-15-2016
How can a clean file of 6393 lines come from a 6393-line original file?

And, as said before, the criteria of what and when to remove/redistribute weren't completely or correctly recognized. You may want to respecify them in detail.
# 27  
Old 02-15-2016
Here is a short snippet of an sdiff between the two clean files. Your output is on the left and the original output is on the right.

Code:
5031713055075;93940;44951534;1;140.25;OKI;CMT                   5031713055075;93940;44951534;1;140.25;OKI;CMT
5031713047964;93946;44469704;1;66.30;OKI;CMT                    5031713047964;93946;44469704;1;66.30;OKI;CMT
5031713047988;93947;44469706;1;66.30;OKI;CMT                    5031713047988;93947;44469706;1;66.30;OKI;CMT
5031713047971;93948;44469705;1;66.30;OKI;CMT                    5031713047971;93948;44469705;1;66.30;OKI;CMT
5031713047995;93949;44469803;1;49.54;OKI;CMT                    5031713047995;93949;44469803;1;49.54;OKI;CMT
4242002688879;94131;40304;9;37;Bosch;CMT                      | 0790069375798;94019;DIR-505/E;8;15.12;D-Link;CMT
                                                              > 4039784518926;94045;2.633-002.0;2;13.00;Kärcher;CMT
                                                              > 4242002688879;94131;40304;9;37.00;Bosch;CMT
                                                              > 8806085027084;92798;CLT-W406/SEE;1;9.16;Samsung;CMT
                                                              > 5025232683680;94246;DMC-FZ200;10;251.26;Panasonic;CMT
4210201072799;94078;31B;3;16.80;Braun;CMT                       4210201072799;94078;31B;3;16.80;Braun;CMT
5031713056188;94360;44844616;1;73.03;OKI;CMT                    5031713056188;94360;44844616;1;73.03;OKI;CMT
5031713056164;94362;44844615;1;167.98;OKI;CMT                   5031713056164;94362;44844615;1;167.98;OKI;CMT
8712581639389;94341;AJ3115/12;10;14.28;Philips;CMT              8712581639389;94341;AJ3115/12;10;14.28;Philips;CMT
0374318820040;94416;2250;0;100;Singer;CMT                     | 0374318820040;94416;2250;0;100.00;Singer;CMT
8004399326415;94427;EN 266.BAE;10;130;DeLonghi;CMT            | 8004399326415;94427;EN 266.BAE;10;130.00;DeLonghi;CMT
0010942213017;94436;XN 2140;10;50.41;Krups;CMT                  0010942213017;94436;XN 2140;10;50.41;Krups;CMT
                                                              > 0885370404449;94571;3LR-00001;10;30.25;Microsoft;CMT
0885909627592;94624;MD824ZM/A;2;29.33;Apple;CMT                 0885909627592;94624;MD824ZM/A;2;29.33;Apple;CMT

I have no results yet from the EAN filter file, as those aren't saved on the machine, but I will post them later as well.
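
A side-by-side diff also reacts to line order. As a rough sketch of an order-independent comparison, both files can be sorted and passed to `comm`, which then lists the lines unique to each side. The file names `clean_new` and `clean_orig` are hypothetical; the sample lines are taken from the sdiff snippet above.

```shell
#!/bin/sh
# sort and comm must agree on collation, so fix the locale for both.
LC_ALL=C
export LC_ALL

tmp=$(mktemp -d)
# Hypothetical file names; contents are a few lines from the sdiff snippet.
printf '%s\n' \
    '5031713047995;93949;44469803;1;49.54;OKI;CMT' \
    '4242002688879;94131;40304;9;37;Bosch;CMT' \
    '4210201072799;94078;31B;3;16.80;Braun;CMT' > "$tmp/clean_new"
printf '%s\n' \
    '5031713047995;93949;44469803;1;49.54;OKI;CMT' \
    '0790069375798;94019;DIR-505/E;8;15.12;D-Link;CMT' \
    '4242002688879;94131;40304;9;37.00;Bosch;CMT' \
    '4210201072799;94078;31B;3;16.80;Braun;CMT' > "$tmp/clean_orig"

sort "$tmp/clean_new"  > "$tmp/new.sorted"
sort "$tmp/clean_orig" > "$tmp/orig.sorted"

# comm -23: lines only in the first file; comm -13: lines only in the second.
only_new=$(comm -23 "$tmp/new.sorted" "$tmp/orig.sorted")
only_orig=$(comm -13 "$tmp/new.sorted" "$tmp/orig.sorted")

printf 'only in new:\n%s\n' "$only_new"
printf 'only in original:\n%s\n' "$only_orig"
rm -rf "$tmp"
```

Lines that differ only in number formatting (here `37` versus `37.00`) still show up on both sides, so they are easy to tell apart from lines that are missing altogether.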

---------- Post updated at 04:54 PM ---------- Previous update was at 04:52 PM ----------

Quote:
Originally Posted by RudiC
How can a clean file of 6393 lines come from a 6393-line original file?

And, as said before, the criteria of what and when to remove/redistribute weren't completely or correctly recognized. You may want to respecify them in detail.

Oops, my fault. The original file is 9231 lines.
# 28  
Old 02-15-2016
You wanted supplier numbers containing a "-" sign to be removed. That accounts for these lines:
Code:
0790069375798;94019;DIR-505/E;8;15.12;D-Link;CMT
4039784518926;94045;2.633-002.0;2;13.00;Kärcher;CMT
8806085027084;92798;CLT-W406/SEE;1;9.16;Samsung;CMT
5025232683680;94246;DMC-FZ200;10;251.26;Panasonic;CMT
0885370404449;94571;3LR-00001;10;30.25;Microsoft;CMT

The "Bosch" line IS in BOTH result files. The "Singer" and "DeLonghi" lines differ only in that decimals are not printed for integer values.
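
The missing decimals come from how awk's `print` converts numbers: `OFMT` is applied only to non-integral values, while integral values are printed as plain integers. A minimal sketch, assuming two decimals are always wanted, is to format the price explicitly with `sprintf`. The sample prices below are made up for illustration.

```shell
#!/bin/sh
# Left column: price printed via OFMT, as in the script above ($4+0 there).
# Right column: price formatted explicitly with sprintf.
out=$(printf '%s\n' '100' '50.41' '16.8' |
    LC_ALL=C awk -v OFS=';' -v OFMT='%.2f' '{print $1+0, sprintf("%.2f", $1)}')

printf '%s\n' "$out"
```

In the script above this would mean replacing `$4+0` in the `print` statements with `sprintf("%.2f", $4)`, if that matches the desired output.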