Help with awk, using a file to filter another one


 
# 1  
12-27-2012

I have a main file:
Code:
...
17,466971    0,095185    17,562156    id 676
17,466971    0,096694    17,563665    id 677
17,466971    0,09816        17,565131    id 678
17,466971    0,099625    17,566596    id 679
17,466971    0,101091    17,568062    id 680
17,466971    0,016175    17,483146    id 681
17,466971    0,101793    17,568764    id 682
17,466971    0,10253        17,569501    id 683
38,166772    0,08125        38,248022    id 1572
38,166772    0,082545    38,249317    id 1573
38,233772    0,005457    38,239229    id 1574
38,233772    0,082113    38,315885    id 1575
38,299771    0,081412    38,381183    id 1576
38,299771    0,006282    38,306053    id 1577
38,299771    0,083627    38,383398    id 1578
38,299771    0,085093    38,384864    id 1579
38,299771    0,008682    38,308453    id 1580
38,299771    0,085094    38,384865    id 1581
...

I want to delete some lines from it, based on the id in the last column of this other file:

Code:
...
d 17.483146 1 0 udp 181 ------- 1 19.0 2.0 681 
d 38.239229 1 0 udp 571 ------- 1 19.0 2.0 1574 
d 38.306053 1 0 udp 1000 ------- 1 19.0 2.0 1577 
d 38.308453 1 0 udp 1000 ------- 1 19.0 2.0 1580 
d 38.372207 1 0 udp 546 ------- 1 19.0 2.0 1582 
d 38.441845 1 0 udp 499 ------- 1 19.0 2.0 1585 
d 38.505262 1 0 udp 616 ------- 1 19.0 2.0 1586 
d 38.572324 1 0 udp 695 ------- 1 19.0 2.0 1588 
d 38.639246 1 0 udp 597 ------- 1 19.0 2.0 1590 
d 38.639758 1 0 udp 640 ------- 1 19.0 2.0 1591 

...

For the example above, the result would be:


Code:
17,466971    0,095185    17,562156    id 676
17,466971    0,096694    17,563665    id 677
17,466971    0,09816        17,565131    id 678
17,466971    0,099625    17,566596    id 679
17,466971    0,101091    17,568062    id 680
17,466971    0,101793    17,568764    id 682
17,466971    0,10253        17,569501    id 683
38,166772    0,08125        38,248022    id 1572
38,166772    0,082545    38,249317    id 1573
38,233772    0,082113    38,315885    id 1575
38,299771    0,081412    38,381183    id 1576
38,299771    0,083627    38,383398    id 1578
38,299771    0,085093    38,384864    id 1579
38,299771    0,085094    38,384865    id 1581

The deleted lines would be:
Code:
17,466971    0,016175    17,483146    id 681
38,233772    0,005457    38,239229    id 1574
38,299771    0,006282    38,306053    id 1577
38,299771    0,008682    38,308453    id 1580

Thank you in advance

Last edited by Corona688; 12-27-2012 at 07:07 PM.
# 2  
12-27-2012
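For the first file, NR (the number of lines read so far across all input files) will be identical to FNR (the line number within the current file), so the first block runs: it saves the id (last column, i.e. $NF) as a key in the D array, e.g. D[681]=1, and next skips straight to the following line.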

Then, when awk moves on to the second file, NR stops being equal to FNR (FNR resets to 1, NR does not), and it starts checking whether the last column is a key in the D array: !($NF in D). If it isn't, the expression is true (non-zero), and since a pattern with no action prints the line by default, the line is printed.
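The NR/FNR switch described above is easy to watch directly. This is a minimal shell sketch using two tiny made-up files (the names todelete and datafile mirror the answer, but their contents are placeholders): it prints FILENAME, NR and FNR for every line and labels which file the NR==FNR test would treat as "first".

Code:
```shell
#!/bin/sh
# Throwaway directory, removed when the script exits.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Two tiny made-up files standing in for todelete and datafile.
printf '%s\n' 'a 681' 'b 1574' > todelete
printf '%s\n' 'id 680' 'id 681' 'id 682' > datafile

# FNR restarts at 1 for each file while NR keeps counting, so NR==FNR
# is true only while awk is reading the first file.
trace=$(awk '{ print FILENAME, NR, FNR, (NR == FNR ? "first" : "second") }' todelete datafile)
printf '%s\n' "$trace"
```

The third line of the trace, "datafile 3 1 second", is where NR and FNR diverge and the first block of the one-liner stops running.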

Code:
awk 'NR==FNR { D[$NF]++; next } !($NF in D)' todelete datafile
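Here is a minimal end-to-end run of the one-liner as a shell sketch. It recreates a handful of rows from both files in the question inside a temporary directory (todelete and datafile are the names used in the answer; only a subset of the rows is copied) and filters them. Ids 681 and 1574 are in the delete list, so those lines drop out. The second command is an assumed variant, not part of the answer, that guards against an empty delete list.

Code:
```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Rows from the delete list: the id is the last field.
cat > todelete <<'EOF'
d 17.483146 1 0 udp 181 ------- 1 19.0 2.0 681
d 38.239229 1 0 udp 571 ------- 1 19.0 2.0 1574
EOF

# Rows from the main file: the id is again the last field.
cat > datafile <<'EOF'
17,466971 0,101091 17,568062 id 680
17,466971 0,016175 17,483146 id 681
17,466971 0,101793 17,568764 id 682
38,233772 0,005457 38,239229 id 1574
38,233772 0,082113 38,315885 id 1575
EOF

# First file: store each id as a key of D. Second file: print a line only
# when its id is not a key of D (a bare pattern prints the line).
kept=$(awk 'NR==FNR { D[$NF]++; next } !($NF in D)' todelete datafile)
printf '%s\n' "$kept"

# Caveat: if the delete list is empty, NR==FNR is also true for every line
# of datafile, so nothing is printed. Comparing FILENAME with the first
# operand (ARGV[1]) keeps working in that case and prints all 5 lines.
: > empty
kept_all=$(awk 'FILENAME == ARGV[1] { D[$NF]; next } !($NF in D)' empty datafile)
printf '%s\n' "$kept_all"
```

The FILENAME == ARGV[1] form assumes the delete list is the first operand and not a var=value assignment; with non-empty files like the ones in this thread, both forms print the same lines.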

# 3  
12-27-2012
How do you indicate that the filter needs to check the last column?
# 4  
12-27-2012
NF is a special variable that means 'number of columns'. Columns are counted 1,2,...,NF.

So $NF means 'the last column', since $ turns a column number into the contents of that column: $1 is the first column, $NF the last.
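To see what NF and $NF contain, this sketch prints them, along with $1 and $(NF-1), for one line copied from each of the two files in the question. With the default field splitting (runs of blanks), the id lands in $NF in both files even though one line has 5 columns and the other 11.

Code:
```shell
#!/bin/sh
# One line from the main file and one from the delete list.
fields=$(printf '%s\n' \
    '17,466971 0,016175 17,483146 id 681' \
    'd 17.483146 1 0 udp 181 ------- 1 19.0 2.0 681' |
  awk '{ print "NF=" NF, "first=" $1, "last=" $NF, "before-last=" $(NF-1) }')
printf '%s\n' "$fields"
```

Because $ accepts any expression, $(NF-1) reaches the next-to-last column. The parentheses matter: $NF-1 would mean the value of the last column minus 1.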
# 5  
12-27-2012
Thanks a lot, it works!