awk to print line if values between two fields in separate file


 
# 1  
Old 12-14-2016

I am trying to use awk to find all the $3 values in file2 that fall between $2 and $3 of a line in file1. If a $3 value in file2 lies between those two file1 fields, the file2 line should be printed along with the $6 value from the matching file1 line. file1, file2, and the desired output are all tab-delimited. If a line has nothing to print, the next line is processed. The awk below currently just prints all of file1, no matter if the values are found. Thank you :).

file1 tab-delimited
Code:
chr1	948953	948956	chr1:948953-948956	.	ISG15
chr1	949363	949858	chr1:949363-949858	.	ISG15
chr1	955542	955763	chr1:955542-955763	.	AGRN
chr1	957570	957852	chr1:957570-957852	.	AGRN
chr1	976034	976270	chr1:976034-976270	.	AGRN

file2 tab-delimited
Code:
rs13303106	1	891945	GG
rs28415373	1	893981	CC
rs13303010	1	894573	AA
rs6696281	1	903104	CC
rs28391282	1	904165	GG
rs4511111	1	949375	GG
rs6657048	1	957640	CC
rs2710888	1	959842	CT
rs3128126	1	962210	AG
rs2710875	1	977780	CT

desired output tab-delimited
Code:
rs4511111	1	949375	GG	ISG15

awk
Code:
awk -F'\t' -v OFS='\t' '                   
    NR == FNR {min[$1]=$2; max[$1]=$3; Gene[$6]=$NF; next}
    {                
        for (id in min) 
            if (min[id] < $3 && $3 < max[id]) {
                print $0, id, Gene[id]
                break              
            }
    }                                     
' file1 file2

# 2  
Old 12-14-2016
Hello cmccabe,

I am not completely sure about your requirement, but could you please try the following and let me know if it helps?
Code:
awk 'FNR==NR{A[$3]=$0;next} {for(i in A){if(i+0>$2 && i+0<$3){print A[i] FS $NF}}}' Input_file2 Input_file1

Output will be as follows.
Code:
rs4511111       1       949375  GG ISG15
rs6657048       1       957640  CC AGRN

If you have any other requirements, please let us know with more details.

NOTE: The code above compares each line of Input_file1 against every line of Input_file2.

Thanks,
R. Singh
# 3  
Old 12-14-2016
It looks to me like your program can be fixed with a small change to one line. I get the same output as R. Singh.

change
Code:
NR == FNR {min[$1]=$2; max[$1]=$3; Gene[$6]=$NF; next}

to
Code:
NR == FNR {min[NR]=$2; max[NR]=$3; Gene[NR]=$NF; next}

Now each record in the filter file has a separate entry in the min, max, and Gene arrays.
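
For reference, here is a minimal sketch of the whole program with that change applied, run against the sample data from post #1. It is not a definitive version: the function name range_hits and the temporary directory are only for illustration, id is dropped from the print so that only the gene name is appended, and, like the original, it ignores the chromosome column. The `+ 0` forces numeric comparison.

```shell
# Recreate the tab-delimited sample files from post #1 in a scratch directory.
dir=$(mktemp -d)
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
    chr1 948953 948956 chr1:948953-948956 . ISG15 \
    chr1 949363 949858 chr1:949363-949858 . ISG15 \
    chr1 955542 955763 chr1:955542-955763 . AGRN \
    chr1 957570 957852 chr1:957570-957852 . AGRN \
    chr1 976034 976270 chr1:976034-976270 . AGRN > "$dir/file1"
printf '%s\t%s\t%s\t%s\n' \
    rs13303106 1 891945 GG \
    rs28415373 1 893981 CC \
    rs13303010 1 894573 AA \
    rs6696281 1 903104 CC \
    rs28391282 1 904165 GG \
    rs4511111 1 949375 GG \
    rs6657048 1 957640 CC \
    rs2710888 1 959842 CT \
    rs3128126 1 962210 AG \
    rs2710875 1 977780 CT > "$dir/file2"

# range_hits RANGES POSITIONS (hypothetical helper name)
range_hits() {
    awk -F'\t' -v OFS='\t' '
        # First file: store every range under its own record number.
        NR == FNR { min[NR] = $2 + 0; max[NR] = $3 + 0; gene[NR] = $NF; n = NR; next }
        # Second file: print the line plus the gene of the first range containing $3.
        {
            pos = $3 + 0
            for (id = 1; id <= n; id++)
                if (min[id] < pos && pos < max[id]) {
                    print $0, gene[id]
                    break
                }
        }
    ' "$1" "$2"
}

range_hits "$dir/file1" "$dir/file2"
```

On the sample data this prints the rs4511111 line with ISG15 and the rs6657048 line with AGRN.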
# 4  
Old 12-14-2016
Both commands run great. My actual dataset is ~960,000 lines (26 MB). Is there a more efficient way to search a file of this size? The two file formats are as posted; the files are just quite large. Thank you :).
# 5  
Old 12-14-2016
Below I will post code for a more robust and sophisticated solution that should avoid slowing down as the files grow. If you can depend on file1, the filter range file, being sorted with no overlaps, then you might be able to adjust the awk program to look only at filter records near the file2 field 3 key. Another simple solution may be to import the data into a relational database and query it with SQL. If you have access to Perl and CPAN and can install the Perl module Tree::Range::RB then ...
Code:
#!/bin/bash

perl -Mstrict -MTree::Range::RB -wane'
    our $rat;
    BEGIN {
        $rat = Tree::Range::RB->new({ "cmp" => sub { $_[0] <=> $_[1] }});
    }
    if (@ARGV) { # first - filter file
        $rat->range_set($F[1], $F[2], $F[5])
    }
    else { # second file
        if (my $v = $rat->get_range($F[2])) {
            chomp;
            print "$_\t$v\n";
        }
    }
' file1 file2
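
For the sorted, non-overlapping case mentioned above, a rough awk sketch could use a binary search over the range starts instead of scanning every range. This is only an illustration under stated assumptions: file1 is sorted ascending by $2, its ranges do not overlap, everything is on one chromosome (the chromosome column is ignored, as in the earlier programs), and the function name range_hits_sorted is made up for this example.

```shell
# Recreate the tab-delimited sample files from post #1 in a scratch directory.
dir=$(mktemp -d)
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
    chr1 948953 948956 chr1:948953-948956 . ISG15 \
    chr1 949363 949858 chr1:949363-949858 . ISG15 \
    chr1 955542 955763 chr1:955542-955763 . AGRN \
    chr1 957570 957852 chr1:957570-957852 . AGRN \
    chr1 976034 976270 chr1:976034-976270 . AGRN > "$dir/file1"
printf '%s\t%s\t%s\t%s\n' \
    rs13303106 1 891945 GG \
    rs28415373 1 893981 CC \
    rs13303010 1 894573 AA \
    rs6696281 1 903104 CC \
    rs28391282 1 904165 GG \
    rs4511111 1 949375 GG \
    rs6657048 1 957640 CC \
    rs2710888 1 959842 CT \
    rs3128126 1 962210 AG \
    rs2710875 1 977780 CT > "$dir/file2"

# range_hits_sorted RANGES POSITIONS (hypothetical helper name)
# Assumes RANGES is sorted by start ($2) and contains no overlapping ranges.
range_hits_sorted() {
    awk -F'\t' -v OFS='\t' '
        # First file: load the ranges in file order (already sorted by start).
        NR == FNR { lo[NR] = $2 + 0; hi[NR] = $3 + 0; gene[NR] = $NF; n = NR; next }
        # Second file: binary search for the last range that starts below pos.
        {
            pos = $3 + 0
            a = 1; b = n; hit = 0
            while (a <= b) {
                mid = int((a + b) / 2)
                if (lo[mid] < pos) { hit = mid; a = mid + 1 }
                else b = mid - 1
            }
            # With no overlaps, that range is the only one that can contain pos.
            if (hit && pos < hi[hit]) print $0, gene[hit]
        }
    ' "$1" "$2"
}

range_hits_sorted "$dir/file1" "$dir/file2"
```

Each file2 line then costs about log2(n) comparisons rather than n. On the sample data it prints the same two lines as the earlier awk solutions.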
