Awk- Indexing a list of numbers in file2 to print certain rows in file1


 
# 1  
10-30-2018

Hi


Does anyone know of an efficient way to use a column of data in file2 as an index into file1, printing each row of file1 that corresponds to an entry in file2 AND the 30 rows before and after it?

For example, suppose you have a list of numbers in file2 (single column) as follows:


Code:
rs25678
rs25679
rs25680
rs25681
rs25682
rs25683

file1:


Code:
2    9658    rs25681    G    G    GT1=0.20;GT2=0.65;GT3=0.75
2    4258    rs25679    A    G    GT1=0.20;GT2=0.65;GT3=0.76
2    4258    rs25680    T    T    GT1=0.20;GT2=0.65;GT3=0.77

I would like every row of file1 whose ID (column 3) appears in file2 to be printed, AND the 30 rows before and after each of those rows.

Desired output:

Code:
.    .    .    .    .    .
2    9658    rs25681    G    G    GT1=0.20;GT2=0.65;GT3=0.75
.    .    .    .    .    .
.    .    .    .    .    .
2    4258    rs25679    A    G    GT1=0.20;GT2=0.65;GT3=0.76
.    .    .    .    .    .
.    .    .    .    .    .
2    4258    rs25680    T    T    GT1=0.20;GT2=0.65;GT3=0.77
.    .    .    .    .    .
.    .    .    .    .    .

Each row of dots stands for the 30 rows printed before or after a targeted row; the targeted rows are the ones whose IDs are listed in file2.


Thanks...

Last edited by Geneanalyst; 10-30-2018 at 01:59 PM.
# 2  
10-30-2018
Something along these lines. With the default of 2 lines before/after:
awk -f gene.awk file2.txt file1.txt
or with 30 lines before/after:
awk -v ba=30 -f gene.awk file2.txt file1.txt
where gene.awk is:
Code:
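BEGIN {
  if(!ba) ba=2                    # lines of context before/after (default 2)
}
FNR == NR {                       # first file (file2.txt): collect the IDs
   f2[$1]
   next
}
{                                 # second file (file1.txt): keep every line
  f1all[FNR]=$0
  if ($3 in f2)
    hit[++order]=FNR              # line number of each match, in file order
}
END {
  for (i=1;i<=order;i++) {
    lo=hit[i]-ba;  if (lo<1)   lo=1    # don't run past the start of file1
    hi=hit[i]+ba;  if (hi>FNR) hi=FNR  # ...or past its end
    for (j=lo;j<=hi;j++)
      print f1all[j]
  }
}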

Or, depending on your OS and version of grep (-A and -B are GNU/BSD extensions, not POSIX), you could do this for 2 lines before/after:
grep -A 2 -B 2 -F -f file2.txt file1.txt

Last edited by vgersh99; 10-30-2018 at 03:19 PM.
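A caveat on the grep one-liner above: -F matches each ID as a fixed substring anywhere on the line, so rs25681 would also select a row containing rs256810. A minimal sketch, assuming GNU or BSD grep (neither -w nor -C is POSIX): -w keeps only whole-word matches, and -C N gives N lines of context on both sides. The wrapper name grep_context is hypothetical.

```shell
#!/bin/sh
# grep_context IDFILE DATAFILE N   (hypothetical wrapper name)
# Prints each DATAFILE line that contains one of the IDs in IDFILE as a
# whole word (-w, fixed strings via -F), plus N lines of context before
# and after it (-C). Requires GNU or BSD grep.
grep_context() {
  grep -w -F -C "$3" -f "$1" "$2"
}

# Usage (30 lines of context, as in the original question):
#   grep_context file2.txt file1.txt 30 > out.txt
```

grep prints a "--" line between non-adjacent groups of output, so filter those out (for example with grep -v '^--$') if the result feeds another program.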
# 3  
10-30-2018
Quote:
Originally Posted by vgersh99

Works great! Initially it was outputting 80 million rows, but that was my bad: a "." had made its way into the column of data in file2.

Last edited by Geneanalyst; 10-31-2018 at 07:18 AM.
# 4  
10-31-2018
With vgersh99's solution, some lines are printed more than once when two matching lines are 60 or fewer lines apart, because their 30-line windows overlap.

Try this modification:

Code:
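BEGIN {
  if(!ba) ba=2                 # lines of context before/after (default 2)
}
FNR == NR {                    # first file (file2.txt): collect the IDs
   f2[$1];
   next
}
{                              # second file (file1.txt): keep every line
  f1all[FNR]=$0
  if ($3 in f2)                # mark this line and its neighbours for printing
     for(i=FNR-ba;i<=FNR+ba;i++) prn[i]
}
END {                          # print each marked line once, in file order
  for(i=1;i<=FNR;i++)
     if(i in prn) print f1all[i]
}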


Last edited by Chubler_XL; 10-31-2018 at 12:24 AM. Reason: Fix "out by 1" errors
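Both scripts above keep all of file1 in the f1all array, which can use a lot of memory when file1 is very large. A minimal sketch of a streaming alternative, assuming (as in the example data) one record per line with the ID in column 3: it buffers only the last ba lines and counts down the lines still owed after each match, so overlapping windows are printed once, in file order. The function name context_rows is hypothetical, and the default of 30 comes from the original question.

```shell
#!/bin/sh
# context_rows IDFILE DATAFILE [BA]   (hypothetical helper name)
# Prints every DATAFILE line whose 3rd column appears in IDFILE, plus BA
# lines before and after it (default 30). Overlapping windows print once.
# Only the last BA+1 lines of DATAFILE are held in memory.
context_rows() {
  awk -v ba="${3:-30}" '
    BEGIN { size = ba + 1 }                # ring buffer holds lines FNR-ba..FNR
    FNR == NR { ids[$1]; next }            # first file: collect the IDs
    {
      buf[FNR % size] = $0                 # remember the current line
      if ($3 in ids) {
        start = FNR - ba                   # first line of this window...
        if (start <= last) start = last + 1  # ...skipping lines already printed
        for (i = start; i <= FNR; i++) print buf[i % size]
        last = FNR
        stop = FNR + ba                    # keep printing up to this line
      } else if (FNR <= stop) {
        print
        last = FNR
      }
    }' "$1" "$2"
}

# Usage:
#   context_rows file2.txt file1.txt 30 > out.txt
```

Like the modification in post #4, the output follows file1's line order; the difference is only that nothing beyond the current window is kept in memory.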
# 5  
10-31-2018
Quote:
Originally Posted by Chubler_XL

Works great! Initially it was outputting 40 million rows, but that was my bad: a "." had made its way into the column of data in file2, and file1 had many rows whose $3 was ".".

Last edited by Geneanalyst; 10-31-2018 at 07:22 AM.
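Since the stray "." in file2 matched every file1 row whose third column was ".", it may be worth cleaning the ID list before running either script. A minimal sketch, assuming "." is the only placeholder value (as in this thread) and that blank lines should be skipped as well; the helper name clean_ids is hypothetical. It also drops repeated IDs, which are harmless to the scripts above but make the list shorter.

```shell
#!/bin/sh
# clean_ids FILE   (hypothetical helper name)
# Prints the distinct IDs from column 1 of FILE, in first-seen order,
# skipping blank lines and the "." placeholder.
clean_ids() {
  awk 'NF && $1 != "." && !seen[$1]++ { print $1 }' "$1"
}

# Usage:
#   clean_ids file2.txt > ids.txt
#   awk -v ba=30 -f gene.awk ids.txt file1.txt
```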
# 6  
10-31-2018
Are you sure you have the code exactly as posted? The solution I presented shouldn't be able to print more lines than there are in file1, since each line is printed at most once. I suspect you don't have an END { block, so the for loop is executing for every line of file1.
# 7  
10-31-2018
Quote:
Originally Posted by Chubler_XL

Nothing wrong with the END block. I edited my post above to outline the problem. Sorry for the trouble.