Sponsored Content
Top Forums Shell Programming and Scripting awk to print text in field if match and range is met Post 303013137 by cmccabe on Thursday 15th of February 2018 01:37:00 PM
Old 02-15-2018
Quote:
$7 = print "overlap" else "missing"
what is that supposed to mean/do?
Is meant to print either overlap or missing in $7 depending on if the condition's are met.

If I understand the comments correctly:

Code:
awk '
BEGIN { FS=OFS="\t" }                               # define FS and OFS as tab
              NR==FNR{                              # process same line in file1 and file2
           {
       A[$1]=$4;next}                               # store $4 value from file1 into A
       {min[NR]=$2; max[NR]=$3; chr[NR]=$1; next}   # store $1,$2,$3 values into seperate arrays
{
  split($4,array,"_")                               # split $4 in file2 by the _
}
                  {                
     for (id in min)
           if($1 in A) ~ array[5])) && ($1==chr[NR]) && (min[id] <= $2 && $3 < max[id])) {  # match $4 in A with array split and check for overlap using min max and chr from file1
                  {
                    print "overlap", $7 else print "missing", $7  # print value in $7 of file 2
                  }                                          
  }
}' file1 file2

I updated the awk (specifically the print statement) but am getting syntax errors.

Thank you very much Smilie.

Last edited by cmccabe; 02-15-2018 at 02:54 PM.. Reason: updated awk, corrected typo
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Grep range of lines to print a line number on match

Hi Guru's, I am trying to grep a range of line numbers (based on match) and then look for another match which starts with a special character '$' and print the line number. I have the below code but it is actually printing the line number counting starting from the first line of the range i am... (15 Replies)
Discussion started by: Kevin Tivoli
15 Replies

2. Shell Programming and Scripting

Match text in a range and copy value

In the files attached, I am trying to: if Files.txt $1 is in the range of Exons.txt $1, then in Files.txt $4 the value from Exons.txt $3 is copied else if no match is found Exons.txt $3 = "Intron" For example, the first value in File.txt $1 is chr1:14895-14944 and is not found in any range... (4 Replies)
Discussion started by: cmccabe
4 Replies

3. Shell Programming and Scripting

Print specific field when condition met

Hi All, Seeking for your assistance to print all the specific field when the condition met. Ex: file1.txt 1|203|3|31243|5341|6452|623|22|00|01 3|45345|123214|6534|3423|6565|643|343|232|10 if field 1 = 1 and field 3 = 3 and field 5 = 5341 and field 6 = 6452 it will print from $1 to $10.... (2 Replies)
Discussion started by: znesotomayor
2 Replies

4. Shell Programming and Scripting

Command/script to match a field and print the next field of each line in a file.

Hello, I have a text file in the below format: Source Destination State Lag Status CQA02W2K12pl:D:\CAQA ... (10 Replies)
Discussion started by: pocodot
10 Replies

5. Shell Programming and Scripting

awk to print unique text in field

I am trying to use awk to print the unique entries in $2 So in the example below there are 3 lines but 2 of the lines match in $2 so only one is used in the output. File.txt chr17:29667512-29667673 NF1:exon.1;NF1:exon.2;NF1:exon.38;NF1:exon.4;NF1:exon.46;NF1:exon.47 703.807... (5 Replies)
Discussion started by: cmccabe
5 Replies

6. Shell Programming and Scripting

awk to print unique text in field before hyphen

Trying to print the unique values in $2 before the -, currently the count is displayed. Hopefully, the below is close. Thank you :). file chr2:46603668-46603902 EPAS1-902|gc=54.3 253.1 chr2:211471445-211471675 CPS1-1205|gc=48.3 264.7 chr19:15291762-15291983 NOTCH3-1003|gc=68.8 195.8... (3 Replies)
Discussion started by: cmccabe
3 Replies

7. Shell Programming and Scripting

awk to remove field and match strings to add text

In file1 field $18 is removed.... column header is "Otherinfo", then each line in file1 is used to search file2 for a match. When a match is found the last four strings in file2 are copied to file1. Maybe: cut -f1-17 file1 and then match each line to file2 file1 Chr Start End ... (6 Replies)
Discussion started by: cmccabe
6 Replies

8. Shell Programming and Scripting

awk to search field2 in file2 using range of fields file1 and using match to another field in file1

I am trying to use awk to find all the $2 values in file2 which is ~30MB and tab-delimited, that are between $2 and $3 in file1 which is ~2GB and tab-delimited. I have just found out that I need to use $1 and $2 and $3 from file1 and $1 and $2of file2 must match $1 of file1 and be in the range... (6 Replies)
Discussion started by: cmccabe
6 Replies

9. UNIX for Beginners Questions & Answers

awk - print when condition is met

I have a file.txt containing the following: Query= HWI-ST863:386:C5Y8UACXX:3:2302:16454:89688 1:N:0:ACACGAAT Length=100 Score E Sequences producing significant alignments: (Bits) Value ... (2 Replies)
Discussion started by: tons92
2 Replies

10. Shell Programming and Scripting

awk to print lines based on text in field and value in two additional fields

In the awk below I am trying to print the entire line, along with the header row, if $2 is SNV or MNV or INDEL. If that condition is met or is true, and $3 is less than or equal to 0.05, then in $7 the sub pattern :GMAF= is found and the value after the = sign is checked. If that value is less than... (0 Replies)
Discussion started by: cmccabe
0 Replies
JOIN(1) 						      General Commands Manual							   JOIN(1)

NAME
join - relational database operator SYNOPSIS
join [-an] [-e s] [-o list] [-tc] file1 file2 DESCRIPTION
Join forms, on the standard output, a join of the two relations specified by the lines of file1 and file2. If file1 is `-', the standard input is used. File1 and file2 must be sorted in increasing ASCII collating sequence on the fields on which they are to be joined, normally the first in each line. There is one line in the output for each pair of lines in file1 and file2 that have identical join fields. The output line normally con- sists of the common field, then the rest of the line from file1, then the rest of the line from file2. Fields are normally separated by blank, tab or newline. In this case, multiple separators count as one, and leading separators are dis- carded. These options are recognized: -an In addition to the normal output, produce a line for each unpairable line in file n, where n is 1 or 2. -e s Replace empty output fields by string s. -o list Each output line comprises the fields specified in list, each element of which has the form n.m, where n is a file number and m is a field number. -tc Use character c as a separator (tab character). Every appearance of c in a line is significant. SEE ALSO
sort(1), comm(1), awk(1). BUGS
With default field separation, the collating sequence is that of sort -b; with -t, the sequence is that of a plain sort. The conventions of join, sort, comm, uniq, look and awk(1) are wildly incongruous. 7th Edition April 29, 1985 JOIN(1)
All times are GMT -4. The time now is 05:27 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy