Sponsored Content
Top Forums Shell Programming and Scripting awk to print text in field if match and range is met Post 303013137 by cmccabe on Thursday 15th of February 2018 01:37:00 PM
Old 02-15-2018
Quote:
$7 = print "overlap" else "missing"
what is that supposed to mean/do?
Is meant to print either overlap or missing in $7 depending on if the condition's are met.

If I understand the comments correctly:

Code:
awk '
BEGIN { FS=OFS="\t" }                               # define FS and OFS as tab
              NR==FNR{                              # process same line in file1 and file2
           {
       A[$1]=$4;next}                               # store $4 value from file1 into A
       {min[NR]=$2; max[NR]=$3; chr[NR]=$1; next}   # store $1,$2,$3 values into seperate arrays
{
  split($4,array,"_")                               # split $4 in file2 by the _
}
                  {                
     for (id in min)
           if($1 in A) ~ array[5])) && ($1==chr[NR]) && (min[id] <= $2 && $3 < max[id])) {  # match $4 in A with array split and check for overlap using min max and chr from file1
                  {
                    print "overlap", $7 else print "missing", $7  # print value in $7 of file 2
                  }                                          
  }
}' file1 file2

I updated the awk (specifically the print statement) but am getting syntax errors.

Thank you very much Smilie.

Last edited by cmccabe; 02-15-2018 at 02:54 PM.. Reason: updated awk, corrected typo
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Grep range of lines to print a line number on match

Hi Guru's, I am trying to grep a range of line numbers (based on match) and then look for another match which starts with a special character '$' and print the line number. I have the below code but it is actually printing the line number counting starting from the first line of the range i am... (15 Replies)
Discussion started by: Kevin Tivoli
15 Replies

2. Shell Programming and Scripting

Match text in a range and copy value

In the files attached, I am trying to: if Files.txt $1 is in the range of Exons.txt $1, then in Files.txt $4 the value from Exons.txt $3 is copied else if no match is found Exons.txt $3 = "Intron" For example, the first value in File.txt $1 is chr1:14895-14944 and is not found in any range... (4 Replies)
Discussion started by: cmccabe
4 Replies

3. Shell Programming and Scripting

Print specific field when condition met

Hi All, Seeking for your assistance to print all the specific field when the condition met. Ex: file1.txt 1|203|3|31243|5341|6452|623|22|00|01 3|45345|123214|6534|3423|6565|643|343|232|10 if field 1 = 1 and field 3 = 3 and field 5 = 5341 and field 6 = 6452 it will print from $1 to $10.... (2 Replies)
Discussion started by: znesotomayor
2 Replies

4. Shell Programming and Scripting

Command/script to match a field and print the next field of each line in a file.

Hello, I have a text file in the below format: Source Destination State Lag Status CQA02W2K12pl:D:\CAQA ... (10 Replies)
Discussion started by: pocodot
10 Replies

5. Shell Programming and Scripting

awk to print unique text in field

I am trying to use awk to print the unique entries in $2 So in the example below there are 3 lines but 2 of the lines match in $2 so only one is used in the output. File.txt chr17:29667512-29667673 NF1:exon.1;NF1:exon.2;NF1:exon.38;NF1:exon.4;NF1:exon.46;NF1:exon.47 703.807... (5 Replies)
Discussion started by: cmccabe
5 Replies

6. Shell Programming and Scripting

awk to print unique text in field before hyphen

Trying to print the unique values in $2 before the -, currently the count is displayed. Hopefully, the below is close. Thank you :). file chr2:46603668-46603902 EPAS1-902|gc=54.3 253.1 chr2:211471445-211471675 CPS1-1205|gc=48.3 264.7 chr19:15291762-15291983 NOTCH3-1003|gc=68.8 195.8... (3 Replies)
Discussion started by: cmccabe
3 Replies

7. Shell Programming and Scripting

awk to remove field and match strings to add text

In file1 field $18 is removed.... column header is "Otherinfo", then each line in file1 is used to search file2 for a match. When a match is found the last four strings in file2 are copied to file1. Maybe: cut -f1-17 file1 and then match each line to file2 file1 Chr Start End ... (6 Replies)
Discussion started by: cmccabe
6 Replies

8. Shell Programming and Scripting

awk to search field2 in file2 using range of fields file1 and using match to another field in file1

I am trying to use awk to find all the $2 values in file2 which is ~30MB and tab-delimited, that are between $2 and $3 in file1 which is ~2GB and tab-delimited. I have just found out that I need to use $1 and $2 and $3 from file1 and $1 and $2of file2 must match $1 of file1 and be in the range... (6 Replies)
Discussion started by: cmccabe
6 Replies

9. UNIX for Beginners Questions & Answers

awk - print when condition is met

I have a file.txt containing the following: Query= HWI-ST863:386:C5Y8UACXX:3:2302:16454:89688 1:N:0:ACACGAAT Length=100 Score E Sequences producing significant alignments: (Bits) Value ... (2 Replies)
Discussion started by: tons92
2 Replies

10. Shell Programming and Scripting

awk to print lines based on text in field and value in two additional fields

In the awk below I am trying to print the entire line, along with the header row, if $2 is SNV or MNV or INDEL. If that condition is met or is true, and $3 is less than or equal to 0.05, then in $7 the sub pattern :GMAF= is found and the value after the = sign is checked. If that value is less than... (0 Replies)
Discussion started by: cmccabe
0 Replies
comm(1) 							   User Commands							   comm(1)

NAME
comm - select or reject lines common to two files SYNOPSIS
comm [-123] file1 file2 DESCRIPTION
The comm utility reads file1 and file2, which must be ordered in the current collating sequence, and produces three text columns as output: lines only in file1; lines only in file2; and lines in both files. If the input files were ordered according to the collating sequence of the current locale, the lines written will be in the collating sequence of the original lines. If not, the results are unspecified. OPTIONS
The following options are supported: -1 Suppresses the output column of lines unique to file1. -2 Suppresses the output column of lines unique to file2. -3 Suppresses the output column of lines duplicated in file1 and file2. OPERANDS
The following operands are supported: file1 A path name of the first file to be compared. If file1 is -, the standard input is used. file2 A path name of the second file to be compared. If file2 is -, the standard input is used. USAGE
See largefile(5) for the description of the behavior of comm when encountering files greater than or equal to 2 Gbyte ( 2**31 bytes). EXAMPLES
Example 1: Printing a list of utilities specified by files If file1, file2, and file3 each contain a sorted list of utilities, the command example% comm -23 file1 file2 | comm -23 - file3 prints a list of utilities in file1 not specified by either of the other files. The entry: example% comm -12 file1 file2 | comm -12 - file3 prints a list of utilities specified by all three files. And the entry: example% comm -12 file2 file3 | comm -23 -file1 prints a list of utilities specified by both file2 and file3, but not specified in file1. ENVIRONMENT VARIABLES
See environ(5) for descriptions of the following environment variables that affect the execution of comm: LANG, LC_ALL, LC_COLLATE, LC_CTYPE, LC_MESSAGES, and NLSPATH. EXIT STATUS
The following exit values are returned: 0 All input files were successfully output as specified. >0 An error occurred. ATTRIBUTES
See attributes(5) for descriptions of the following attributes: +-----------------------------+-----------------------------+ | ATTRIBUTE TYPE | ATTRIBUTE VALUE | +-----------------------------+-----------------------------+ |Availability |SUNWesu | +-----------------------------+-----------------------------+ |CSI |enabled | +-----------------------------+-----------------------------+ |Interface Stability |Standard | +-----------------------------+-----------------------------+ SEE ALSO
cmp(1), diff(1), sort(1), uniq(1), attributes(5), environ(5), largefile(5), standards(5) SunOS 5.10 3 Mar 2004 comm(1)
All times are GMT -4. The time now is 03:52 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy