I have been trying to understand your code and still do not fully understand it, but I am trying. I know it may seem like I am not, but I assure you that I am and will continue to do so. I added comments to each line, along with some questions. My understanding is not completely there, but hopefully it's a start. I apologize for the misleading output description: thousands of lines print, and I only showed a few to keep the post short. Thank you.
Code:
#!/bin/sh
awk -v d=$# '  # does this define d as non-zero?
BEGIN { FS = "[\t_]"  # define FS as tab or underscore
        OFS = "\t"    # define output as tab delimited
}
FNR == NR {  # process same line in file2 as file1 and start processing file2 (/home/cmccabe/folder/less/all_cdsV2)?
        m[$1, $4, ++c[$1, $4]] = $2 + 0  # split $4 and store $1 and $2 in array m, what does the + 0 do?
        M[$1, $4, c[$1, $4]] = $3 + 0    # split $4 and store $1 and $3 in array M, what does the + 0 do?
        if(d) printf("m[%s,%s,%d]=%s,M[%s,%s,%d]=%s\n",  # debugging print
                $1, $4, c[$1, $4], m[$1, $4, c[$1, $4]],  # for m (m=min)?
                $1, $4, c[$1, $4], M[$1, $4, c[$1, $4]])  # for M (M=max)?
        next  # process next line
}
{       if(d) printf("FNR=%d:\"%s\"\n",FNR,$0)  # not sure what this does; I think it prints each line in $file?
        for(i = 1; i <= c[$1, $4]; i++) {  # start a loop using $4 and $1 value
                #if(d) printf("m[%d]=%d,M[%d]=%d,$2=%d\n",  # again not sure?
                #       i, m[$1, $4, i],  # loop through each m in /home/cmccabe/folder/less/all_cdsV2 for each $1 and $4 of $file
                #       i, M[$1, $4, i],  # loop through each M in /home/cmccabe/folder/less/all_cdsV2 for each $1 and $4 of $file
                #       $2)               # not sure what this does?
                if(m[$1, $4, i] <= $2 && $2 <= M[$1, $4, i]) {  # if the matching m <= $2 and $2 <= M then
                        $5 = "exon"  # exon in $5
                        break        # break loop and move to next line
                } else {if(m[$1, $4, i] > $2 + 0) {  # if the matching m > $2 then
                                if(m[$1, $4, i] - 10 <= $2 + 0) {  # if the matching m - 10 <= $2 then
                                        $5 = "splicing"
                                        break  # break loop and move to next line
                                } else {$5 = "intron"  # print intron in $5
                                        break  # break loop and move to next line
                                }
                        }
                }
        }
        if(i > c[$1, $4])  # what does this do, has intron not already been printed?
                $5 = "intron"
}
1' "$1" "$2"  # parameters passed from for loop
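A small stand-alone sketch of the two idioms asked about in the comments, `FNR == NR` and `+ 0`. The function name `demo_fnr_nr`, the file names `a.txt` and `b.txt`, and their contents are made up for illustration and are not taken from the real data. FNR restarts at 1 for each input file while NR keeps counting, so the first block should only run while the first file is being read; `+ 0` forces the field to be treated as a number.

```shell
#!/bin/sh
# Hypothetical demo, not part of the script above.
demo_fnr_nr() {
    tmp=$(mktemp -d)
    printf 'x 10abc\ny 20\n' > "$tmp/a.txt"   # made-up first file
    printf 'x\nz\n'          > "$tmp/b.txt"   # made-up second file

    awk '
    FNR == NR {                 # true only while reading the first file
        seen[$1] = $2 + 0       # + 0 converts the field to a number: "10abc" becomes 10
        next                    # skip the second block for first-file lines
    }
    {                           # reached only for lines of the second file
        print $1, (($1 in seen) ? seen[$1] : "no match")
    }
    ' "$tmp/a.txt" "$tmp/b.txt"

    rm -rf "$tmp"
}

demo_fnr_nr
```

With these made-up files, the second file's key `x` is found in the array built from the first file and `z` is not.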
Code:
(only a few lines of the thousands to show the desired output results from the grep)
grep -E 'splicing|intron|exon' /home/cmccabe/folder/less/00-0000_output.txt
chr7 30062272 30062492 FKBP14 splicing
chr7 30065867 30066087 FKBP14 intron
chr7 30065964 30066184 FKBP14 exon
chr7 94024268 94024488 COL1A2 intron
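The labels in output like the lines above can be reproduced on a toy case. This is a simplified restatement of the same branch logic, not the original author's script, and every value in it (the function name `classify_demo`, `chr1`, `G1`, the 100-200 range, the positions) is hypothetical. It also suggests why the final `if(i > c[$1, $4])` check might be needed: when a position lies past every stored range, the loop ends without reaching any `break`, so nothing has been written to $5 yet.

```shell
#!/bin/sh
# Hypothetical toy data, assuming file 1 holds chr, start, end, gene
# and file 2 holds chr, start, end, gene, as in the script above.
classify_demo() {
    tmp=$(mktemp -d)
    printf 'chr1\t100\t200\tG1\n' > "$tmp/exons.txt"
    printf 'chr1\t150\t370\tG1\nchr1\t95\t315\tG1\nchr1\t50\t270\tG1\nchr1\t300\t520\tG1\n' > "$tmp/targets.txt"

    awk '
    BEGIN { FS = OFS = "\t" }
    FNR == NR {                          # first file: remember every range per chr/gene
        n = ++c[$1, $4]
        m[$1, $4, n] = $2 + 0            # range start (min)
        M[$1, $4, n] = $3 + 0            # range end (max)
        next
    }
    {
        $5 = "intron"                    # default, kept if the loop never sets a label
        for (i = 1; i <= c[$1, $4]; i++) {
            if (m[$1, $4, i] <= $2 + 0 && $2 + 0 <= M[$1, $4, i]) {
                $5 = "exon"              # position inside the range
                break
            }
            if (m[$1, $4, i] > $2 + 0) { # position before the range
                if (m[$1, $4, i] - 10 <= $2 + 0)
                    $5 = "splicing"      # within 10 of the range start
                break
            }
        }
        print
    }
    ' "$tmp/exons.txt" "$tmp/targets.txt"

    rm -rf "$tmp"
}

classify_demo
```

Here position 150 falls inside the range, 95 is within 10 of its start, 50 is further before it, and 300 is past it, which is the case where the loop runs to completion.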
Discussion started by: cmccabe
3 Replies