The awk works great. Do you mind explaining it a bit? Why do you need two inputs? Thanks again.
Hello cmccabe,
The following explanation may help.
Code:
awk '
# NR and FNR are built-in awk variables. NR counts records across all input files; FNR restarts at 1 for each new file.
# So FNR==NR is TRUE only while the first file is being read; during the second read NR is greater than FNR.
FNR==NR {
    S[$4]++    # array S is indexed by field 4; ++ counts how many times each value of field 4 occurs
    next       # next is a statement, not a variable: skip the remaining rules and read the next record, so pass 1 only fills S
}
# This rule runs while the file is read the second time. Because of the delete below, only the first line
# of each field-4 value is still in S, so repeated lines are skipped.
($4 in S) {
    if (S[$4] > 1) {
        # field 4 occurs more than once: print $1, $2, $2 plus that count, and $5 (OFS is the output field separator, a space by default)
        print $1 OFS $2 OFS $2+S[$4] OFS $5
    } else if ($6 == 1) {
        # field 4 occurs once and field 6 is 1: print $1, $2, $2 and $5 as per your requirements
        print $1 OFS $2 OFS $2 OFS $5
    } else {
        # field 4 occurs once and field 6 is not 1: print $1, $2+$6, $2+$6 and $5
        print $1 OFS $2+$6 OFS $2+$6 OFS $5
    }
    delete S[$4]    # remove this index from S so that duplicate values are not printed
}
' Input_file Input_file    # the same file is named twice so that awk reads it twice: pass 1 counts, pass 2 prints
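To see the two reads in isolation, here is a minimal sketch on a made-up three-line file (the temporary file and its contents are assumptions for illustration, not your data). The first command prints NR and FNR for every record of both reads; the second uses the same FNR==NR idiom to count a field on the first read and print each line with its count on the second.

```shell
# Hypothetical sample data: one field per line, with "x" appearing twice.
demo=$(mktemp)
printf 'x\ny\nx\n' > "$demo"

# Show NR, FNR and which pass each record belongs to.
passes=$(awk '{ print NR, FNR, (FNR==NR ? "pass1" : "pass2"), $0 }' "$demo" "$demo")
printf '%s\n' "$passes"

# Pass 1 counts each value of $1; pass 2 prints every line with its count.
counts=$(awk 'FNR==NR { c[$1]++; next } { print $1, c[$1] }' "$demo" "$demo")
printf '%s\n' "$counts"

rm -f "$demo"
```

On the second read FNR starts again at 1 while NR continues from 4, which is why the FNR==NR rule no longer fires there.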
Thanks,
R. Singh
Trying to sum field #6 when field #2 matches string as follows:
Input data:
2010-09-18-20.24.44.206117 UOWEXEC db2bp DB2XYZ hostname 1
2010-09-18-20.24.44.206117 UOWWAIT db2bp DB2XYZ hostname ... (3 Replies)
Hi experts,
I need to print the first field first then last two fields should come next and then i need to print rest of the fields.
Input :
a1,abc,jsd,fhf,fkk,b1,b2
a2,acb,dfg,ghj,b3,c4
a3,djf,wdjg,fkg,dff,ggk,d4,d5
Expected output:
a1,b1,b2,abc,jsd,fhf,fkk... (6 Replies)
Trying to use awk to find a keyword and return the matches in the row, but also $1 and $2, which are the unique id's, but they only appear once. Thank you :).
file
name 31 Index Chromosomal Position Gene Inheritance
122 2106725 TSC2 AD
124 2115481 TSC2 AD
121 2105400 TSC2 AD... (6 Replies)
In the below awk I am trying to combine all matching $4 into a single $5 (up to the -), and count the lines in $6 and average all values in $7. The awk is close but it seems to only be using the last line in the file and skipping all others. The posted input is a sample of the file that is over... (3 Replies)
In the below I am trying to use awk to match all the $13 values in input, which is tab-delimited,
that are in $1 of gene which is just a single column of text.
However only the line with the greatest $9 value in input needs to be printed.
So in the example below all the MECP2 and LTBP1... (0 Replies)
I am trying to output a tab-delimited result that uses the data from a tab-delimited file to combine and subtract specific lines.
If $4 matches in each line then the first matching sequential $6 value is added to $2, unless the value is 1, then the original $2 is used (like in the case of line... (3 Replies)
I am trying to use awk to format the file below, which is tab-delimited. The desired out is space delimited and is in the order of
$9 $13 $2 $10-$11.$10 and $11 are often times multiple values separated by a comma, so the value in $10 is combined with the first value from
$11 using the comma.... (5 Replies)
The below awk executes as is and produces the current output. It is very close but what I can not seem to do is add the -exon..., the ... portion comes from $1 and the _exon is static and will never change. If there is + sign in $4 then the ... is in ascending order or sequential. If there is a - in... (2 Replies)
I have a text file with many thousands of lines, a small sample of which looks like this:
InputFile:PS002,003 D -1 5 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 6 6 -1 -1 -1 -1 0 509 0
PS002,003 PSQ 0 1 7 18 1 0 -1 1 1 3 -1 -1 ... (5 Replies)
In the file below I am trying to count the given repeats of A,T,C,G in each string of letters. Each sequence is below the > and it is possible for a string of repeats to wrap from the line above. For example, in the first line the last letter is a T and the next lines has 3 more. I think the below... (10 Replies)