I need your help to get the min and max values from a file, based on the value in $5.
File1
I tried the following code:
But the result shows all lines containing the "CD" pattern, like below:
The real output that I want should only show the min and max values when the "CD" pattern is found, and it should depend on the value in $5. If $5 is "+", the value in $3 of the first "CD" line and the value in $4 of the last "CD" line for each ID2 ($6) should be printed in $3 and $4 of the output file, respectively. If $5 is "-", the value in $4 of the first "CD" line and the value in $3 of the last "CD" line for each ID2 ($6) should be printed in $4 and $3, respectively, like below:
If there is only one CD line for an ID2 ($7), that line should also be omitted. I would appreciate your help with this. Thanks.
Last edited by redse171; 08-03-2014 at 06:20 PM..
Reason: for better sample and description
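The rule described above (first and last "CD" line per ID2, with the strand in $5 deciding which coordinates are kept) could be sketched roughly as follows. This is only one reading of the description, not the code from the thread: the function name `cd_span`, the `idcol` parameter, and the six-column output layout are all hypothetical, and because the posts mention both $6 and $7 for ID2, the ID column is passed as an argument rather than assumed.

```shell
# Sketch only. Assumed layout: whitespace-separated fields,
#   $2 = feature type ("CD"), $3/$4 = coordinates, $5 = strand ("+" or "-").
# The first argument is the column that holds ID2 (the posts mention $6 and $7);
# any remaining arguments are input files (stdin is read if none are given).
cd_span() {
    idcol=${1:-6}
    [ $# -gt 0 ] && shift
    awk -v idcol="$idcol" '
    $2 == "CD" {
        id = $idcol
        if (!(id in n)) {                  # first CD line seen for this ID
            order[++ids] = id              # remember the input order of IDs
            first3[id] = $3; first4[id] = $4
            name[id] = $1; strand[id] = $5
        }
        last3[id] = $3; last4[id] = $4     # overwritten until the last CD line
        n[id]++
    }
    END {
        for (i = 1; i <= ids; i++) {
            id = order[i]
            if (n[id] < 2) continue        # IDs with a single CD line are omitted
            if (strand[id] == "+") { lo = first3[id]; hi = last4[id] }
            else                   { lo = last3[id];  hi = first4[id] }
            # Hypothetical output layout: $1, "CD", new $3, new $4, strand, ID2
            print name[id], "CD", lo, hi, strand[id], id
        }
    }' "$@"
}
```

It would be called as, for example, `cd_span 6 File1`. Everything is accumulated in memory and printed at the end, which keeps the IDs in input order but may matter for very large files.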
I don't understand your selection of the left value for the "+" sign or the right value for the "-" sign. With this code
I get the result
which does not match your requirement for the above-mentioned values...
Thanks a lot for your quick response.
I am not really clear about your question above, but I am extracting information for gene features, and that is how the region of the coding sequence is determined.
I tried your code, but it did not give accurate results on my real data. I tried changing and experimenting with your code, but the result is still not correct. Below is the sample result that I got:
If you don't mind, can you explain your code? The above data is just a sample. For $1, I have many different values, not only SP12.3. So, I changed print "SP12.3" to print "$1", but the output is still wrong. Thanks.
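One possible contributor to the wrong output, assuming the change was typed exactly as quoted: in awk, anything inside double quotes is a string literal, so `print "$1"` prints the two characters `$1` instead of the first field. The input line below is made up purely for illustration.

```shell
# Hypothetical one-line input, for illustration only.
printf 'SP12.3 CD 100 200\n' | awk '{ print "$1" }'   # prints the literal text: $1
printf 'SP12.3 CD 100 200\n' | awk '{ print $1 }'     # prints the first field: SP12.3
```

Dropping the quotes makes awk substitute the field value for each input line.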
I need your help to get the min and max values from a file, based on the value in $5.
File1
I tried the following code:
But the result still shows all lines containing the "CD" pattern. The real output that I want should only show the min and max values based on $5 (blue color for "+" and red color for "-"), as below:
If there is only one CD line for an ID2 ($7), that line should also be omitted. I would appreciate your help with this. Thanks.
It is no wonder that the results you are getting are not what you want. Your description of how to process the input is so vague that we do not understand what you want.
The code you showed us prints parts of every line with "CD" in the 2nd field. For those lines, it throws away fields 2, 8, and 9; and, if $5 is "+", it swaps fields 3 and 4 before printing the remainder of the line. But, the output you say you want shows every field (keeping fields 2, 8, and 9). And if fields 3 and 4 have been swapped, it isn't obvious to me.
You mentioned ID2 ($7), but it looks like you are looking for the minimum $3 value and the maximum $4 value for each different value in field 9 (not field 7). And from the data shown, I don't see that the + or - in field 5 makes any difference at all.
You have shown us data where fields 1, 6, and 8 are all constants. You have said that $1 may change, but you haven't given any indication of how, or if, that should affect the output produced.
Please give us a clear English description of what you are trying to do and explain what the meaning is for each of the fields in your file.
Also, lots of gene data that we're asked to help with has huge files to process. If that is the case here as well, any details you can give us about the data may help speed up the process considerably. For example, what you have shown us could be sorted with field 1, 5, or 9 as a primary sort key. If data is to be grouped using field 9 as a key and the input is sorted on field 9, we can produce any needed output every time the contents of field 9 changes (as opposed to accumulating all of the input into memory and processing everything at the end).
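The "produce output every time field 9 changes" idea can be sketched as below. It is an illustration under stated assumptions, not a solution to the still-unclear requirements: it assumes whitespace-separated input that is already grouped on field 9, "CD" in field 2, and that the wanted result is the minimum $3 and maximum $4 per key. The function name `minmax_by_key` and the three-column output are hypothetical.

```shell
# Sketch only. Assumes: "CD" in $2, group key in $9, and input already
# sorted or grouped so that lines with the same key are adjacent.
minmax_by_key() {
    awk '
    function emit() {                 # print the group collected so far
        if (count > 1)                # groups with a single CD line are skipped
            print key, min, max
    }
    $2 == "CD" {
        if (seen && $9 != key) { emit(); seen = 0 }    # key changed: flush group
        if (!seen) { key = $9; min = $3; max = $4; count = 0; seen = 1 }
        if ($3 + 0 < min + 0) min = $3                 # force numeric comparison
        if ($4 + 0 > max + 0) max = $4
        count++
    }
    END { if (seen) emit() }          # flush the final group
    ' "$@"
}
```

Only one group is held in memory at a time, so the memory used does not grow with the size of the file. If the input is not grouped on the key, the same key can be reported more than once.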
We also need to know up front whether or not it is important that the output be in the same order as the input.
And, finally: just saying that the code you were given didn't give you accurate results is useless information. Show us the output you got, the output you wanted, and explain why (based on your description of what you wanted) the output you got was wrong! Help us help you!
These 3 Users Gave Thanks to Don Cragun For This Post:
Thank you for your comments. Forgive me for the vague description. I just edited my question and sample above, and I tried my best to explain my issue. My data is long and huge and has different conditions, and I tried to keep the sample simple, but it seems that created more confusion. My mistake. Thanks.