How to get min and max values using awk?

Tags

awk, shell scripts

08-02-2014

Hi,

I need your help getting min and max values from a file, based on the value in $5.

File1

Code:

SP12.3	stc	2240806	2240808	+	ID1_N003	 ID2_N003T0
SP12.3	sto	2241682	2241684	+	ID1_N003	 ID2_N003T0
SP12.3	XE	2239943	2240011	+	ID1_N003	 ID2_N003T0
SP12.3	XE	2240077	2241254	+	ID1_N003	 ID2_N003T0
SP12.3	CD	2240806	2241254	+	ID1_N003	 ID2_N003T0
SP12.3	XE	2241471	2241684	+	ID1_N003	 ID2_N003T0
SP12.3	CD	2241471	2241681	+	ID1_N003	 ID2_N003T0
SP12.3	stc	2245127	2245129	+	ID1_N005	 ID2_N005T0
SP12.3	sto	2246954	2246956	+	ID1_N005	 ID2_N005T0
SP12.3	XE	2244762	2247195	+	ID1_N005	 ID2_N005T0
SP12.3	CD	2245127	2246953	+	ID1_N005	 ID2_N005T0
SP12.3	stc	2253115	2253117	-	ID1_N006	 ID2_N006T0
SP12.3	sto	2249759	2249761	-	ID1_N006	 ID2_N006T0
SP12.3	XE	2253090	2254054	-	ID1_N006	 ID2_N006T0
SP12.3	CD	2253090	2253117	-	ID1_N006	 ID2_N006T0
SP12.3	XE	2252492	2252908	-	ID1_N006	 ID2_N006T0
SP12.3	CD	2252492	2252908	-	ID1_N006	 ID2_N006T0
SP12.3	XE	2251730	2251882	-	ID1_N006	 ID2_N006T0
SP12.3	CD	2251730	2251882	-	ID1_N006	 ID2_N006T0
SP12.3	XE	2251591	2251664	-	ID1_N006	 ID2_N006T0
SP12.3	CD	2251591	2251664	-	ID1_N006	 ID2_N006T0
SP12.3	XE	2249887	2251530	-	ID1_N006	 ID2_N006T0
SP12.3	CD	2249887	2251530	-	ID1_N006	 ID2_N006T0
SP12.3	XE	2249087	2249821	-	ID1_N006	 ID2_N006T0
SP12.3	CD	2249762	2249821	-	ID1_N006	 ID2_N006T0
SP12.3	stc	2252073	2252075	-	ID1_N006	 ID2_N006T1
SP12.3	sto	2249759	2249761	-	ID1_N006	 ID2_N006T1
SP12.3	XE	2252492	2252973	-	ID1_N006	 ID2_N006T1
SP12.3	XE	2251730	2252227	-	ID1_N006	 ID2_N006T1
SP12.3	CD	2251730	2252075	-	ID1_N006	 ID2_N006T1
SP12.3	XE	2251591	2251664	-	ID1_N006	 ID2_N006T1
SP12.3	CD	2251591	2251664	-	ID1_N006	 ID2_N006T1
SP12.3	XE	2249887	2251530	-	ID1_N006	 ID2_N006T1
SP12.3	CD	2249887	2251530	-	ID1_N006	 ID2_N006T1
SP12.3	XE	2249090	2249821	-	ID1_N006	 ID2_N006T1
SP12.3	CD	2249762	2249821	-	ID1_N006	 ID2_N006T1
SP12.5	stc	3001307	3001309	+	ID1_N01140	ID2_N01140T0
SP12.5	sto	3005026	3005028	+	ID1_N01140	ID2_N01140T0
SP12.5	XE	3000439	3001397	+	ID1_N01140	ID2_N01140T0
SP12.5	CD	3001307	3001397	+	ID1_N01140	ID2_N01140T0
SP12.5	XE	3001572	3002765	+	ID1_N01140	ID2_N01140T0
SP12.5	CD	3001572	3002765	+	ID1_N01140	ID2_N01140T0
SP12.5	XE	3002821	3004797	+	ID1_N01140	ID2_N01140T0
SP12.5	CD	3002821	3004797	+	ID1_N01140	ID2_N01140T0
SP12.5	XE	3004855	3004929	+	ID1_N01140	ID2_N01140T0
SP12.5	CD	3004855	3004929	+	ID1_N01140	ID2_N01140T0
SP12.5	XE	3004994	3005417	+	ID1_N01140	ID2_N01140T0
SP12.5	CD	3004994	3005025	+	ID1_N01140	ID2_N01140T0

I tried the following code:

Code:

awk -F"\t" '$2=="CD"{if ($5~/\+/) {print $1"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7} else {print $1"\t"$4"\t"$3"\t"$5"\t"$6"\t"$7}}' file1

But the result shows every line containing the "CD" pattern, as below:

Code:

SP12.3	CD	2240806	2241254	+	ID1_N003	 ID2_N003T0
SP12.3	CD	2241471	2241681	+	ID1_N003	 ID2_N003T0
SP12.3	CD	2245127	2246953	+	ID1_N005	 ID2_N005T0
SP12.3	CD	2253090	2253117	-	ID1_N006	 ID2_N006T0
SP12.3	CD	2252492	2252908	-	ID1_N006	 ID2_N006T0
SP12.3	CD	2251730	2251882	-	ID1_N006	 ID2_N006T0
SP12.3	CD	2251591	2251664	-	ID1_N006	 ID2_N006T0
SP12.3	CD	2249887	2251530	-	ID1_N006	 ID2_N006T0
SP12.3	CD	2249762	2249821	-	ID1_N006	 ID2_N006T0
SP12.3	CD	2251730	2252075	-	ID1_N006	 ID2_N006T1
SP12.3	CD	2251591	2251664	-	ID1_N006	 ID2_N006T1
SP12.3	CD	2249887	2251530	-	ID1_N006	 ID2_N006T1
SP12.3	CD	2249762	2249821	-	ID1_N006	 ID2_N006T1
SP12.5	CD	3001307	3001397	+	ID1_N01140	ID2_N01140T0
SP12.5	CD	3001572	3002765	+	ID1_N01140	ID2_N01140T0
SP12.5	CD	3002821	3004797	+	ID1_N01140	ID2_N01140T0
SP12.5	CD	3004855	3004929	+	ID1_N01140	ID2_N01140T0
SP12.5	CD	3004994	3005025	+	ID1_N01140	ID2_N01140T0

The output I actually want shows only the min and max values for lines where the "CD" pattern is found, and it depends on the value in $5. If $5 is "+", then for each ID2 ($7) the $3 value of the first "CD" line found and the $4 value of the last "CD" line found are printed in $3 and $4 of the output, respectively. If $5 is "-", then for each ID2 ($7) the $4 value of the first "CD" line found and the $3 value of the last "CD" line found are printed in $4 and $3, respectively, like below:

Code:

SP12.3	CD	2240806	2241681	+	ID1_N003	 ID2_N003T0
SP12.3	CD	2249762	2253117	-	ID1_N006	 ID2_N006T0
SP12.3	CD	2249762	2252075	-	ID1_N006	 ID2_N006T1
SP12.5	CD	3001307	3005025	+	ID1_N01140	ID2_N01140T0

If there is only one CD line for any ID2 ($7), that line is omitted. I would appreciate your help with this. Thanks.
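The rule described above can be sketched in awk as follows. This is a minimal sketch, not code posted in the thread: it assumes the input is tab-separated with seven fields (strand in $5, ID2 in $7) and that the "first" and "last" CD lines are meant in input order. The function name `cd_span` and all array names are hypothetical.

```shell
# cd_span FILE: hypothetical helper name, not from the thread.
# Assumes tab-separated input with the strand in $5 and ID2 in $7.
cd_span() {
    awk -F'\t' -v OFS='\t' '
    $2 == "CD" {
        id = $7
        if (!(id in n)) {               # first CD line seen for this ID2
            order[++groups] = id        # remember the input order of the groups
            chr[id] = $1; sgn[id] = $5; id1[id] = $6
            first3[id] = $3; first4[id] = $4
        }
        last3[id] = $3; last4[id] = $4  # overwritten until the last CD line
        n[id]++
    }
    END {
        for (g = 1; g <= groups; g++) {
            id = order[g]
            if (n[id] < 2) continue     # an ID2 with a single CD line is omitted
            if (sgn[id] == "+") {
                print chr[id], "CD", first3[id], last4[id], sgn[id], id1[id], id
            } else {
                print chr[id], "CD", last3[id], first4[id], sgn[id], id1[id], id
            }
        }
    }' "$1"
}

# Usage on a cut-down copy of the sample (seven arguments per output line).
sample=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    SP12.3 XE 2240077 2241254 + ID1_N003 ID2_N003T0 \
    SP12.3 CD 2240806 2241254 + ID1_N003 ID2_N003T0 \
    SP12.3 CD 2241471 2241681 + ID1_N003 ID2_N003T0 \
    SP12.3 CD 2245127 2246953 + ID1_N005 ID2_N005T0 \
    SP12.3 CD 2253090 2253117 - ID1_N006 ID2_N006T0 \
    SP12.3 CD 2252492 2252908 - ID1_N006 ID2_N006T0 \
    SP12.3 CD 2249762 2249821 - ID1_N006 ID2_N006T0 > "$sample"
cd_span "$sample"
rm -f "$sample"
```

On this cut-down input the sketch prints one line for ID2_N003T0 (2240806 to 2241681) and one for ID2_N006T0 (2249762 to 2253117), and skips ID2_N005T0 because it has a single CD line. Storing the group order in `order` keeps the output in input order, which a plain `for (id in n)` loop would not guarantee.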

Last edited by redse171; 08-03-2014 at 06:20 PM.. Reason: for better sample and description

redse171

08-02-2014

I don't understand your selection of the left value for the "+" sign, nor the right value for the "-" sign. With this code

Code:

awk     '$2 != "CD"     {next}
         !($7 in EXT3)  {EXT3[$7]=EXT4[$7]= -1E100 * ($5"1")}
                        {CNT[$7]++;SGN[$7]=$5}
         $5 == "+"      {if ($3 > EXT3[$7]) EXT3[$7] = $3
                         if ($4 > EXT4[$7]) EXT4[$7] = $4}
         $5 == "-"      {if ($3 < EXT3[$7]) EXT3[$7] = $3
                         if ($4 < EXT4[$7]) EXT4[$7] = $4}

         END            {for (i in EXT3) if (2 <= CNT[i]) print "SP12.3", "CD", EXT3[i], EXT4[i], SGN[i], substr (i, 2, 8), i}
        ' FS="\t" OFS="\t" file

I get the result

Code:

SP12.3    CD    2249762    2249821    -    ID2 N006     ID2 N006T1
SP12.3    CD    2249762    2249821    -    ID2 N006     ID2 N006T0
SP12.3    CD    2241471    2241681    +    ID2 N003     ID2 N003T0

which does not match your requirement for above mentioned values...
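For readers puzzling over the initialisation in the code above: `$5"1"` concatenates the sign in $5 with the character 1, giving the string "+1" or "-1", which awk converts to the number 1 or -1 when it is multiplied. The starting value `-1E100 * ($5"1")` is therefore very low for "+" groups, so the `>` comparisons keep the largest $3 and $4, and very high for "-" groups, so the `<` comparisons keep the smallest. A small sketch of just that arithmetic (the helper name `sign_demo` is hypothetical):

```shell
# sign_demo: hypothetical helper, only to show the arithmetic.
sign_demo() {
    printf '+\n-\n' | awk '{
        s = ($1 "1") + 0                # "+1" becomes 1, "-1" becomes -1
        start = -1E100 * s              # same starting value as EXT3 and EXT4
        print $1, s, (start < 0 ? "max" : "min")
    }'
}
sign_demo
```

This prints `+ 1 max` and `- -1 min`. It also shows why the posted result differs from the requested output: the code tracks the extreme $3 and the extreme $4 in the same direction, rather than pairing a value from the first CD line with one from the last.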

RudiC

08-02-2014

Hi RudiC,

Thanks a lot for your quick response.
I am not sure I understand your question above, but I am extracting information for gene features, and this is how the region of the coding sequence is determined.

I tried your code, but it did not give accurate results on my real data. I tried changing and experimenting with your code, but the result is still not correct. Below is a sample of the result I got:

Code:

SP12.5	CD	2249762	2249821	-	ID2_N006	 ID2_N006T1
SP12.5	CD	3004994	3005025	+	D2_N0114	ID2_N01140T0
SP12.5	CD	2249762	2249821	-	ID2_N006	 ID2_N006T0
SP12.5	CD	2241471	2241681	+	ID2_N003	 ID2_N003T0

If you don't mind, could you explain your code? The data above is just a sample. In my real data, $1 takes many different values, not only SP12.3, so I changed "SP12.3" in the print statement to $1. But the output is still wrong. Thanks.

Last edited by redse171; 08-03-2014 at 06:23 PM..

redse171

08-02-2014

Code:

awk '
	$2=="CD" {
		key=$5"|"$9"|";
		($3>A[key"max"] || A[key"max"]=="")? A[key"max"]=$3:"";
		($4>A[key"max"] || A[key"max"]=="")? A[key"max"]=$4:"";
		($3<A[key"min"] || A[key"min"]=="")? A[key"min"]=$3:"";
		($4<A[key"min"] || A[key"min"]=="")? A[key"min"]=$4:"";
		!(key in line)? line[key]=$0: "";
		count[$9]++;
	}
	END {
		for(key in line) {
			split(key,s,"|");
			if(count[s[2]] > 1) {
				sub(/[0-9]+\s+[0-9]+/, A[key"min"]" "A[key"max"], line[key]);
				print line[key];
			}
		}
	}
' file

Last edited by jethrow; 08-03-2014 at 05:25 PM..

jethrow

08-02-2014

Hi jethrow,

Thanks so much for your response. I tried your code, but the result is not accurate:

Code:

SP12.5	CD	3001307	3001397	+	ID1_N01140	ID2_N01140T0
SP12.3	CD	2240806	2241254	+	ID1_N003	 ID2_N003T0
SP12.3	CD	2251730	2252075	-	ID1_N006	 ID2_N006T1
SP12.3	CD	2253090	2253117	-	ID1_N006	 ID2_N006T0

Last edited by redse171; 08-03-2014 at 06:32 PM..

redse171

08-03-2014

Quote:

Originally Posted by redse171

Hi,

I need your kind help to get min and max values from file based on value in $5 .

File1

Code:

SP12.3	stc	2240806	2240808	+	ID1 N003	 ID2 N003T0
SP12.3	sto	2241682	2241684	+	ID1 N003	 ID2 N003T0
SP12.3	XE	2239943	2240011	+	ID1 N003	 ID2 N003T0
SP12.3	XE	2240077	2241254	+	ID1 N003	 ID2 N003T0
SP12.3	CD	2240806	2241254	+	ID1 N003	 ID2 N003T0
SP12.3	XE	2241471	2241684	+	ID1 N003	 ID2 N003T0
SP12.3	CD	2241471	2241681	+	ID1 N003	 ID2 N003T0
SP12.3	stc	2245127	2245129	+	ID1 N005	 ID2 N005T0
SP12.3	sto	2246954	2246956	+	ID1 N005	 ID2 N005T0
SP12.3	XE	2244762	2247195	+	ID1 N005	 ID2 N005T0
SP12.3	CD	2245127	2246953	+	ID1 N005	 ID2 N005T0
SP12.3	stc	2253115	2253117	-	ID1 N006	 ID2 N006T0
SP12.3	sto	2249759	2249761	-	ID1 N006	 ID2 N006T0
SP12.3	XE	2253090	2254054	-	ID1 N006	 ID2 N006T0
SP12.3	CD	2253090	2253117	-	ID1 N006	 ID2 N006T0
SP12.3	XE	2252492	2252908	-	ID1 N006	 ID2 N006T0
SP12.3	CD	2252492	2252908	-	ID1 N006	 ID2 N006T0
SP12.3	XE	2251730	2251882	-	ID1 N006	 ID2 N006T0
SP12.3	CD	2251730	2251882	-	ID1 N006	 ID2 N006T0
SP12.3	XE	2251591	2251664	-	ID1 N006	 ID2 N006T0
SP12.3	CD	2251591	2251664	-	ID1 N006	 ID2 N006T0
SP12.3	XE	2249887	2251530	-	ID1 N006	 ID2 N006T0
SP12.3	CD	2249887	2251530	-	ID1 N006	 ID2 N006T0
SP12.3	XE	2249087	2249821	-	ID1 N006	 ID2 N006T0
SP12.3	CD	2249762	2249821	-	ID1 N006	 ID2 N006T0
SP12.3	stc	2252073	2252075	-	ID1 N006	 ID2 N006T1
SP12.3	sto	2249759	2249761	-	ID1 N006	 ID2 N006T1
SP12.3	XE	2252492	2252973	-	ID1 N006	 ID2 N006T1
SP12.3	XE	2251730	2252227	-	ID1 N006	 ID2 N006T1
SP12.3	CD	2251730	2252075	-	ID1 N006	 ID2 N006T1
SP12.3	XE	2251591	2251664	-	ID1 N006	 ID2 N006T1
SP12.3	CD	2251591	2251664	-	ID1 N006	 ID2 N006T1
SP12.3	XE	2249887	2251530	-	ID1 N006	 ID2 N006T1
SP12.3	CD	2249887	2251530	-	ID1 N006	 ID2 N006T1
SP12.3	XE	2249090	2249821	-	ID1 N006	 ID2 N006T1
SP12.3	CD	2249762	2249821	-	ID1 N006	 ID2 N006T1

I did the following codes:-

Code:

awk -F"\t" '$2=="CD"{if ($5~/\+/) {print $1"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7} else {print $1"\t"$4"\t"$3"\t"$5"\t"$6"\t"$7}}' file1

But the results still shows all lines containing "CD" pattern. The real output that i want will only show min and max value based on $5 ((blue color for "+" and red color for "-") as below. :-

Code:

SP12.3	CD	2240806	2241681	+	ID1 N003	 ID2 N003T0
SP12.3	CD	2249762	2253117	-	ID1 N006	 ID2 N006T0
SP12.3	CD	2249762	2252075	-	ID1 N006	 ID2 N006T1

If there is only 1 CD for any ID2 ($7), the line will also be omitted. Would appreciate if you can help me on this. thanks

It is no wonder that the results you are getting are not what you want. Your description of how to process the input is so vague that we do not understand what you want.

The code you showed us prints parts of every line with "CD" in the 2nd field. For those lines, it throws away fields 2, 8, and 9; and, if $5 is not "+", it swaps fields 3 and 4 before printing the remainder of the line. But, the output you say you want shows every field (keeping fields 2, 8, and 9). And if fields 3 and 4 have been swapped, it isn't obvious to me.

You mentioned ID2 ($7), but it looks like you are looking for the minimum $3 value and the maximum $4 value for each different value in field 9 (not field 7). And from the data shown, I don't see that the + or - in field 5 makes any difference at all.

You have shown us data where fields 1, 6, and 8 are all constants. You have said that $1 may change, but you haven't given any indication of how, or if, that should affect the output produced.

Please give us a clear English description of what you are trying to do and explain what the meaning is for each of the fields in your file.

Also, lots of gene data that we're asked to help with has huge files to process. If that is the case here as well, any details you can give us about the data may help speed up the process considerably. For example, what you have shown us could be sorted with field 1, 5, or 9 as a primary sort key. If data is to be grouped using field 9 as a key and the input is sorted on field 9, we can produce any needed output every time the contents of field 9 changes (as opposed to accumulating all of the input into memory and processing everything at the end).
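The change-of-key approach mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not code from the thread: it assumes the tab-separated, seven-field layout of the edited sample (so the grouping key is $7 rather than field 9 of the quoted, space-separated version), that all CD lines of one ID2 are contiguous in the input, and the first/last rule from the original question. The names `cd_span_stream` and `emit` are hypothetical.

```shell
# cd_span_stream FILE: hypothetical helper name, not from the thread.
# Assumes all CD lines that share an ID2 ($7) are adjacent in the input.
cd_span_stream() {
    awk -F'\t' -v OFS='\t' '
    function emit() {                   # print the finished group if it qualifies
        if (n >= 2) {
            if (sgn == "+") { print chr, "CD", first3, last4, sgn, id1, id }
            else            { print chr, "CD", last3, first4, sgn, id1, id }
        }
    }
    $2 != "CD" { next }
    n == 0 || $7 != id {                # a new ID2 group starts on this line
        emit()
        id = $7; chr = $1; sgn = $5; id1 = $6
        first3 = $3; first4 = $4; n = 0
    }
    { last3 = $3; last4 = $4; n++ }     # runs for every CD line
    END { emit() }                      # do not forget the final group
    ' "$1"
}
```

Because only the current group is held in a handful of scalar variables, memory use stays constant however large the file is, and the output order automatically follows the input order.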

We also need to know up front whether or not it is important that the output be in the same order as the input.

And, finally: just saying that the code you were given didn't give you accurate results is useless information. Show us the output you got, the output you wanted, and explain why (based on your description of what you wanted) the output you got was wrong! Help us help you!

Don Cragun

08-03-2014

Hi Don Cragun,

Thank you for your comments. Forgive me for the vague description. I have just edited my question and the sample above, and I tried my best to explain my issue. My data is long and large and covers different conditions; I tried to simplify it for the sample, but it seems that created more confusion. My mistake. Thanks.

redse171
