Selecting highest value within a range | Unix Linux Forums | UNIX for Dummies Questions & Answers

#1  markymarkg123, 12-06-2012

Within millions of lines of data, there are perhaps 20 "spikes" that are very narrow. I want only the highest value from each spike within a range of 10 rows. Possible?

My data looks like this: 8 columns of numbers, millions of rows. The spikes are clear when you graph the columns.


Code:
2.14883e+05  1.22992e+05  7.96926e+03  -1.37694e+03  3.95054e+03  -1.62924e+04  8.21638e+03  1.01061e+04  
2.14357e+05  1.22730e+05  8.20939e+03  -1.54033e+03  4.28164e+03  -1.61322e+04  7.97054e+03  1.01922e+04  
2.13361e+05  1.22889e+05  8.05019e+03  -1.18045e+03  4.02582e+03  -1.61925e+04  7.99161e+03  1.02380e+04  
2.68777e+05  1.17178e+05  5.12913e+04  -1.40305e+04  2.95355e+04  -2.65120e+04  2.14739e+04  9.34042e+03  
4.09316e+05  1.80414e+05  5.32998e+04  -1.06297e+04  3.04299e+04  -2.75763e+04  2.09896e+04  8.12131e+03  
4.94380e+05  2.46756e+05  1.36658e+04  6.56373e+03  6.79386e+03  -1.70254e+04  7.70163e+03  9.14013e+03  
4.92551e+05  2.48154e+05  1.39390e+04  6.94251e+03  6.73128e+03  -1.65537e+04  7.33397e+03  9.21148e+03  
4.91403e+05  2.48659e+05  1.49110e+04  6.85990e+03  7.53969e+03  -1.67406e+04  7.14302e+03  9.71328e+03  
4.89192e+05  2.52866e+05  1.39238e+04  8.07381e+03  7.38389e+03  -1.64431e+04  6.44513e+03  1.00028e+04  
4.90260e+05  2.47635e+05  1.45364e+04  6.72570e+03  7.24393e+03  -1.69678e+04  6.90001e+03  1.01961e+04

Currently, my code reports every row-to-row increase in column 1 that is at least 5000:


Code:
awk '(NR>1) && (d=$1-x)>=5000 {print "increase of", d, "at Line", NR}{x=$1}' test | sort -n

The output looks like this:
increase of 140539 at Line 5
increase of 55416 at Line 4
increase of 85064 at Line 6

However, I want the output to look like this, filtering out the smaller increases within that data range:
increase of 140539 at Line 5
increase of 5662 at Line 48924
increase of 10334 at Line 589332
...

#2  Corona688, 12-06-2012
I don't get it. Your output with the 'smaller increases removed' still shows smaller increases.

#3  markymarkg123, 12-06-2012
I don't want to know every time the increase is over 5000. I want to know, of the points that exceed 5000, which is the highest within a range of 10 rows. Does that explain it better?

I only provided 10 lines of data. Within those 10 lines, there are 3 points where the increase is over the 5000 threshold. I want to get rid of the other two, at Line 4 and Line 6, because they don't indicate new spikes; they are duplicates of the same spike. I added two other spikes over 5000 from later in the data as examples of the desired outcome.

#4  Corona688, 12-06-2012
If your output has nothing to do with the input shown, I can't guess much from it. With your explanation I think I get it now. If you see a spike, you want to print the line, then suppress any output for the next 10 lines.


Code:
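awk '
# A jump of at least 5000 while not suppressed (L<=0): report it, start suppressing.
(NR>1) && (($1-x)>=5000) && (L<=0) {
        printf("Increase of %f at line %d\n", $1-x, NR);
        L=LEN;   x=$1;   next
}
# Every other line: remember the value and count the suppression window down.
{ x=$1; L-- }' LEN=10 datafile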
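
To see how the suppression window behaves, here is a minimal sketch that runs the same logic against column 1 of the ten sample rows from post #1. The function name `first_crossings`, reading from stdin, and passing the threshold as a variable `THRESH` are assumptions made for illustration, not part of the command above.

```shell
# Hypothetical wrapper (not from the thread): report the first jump of at least
# THRESH, then stay silent for the next LEN lines. Reads data on stdin.
first_crossings() {
    awk -v THRESH="${1:-5000}" -v LEN="${2:-10}" '
    NR > 1 && ($1 - x) >= THRESH && L <= 0 {
        printf("Increase of %f at line %d\n", $1 - x, NR)
        L = LEN; x = $1; next      # start the suppression window
    }
    { x = $1; L-- }                # remember the value, count the window down
    '
}

# Column 1 of the sample rows posted in #1.
printf '%s\n' 2.14883e+05 2.14357e+05 2.13361e+05 2.68777e+05 4.09316e+05 \
              4.94380e+05 4.92551e+05 4.91403e+05 4.89192e+05 4.90260e+05 |
    first_crossings 5000 10
# prints: Increase of 55416.000000 at line 4
```

The jump at line 4 is the first one to cross the threshold, so it is the one reported; the larger jumps at lines 5 and 6 fall inside the suppression window and are skipped.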


#5  markymarkg123, 12-06-2012
Good start! Thank you. Your code is cool, but it does not select the highest peak (line 5). It picks line 4 because it is the first to exceed 5000. Can it be written to pick line 5 (the biggest) and then to eliminate the line before 5 (line 4) and the line after 5 (line 6)?

#6  Corona688, 12-06-2012
The requirement to look ahead makes things a lot more complicated.


Code:
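awk 'BEGIN           {       MAX=5000; getline; x=$1 }   # read line 1 so x starts as its value
# New largest increase in this window: remember it and restart the 10-line countdown.
((d=$1-x)>=MAX) {       MAX=d;  MAXNR=NR; L=10; }
# Countdown expired: print the largest increase seen, then reset to the 5000 threshold.
((--L) == 0)    {
                        printf("Increase of %f at line %d\n", MAX, MAXNR);
                        MAX=5000
                        MAXNR=0
                }
{ x=$1 }

END {
        # Print a spike whose countdown had not finished when the input ended.
        if(MAXNR) printf("Increase of %f at line %d\n", MAX, MAXNR);
}' datafile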

It counts 10 lines starting at the highest peak seen so far, then prints that peak. A larger increase inside the window restarts the count.
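
The same look-ahead idea can also be written without `getline` in `BEGIN`, with the threshold and window passed in as variables. This is only a sketch under assumptions: the function name `spike_peaks` and the variable names `THRESH`, `WIN`, `max`, `maxnr` and `left` are invented for illustration, and the data is read from stdin.

```shell
# Hypothetical wrapper (not from the thread): for each spike, print the largest
# line-to-line increase of at least THRESH, counting WIN lines from that peak.
spike_peaks() {
    awk -v THRESH="${1:-5000}" -v WIN="${2:-10}" '
    NR == 1 { x = $1; max = THRESH; next }   # first line only seeds x
    (d = $1 - x) >= max {                    # new largest increase in this spike
        max = d; maxnr = NR; left = WIN      # restart the countdown from this peak
    }
    (--left) == 0 {                          # countdown expired: report and reset
        printf("Increase of %f at line %d\n", max, maxnr)
        max = THRESH; maxnr = 0
    }
    { x = $1 }
    END {                                    # spike still open at end of input
        if (maxnr) printf("Increase of %f at line %d\n", max, maxnr)
    }
    '
}

# Column 1 of the sample rows posted in #1.
printf '%s\n' 2.14883e+05 2.14357e+05 2.13361e+05 2.68777e+05 4.09316e+05 \
              4.94380e+05 4.92551e+05 4.91403e+05 4.89192e+05 4.90260e+05 |
    spike_peaks 5000 10
# prints: Increase of 140539.000000 at line 5
```

Restarting the countdown at each new maximum is what lets line 5 replace line 4 in the sample. An increase that is above the threshold but smaller than the current maximum, such as the one at line 6, is ignored while the window is open.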

#7  markymarkg123, 12-06-2012
Thanks, brother, much obliged!