Selecting highest value within a range | Unix Linux Forums | UNIX for Dummies Questions & Answers

#1  12-06-2012  markymarkg123

Within millions of lines of data, there are perhaps 20 "spikes" that are very narrow. I want only the highest value from each spike within a range of 10 rows. Possible?

My data looks like this: 8 columns of numbers in scientific notation, millions of rows. There are clear spikes when you graph the columns.


Code:
2.14883e+05  1.22992e+05  7.96926e+03  -1.37694e+03  3.95054e+03  -1.62924e+04  8.21638e+03  1.01061e+04  
2.14357e+05  1.22730e+05  8.20939e+03  -1.54033e+03  4.28164e+03  -1.61322e+04  7.97054e+03  1.01922e+04  
2.13361e+05  1.22889e+05  8.05019e+03  -1.18045e+03  4.02582e+03  -1.61925e+04  7.99161e+03  1.02380e+04  
2.68777e+05  1.17178e+05  5.12913e+04  -1.40305e+04  2.95355e+04  -2.65120e+04  2.14739e+04  9.34042e+03  
4.09316e+05  1.80414e+05  5.32998e+04  -1.06297e+04  3.04299e+04  -2.75763e+04  2.09896e+04  8.12131e+03  
4.94380e+05  2.46756e+05  1.36658e+04  6.56373e+03  6.79386e+03  -1.70254e+04  7.70163e+03  9.14013e+03  
4.92551e+05  2.48154e+05  1.39390e+04  6.94251e+03  6.73128e+03  -1.65537e+04  7.33397e+03  9.21148e+03  
4.91403e+05  2.48659e+05  1.49110e+04  6.85990e+03  7.53969e+03  -1.67406e+04  7.14302e+03  9.71328e+03  
4.89192e+05  2.52866e+05  1.39238e+04  8.07381e+03  7.38389e+03  -1.64431e+04  6.44513e+03  1.00028e+04  
4.90260e+05  2.47635e+05  1.45364e+04  6.72570e+03  7.24393e+03  -1.69678e+04  6.90001e+03  1.01961e+04

Currently, my code will give me all the consecutive-row increases in column 1 over a specified value of 5000:


Code:
awk 'NR>1 && (d=$1-x)>=5000 {print "increase of", d, "at Line", NR} {x=$1}' test | sort -n

The output looks like this:
increase of 140539 at Line 5
increase of 55416 at Line 4
increase of 85064 at Line 6

However, I want the output to look like this, filtering out the smaller increases within that data range:
increase of 140539 at Line 5
increase of 5662 at Line 48924
increase of 10334 at Line 589332
...
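
A side note on the pipeline above: every output line starts with the word "increase", so `sort -n` finds no leading number and falls back to comparing the lines as text, which is why 140539 is listed before 55416. A minimal sketch of sorting on the increase itself, assuming the "increase of <d> at Line <n>" layout shown above (the helper name `sort_by_increase` is made up for illustration):

```shell
# Hypothetical helper: order "increase of <d> at Line <n>" lines by <d>.
# Field 3 holds the increase; -k3,3n compares only that field, numerically.
sort_by_increase() {
    sort -k3,3n
}

printf '%s\n' \
    'increase of 140539 at Line 5' \
    'increase of 55416 at Line 4' \
    'increase of 85064 at Line 6' | sort_by_increase
```

This lists the increases in ascending order (55416, 85064, 140539). Use `-k3,3nr` instead if the largest should come first.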
#2  12-06-2012  Corona688
I don't get it. Your example with the 'smaller increases removed' still shows smaller increases.
#3  12-06-2012  markymarkg123
I don't want to know every time the increase is over 5000. I want to know, of the points that exceed 5000, which is the highest within a range of 10 rows. Does that explain it better?

I only provided 10 lines of data. Within those 10 lines, there are 3 points where the increase exceeds the 5000 threshold. I want to get rid of the other two, at Line 4 and Line 6, because they don't indicate new spikes; they are duplicates of the same spike. The two other spikes over 5000 in my desired output are examples I wrote in from later in the data.
#4  12-06-2012  Corona688
If your output has nothing to do with the input shown, I can't guess much from it. With your explanation I think I get it now. If you see a spike, you want to print the line, then suppress any output for the next 10 lines.


Code:
awk '(NR>1) && (($1-x)>=5000) && (L<=0) {
        printf("Increase of %f at line %d\n", $1-x, NR);
        L=LEN;   x=$1;   next
} { x=$1; L-- }' LEN=10 datafile

#5  12-06-2012  markymarkg123
Good start! Thank you. Your code is cool, but it does not select the highest peak (line 5). It picks line 4 because it is the first to exceed 5000. Can it be written to pick line 5 (the biggest) and then to eliminate the line before 5 (line 4) and the line after 5 (line 6)?
#6  12-06-2012  Corona688
The requirement to look ahead makes things a lot more complicated.


Code:
awk 'BEGIN           {       MAX=5000; getline; x=$1 }
((d=$1-x)>=MAX) {       MAX=d;  MAXNR=NR; L=10; }
((--L) == 0)    {
                        printf("Increase of %f at line %d\n", MAX, MAXNR);
                        MAX=5000
                        MAXNR=0
                }
{ x=$1 }

END {
        if(MAXNR) printf("Increase of %f at line %d\n", MAX, MAXNR);
}' datafile

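It counts 10 lines from the highest peak, the peak's own line included. The countdown restarts whenever a larger increase appears, and the peak is printed once the count runs out (or at end of input).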
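
For readers who want to vary the threshold, window, or column, here is a rough sketch of the same look-ahead idea wrapped in a shell function. The function name `find_peaks`, its argument order, and the variable names are assumptions made for illustration, not part of the script above. It also skips the first row with an `NR == 1` rule rather than `getline` in `BEGIN`.

```shell
# find_peaks FILE [THRESH] [WIN] [COL]  (hypothetical wrapper, illustrative names)
# Prints the largest row-to-row increase of each spike in column COL.
find_peaks() {
    awk -v thresh="${2:-5000}" -v win="${3:-10}" -v col="${4:-1}" '
    NR == 1 { prev = $col; max = thresh; next }  # first row has no predecessor
    {
        d = $col - prev                  # increase over the previous row
        if (d >= max) {                  # largest increase seen in this spike so far
            max = d; maxnr = NR
            left = win                   # restart the countdown at the new peak
        }
        if (maxnr && --left == 0) {      # countdown ran out with no larger increase
            printf "Increase of %f at line %d\n", max, maxnr
            max = thresh; maxnr = 0      # reset for the next spike
        }
        prev = $col
    }
    END { if (maxnr) printf "Increase of %f at line %d\n", max, maxnr }
    ' "$1"
}
```

Called as `find_peaks datafile 5000 10 1` on the ten sample rows from post #1, this should print `Increase of 140539.000000 at line 5`, with lines 4 and 6 absorbed into the same spike.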
#7  12-06-2012  markymarkg123
Thanks, brother, much obliged!