UNIX for Dummies Questions & Answers: If you're not sure where to post a UNIX or Linux question, post it here. All UNIX and Linux newbies welcome!
#1
Selecting highest value within a range
Within millions of lines of data, there are perhaps 20 "spikes" that are very narrow. I want only the highest value from each spike within a range of 10 rows. Is that possible? My data looks like this: 8 columns of numbers, millions of rows. There are clear spikes when you graph the columns. Code:
2.14883e+05 1.22992e+05 7.96926e+03 -1.37694e+03 3.95054e+03 -1.62924e+04 8.21638e+03 1.01061e+04
2.14357e+05 1.22730e+05 8.20939e+03 -1.54033e+03 4.28164e+03 -1.61322e+04 7.97054e+03 1.01922e+04
2.13361e+05 1.22889e+05 8.05019e+03 -1.18045e+03 4.02582e+03 -1.61925e+04 7.99161e+03 1.02380e+04
2.68777e+05 1.17178e+05 5.12913e+04 -1.40305e+04 2.95355e+04 -2.65120e+04 2.14739e+04 9.34042e+03
4.09316e+05 1.80414e+05 5.32998e+04 -1.06297e+04 3.04299e+04 -2.75763e+04 2.09896e+04 8.12131e+03
4.94380e+05 2.46756e+05 1.36658e+04 6.56373e+03 6.79386e+03 -1.70254e+04 7.70163e+03 9.14013e+03
4.92551e+05 2.48154e+05 1.39390e+04 6.94251e+03 6.73128e+03 -1.65537e+04 7.33397e+03 9.21148e+03
4.91403e+05 2.48659e+05 1.49110e+04 6.85990e+03 7.53969e+03 -1.67406e+04 7.14302e+03 9.71328e+03
4.89192e+05 2.52866e+05 1.39238e+04 8.07381e+03 7.38389e+03 -1.64431e+04 6.44513e+03 1.00028e+04
4.90260e+05 2.47635e+05 1.45364e+04 6.72570e+03 7.24393e+03 -1.69678e+04 6.90001e+03 1.01961e+04
Currently, my code will give me all the consecutive-row increases in column 1 over a specified value of 5000: Code:
awk '(NR>1) && (d=$1-x)>=5000 {print "increase of" " " d, "at Line" " " NR " "}{x=$1}' test | sort -n
The output looks like this:
increase of 140539 at Line 5
increase of 55416 at Line 4
increase of 85064 at Line 6
However, I want the output to look like this, filtering out the smaller increases within that data range:
increase of 140539 at Line 5
increase of 5662 at Line 48924
increase of 10334 at Line 589332
...
#2
I don't get it. Your one with the 'smaller increases removed' shows smaller increases.
#3
I don't want to know every time the increase is over 5000. I want to know, of the points that exceed 5000, which is the highest within a range of 10 rows. Does that explain better?
I only provided 10 lines of data. Within those 10 lines, there are 3 points where the increase is over the 5000 threshold. I want to get rid of the other two, at Line 4 and Line 6, because they don't indicate new spikes; they are duplicates of the same spike. The two other spikes over 5000 in my desired output are examples I wrote in from later in the data.
#4
If your output has nothing to do with the input shown, I can't guess much from it. With your explanation I think I get it now. If you see a spike, you want to print the line, then suppress any output for the next 10 lines. Code:
awk '(NR>1) && (($1-x)>=5000) && (L<=0) {
    printf("Increase of %f at line %d\n", $1-x, NR);
    L=LEN; x=$1; next
} { x=$1; L-- }' LEN=10 datafile
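As a quick check, the command above can be run against the ten sample rows from post #1. This is only a sketch under a few assumptions: the file name `sample_col1.txt` and the shell variable `out` are made up for illustration, and only column 1 of the sample is kept because the script reads nothing else.

```shell
# Column 1 of the ten sample rows from post #1 (hypothetical file name).
cat > sample_col1.txt <<'EOF'
2.14883e+05
2.14357e+05
2.13361e+05
2.68777e+05
4.09316e+05
4.94380e+05
4.92551e+05
4.91403e+05
4.89192e+05
4.90260e+05
EOF

# Same awk program as above: print the first jump of 5000 or more,
# then stay silent while the countdown L is still positive.
out=$(awk '(NR>1) && (($1-x)>=5000) && (L<=0) {
    printf("Increase of %f at line %d\n", $1-x, NR);
    L=LEN; x=$1; next
} { x=$1; L-- }' LEN=10 sample_col1.txt)

echo "$out"
# prints: Increase of 55416.000000 at line 4
```

On this sample the first row to cross the threshold is line 4, so that is the one reported; the larger jumps at lines 5 and 6 fall inside the suppression window.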
The Following User Says Thank You to Corona688 For This Useful Post: markymarkg123 (12-06-2012)
#5
Good start! Thank you. Your code is cool, but it does not select the highest peak (line 5). It picks line 4 because it is the first to exceed 5000. Can it be written to pick line 5 (the biggest) and then to eliminate the line before 5 (line 4) and the line after 5 (line 6)?
#6
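The requirement to look ahead makes things a lot more complicated. Code:
awk 'BEGIN { MAX=5000; getline; x=$1 }   # MAX starts at the threshold; line 1 is the baseline
# A jump at least as big as the best so far: remember it and restart the countdown
((d=$1-x)>=MAX) { MAX=d; MAXNR=NR; L=10; }
# Countdown expired: report the peak, then reset for the next spike
((--L) == 0) {
    printf("Increase of %f at line %d\n", MAX, MAXNR);
    MAX=5000
    MAXNR=0
}
{ x=$1 }
END {
    # Report a peak whose countdown had not finished at end of input
    if(MAXNR) printf("Increase of %f at line %d\n", MAX, MAXNR);
}' datafile
It will count 10 lines from the highest peak.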
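For comparison, here is the look-ahead version run on the same ten sample rows from post #1. Again this is a sketch with assumptions: `sample_col1.txt` and `out` are hypothetical names, and only column 1 of the sample is included.

```shell
# Column 1 of the ten sample rows from post #1 (hypothetical file name).
cat > sample_col1.txt <<'EOF'
2.14883e+05
2.14357e+05
2.13361e+05
2.68777e+05
4.09316e+05
4.94380e+05
4.92551e+05
4.91403e+05
4.89192e+05
4.90260e+05
EOF

# Same awk program as above: track the largest jump seen so far and
# report it once 10 lines pass without a larger one (or at end of input).
out=$(awk 'BEGIN { MAX=5000; getline; x=$1 }
((d=$1-x)>=MAX) { MAX=d; MAXNR=NR; L=10; }
((--L) == 0) {
    printf("Increase of %f at line %d\n", MAX, MAXNR);
    MAX=5000
    MAXNR=0
}
{ x=$1 }
END {
    if(MAXNR) printf("Increase of %f at line %d\n", MAX, MAXNR);
}' sample_col1.txt)

echo "$out"
# prints: Increase of 140539.000000 at line 5
```

Line 4 sets the first candidate, line 5 replaces it with a larger jump and restarts the countdown, and line 6 is smaller so it is ignored. Since the sample ends before the countdown runs out, the single report here comes from the END block.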
The Following User Says Thank You to Corona688 For This Useful Post: markymarkg123 (12-06-2012)
#7
Thanks, brother, much obliged!