Selecting highest value within a range


 
# 1  

Within millions of lines of data, there are perhaps 20 "spikes" that are very narrow. I want only the highest value from each spike within a range of 10 rows. Possible?

My data looks like this: 8 columns of numbers in scientific notation, millions of rows. There are clear spikes when you graph the columns.

Code:
2.14883e+05  1.22992e+05  7.96926e+03  -1.37694e+03  3.95054e+03  -1.62924e+04  8.21638e+03  1.01061e+04  
2.14357e+05  1.22730e+05  8.20939e+03  -1.54033e+03  4.28164e+03  -1.61322e+04  7.97054e+03  1.01922e+04  
2.13361e+05  1.22889e+05  8.05019e+03  -1.18045e+03  4.02582e+03  -1.61925e+04  7.99161e+03  1.02380e+04  
2.68777e+05  1.17178e+05  5.12913e+04  -1.40305e+04  2.95355e+04  -2.65120e+04  2.14739e+04  9.34042e+03  
4.09316e+05  1.80414e+05  5.32998e+04  -1.06297e+04  3.04299e+04  -2.75763e+04  2.09896e+04  8.12131e+03  
4.94380e+05  2.46756e+05  1.36658e+04  6.56373e+03  6.79386e+03  -1.70254e+04  7.70163e+03  9.14013e+03  
4.92551e+05  2.48154e+05  1.39390e+04  6.94251e+03  6.73128e+03  -1.65537e+04  7.33397e+03  9.21148e+03  
4.91403e+05  2.48659e+05  1.49110e+04  6.85990e+03  7.53969e+03  -1.67406e+04  7.14302e+03  9.71328e+03  
4.89192e+05  2.52866e+05  1.39238e+04  8.07381e+03  7.38389e+03  -1.64431e+04  6.44513e+03  1.00028e+04  
4.90260e+05  2.47635e+05  1.45364e+04  6.72570e+03  7.24393e+03  -1.69678e+04  6.90001e+03  1.01961e+04

Currently, my code reports every row-to-row increase in column 1 of at least 5000:

Code:
awk 'NR>1 && (d=$1-x)>=5000 {print "increase of " d " at Line " NR} {x=$1}' test | sort -n

The output looks like this:
increase of 140539 at Line 5
increase of 55416 at Line 4
increase of 85064 at Line 6
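
A note on the `sort -n` in the command above: a numeric sort reads the number at the start of each line, and these lines start with the word "increase", so every key counts as zero and the ties fall back to plain byte order. That is why 140539 is listed before 55416. Below is a minimal sketch that sorts on the third field instead, largest first. The function name `increases` and its arguments are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: increases FILE [THRESH]
# Lists every row-to-row increase in column 1 of at least THRESH,
# sorted by the size of the increase (field 3), largest first.
increases() {
    awk -v thresh="${2:-5000}" '
        NR > 1 && (d = $1 - x) >= thresh { print "increase of " d " at Line " NR }
        { x = $1 }    # remember this row for the next comparison
    ' "$1" | sort -k3,3nr
}
```

Called as `increases test`, this would list the three increases from the sample with the largest first.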

However, I want the output to look like this, filtering out the smaller increases within that data range:
increase of 140539 at Line 5
increase of 5662 at Line 48924
increase of 10334 at Line 589332
...
# 2  
I don't get it. Your one with the 'smaller increases removed' shows smaller increases.
# 3  
I don't want to know every time the increase is over 5000. I want to know, of the points that exceed 5000, which is the highest within a range of 10 rows. Does that explain better?

I only provided 10 lines of data. Within those 10 lines, there are 3 points where the increase is over the 5000 threshold. I want to get rid of the other two, at Line 4 and Line 6, because they don't indicate new spikes; they are duplicates of the same spike. I added two other spikes over 5000 from later in the data as examples of the desired output.
# 4  
If your output has nothing to do with the input shown, I can't guess much from it. With your explanation I think I get it now. If you see a spike, you want to print the line, then suppress any output for the next 10 lines.

Code:
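awk '
# Print the first increase of 5000 or more, then suppress output for LEN lines.
(NR>1) && (($1-x)>=5000) && (L<=0) {
        printf("Increase of %f at line %d\n", $1-x, NR);
        L=LEN;   x=$1;   next   # start the countdown and skip the decrement below
}
{ x=$1; L-- }                   # remember the previous value and count down
' LEN=10 datafile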

This User Gave Thanks to Corona688 For This Post:
# 5  
Good start! Thank you. Your code is cool, but it does not select the highest peak (line 5). It picks line 4 because it is the first to exceed 5000. Can it be written to pick line 5 (the biggest) and then to eliminate the line before 5 (line 4) and the line after 5 (line 6)?
# 6  
The requirement to look ahead makes things a lot more complicated.

Code:
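awk 'BEGIN           {       MAX=5000; getline; x=$1 }     # threshold; prime x from line 1
# An increase at least as large as the current best: remember it, restart the countdown.
((d=$1-x)>=MAX) {       MAX=d;  MAXNR=NR; L=10; }
# Countdown expired without a larger increase: report the peak and reset.
((--L) == 0)    {
                        printf("Increase of %f at line %d\n", MAX, MAXNR);
                        MAX=5000
                        MAXNR=0
                }
{ x=$1 }

END {
        # Report a peak whose countdown was still running at end of input.
        if(MAXNR) printf("Increase of %f at line %d\n", MAX, MAXNR);
}' datafile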

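The countdown restarts whenever a larger increase appears, so a peak is reported once 10 lines (counting the peak's own line) pass without a bigger one.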
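
For anyone adapting this, here is a minimal sketch of the same look-ahead idea wrapped in a shell function, with the threshold and window passed as arguments and the increase printed as a whole number to match the format asked for in post #1. The function name `peaks`, its argument order, and the variable names are made up for illustration and do not come from the posts above. One deliberate difference: a tie does not move the peak to the later line. Also note that `%d` truncates, so `%g` may be a better choice if the increases can be fractional.

```shell
# Hypothetical wrapper: peaks FILE [THRESH] [WIN]
# Reports the largest increase in column 1 within each spike.
peaks() {
    awk -v thresh="${2:-5000}" -v win="${3:-10}" '
    NR > 1 {
        d = $1 - prev
        # A candidate must reach the threshold and beat the current best.
        if (d >= thresh && d > best) { best = d; bestnr = NR; left = win }
    }
    # Window expired with no larger increase: report the peak and reset.
    bestnr && --left == 0 {
        printf "increase of %d at Line %d\n", best, bestnr
        best = 0; bestnr = 0
    }
    { prev = $1 }    # remember this row for the next comparison
    END {
        # Report a peak whose window was still open at end of input.
        if (bestnr) printf "increase of %d at Line %d\n", best, bestnr
    }
    ' "$1"
}
```

With the ten sample rows saved as `test`, `peaks test` should print one line for the spike at line 5, and `peaks test 5000 3` would use a 3-row window instead of 10.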
# 7  
Thanks, brother, much obliged!
 
