Generate Regex numeric range with specific sub-ranges

03-17-2013


Hi all,

Say I have a range like 0-1000 and I need to split the lines that fall within specific fixed sub-ranges into different files. I can do this manually, but it doesn't scale as the range increases.

E.g.:

Code:

cat file1.txt
Response time 2 ms
Response time 15 ms
Response time 101 ms
Response time 279 ms
etc

What I currently do is create an array of patterns and then grep for each one in a loop:

Code:

bucketLimits=(
  # 100-149, 150-199, 200-249, 250-299, 300-349, 350-399, 400-449, 450-499
  '[1][0-4][0-9]' '[1][5-9][0-9]' '[2][0-4][0-9]' '[2][5-9][0-9]' '[3][0-4][0-9]' '[3][5-9][0-9]' '[4][0-4][0-9]' '[4][5-9][0-9]'
  # 500-549, 550-599, 600-649, 650-699, 700-749, 750-799, 800-849, 850-899, 900-949, 950-999
  '[5][0-4][0-9]' '[5][5-9][0-9]' '[6][0-4][0-9]' '[6][5-9][0-9]' '[7][0-4][0-9]' '[7][5-9][0-9]' '[8][0-4][0-9]' '[8][5-9][0-9]' '[9][0-4][0-9]' '[9][5-9][0-9]'
)

finalResult=""
# Quote the expansion so the bracket patterns are not glob-expanded by the shell
for limit in "${bucketLimits[@]}"
do
  # Count the lines whose response time matches this bucket's pattern
  result=$(grep "Response" file1.txt | grep -cE "time ${limit} ms")
  finalResult="$finalResult,$result"
done
# Drop the leading comma and append the counts as one CSV row
echo "${finalResult#,}" >> ./stats_results.csv

Any idea how I can auto-generate the bucketLimits array from a given sub-range width? It could be 10, or 50 as it is now.
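One way to generate the array, sketched under the assumption that the start value and the sub-range width are both multiples of 10 (make_buckets is a hypothetical helper, not an existing command). Each bucket is written as an alternation of whole decades, so 100-149 becomes (10[0-9]|11[0-9]|12[0-9]|13[0-9]|14[0-9]), which matches the same numbers as the hand-written [1][0-4][0-9]. Because of the |, the patterns need grep -E.

```shell
#!/bin/bash
# make_buckets START END WIDTH  (hypothetical helper)
# Prints one extended regex per bucket [lo, lo+WIDTH-1] for lo = START, START+WIDTH, ... while lo < END.
# Assumes START and WIDTH are multiples of 10, so each bucket is a whole number of decades.
make_buckets() {
  local start=$1 end=$2 width=$3 lo d alts
  for (( lo = start; lo < end; lo += width )); do
    alts=""
    # decade d covers the numbers d*10 .. d*10+9
    for (( d = lo / 10; d < (lo + width) / 10; d++ )); do
      if (( d == 0 )); then
        alts+="|[0-9]"          # 0..9 are written without a leading zero
      else
        alts+="|${d}[0-9]"
      fi
    done
    echo "(${alts#|})"          # strip the leading | and wrap the alternation in a group
  done
}

# Build the array for 100..999 in steps of 50 (mapfile needs bash 4 or later)
mapfile -t bucketLimits < <(make_buckets 100 1000 50)
printf '%s\n' "${bucketLimits[@]}"
```

The counting loop can stay as it is, since it already calls grep with -E; only the hand-written array is replaced.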

Thx!

Last edited by varu0612; 03-17-2013 at 09:29 AM. Reason: code typo

varu0612


Quote:

Originally Posted by varu0612

What i currently do is create an array and then grep for it in a loop

Code:

for bucketLimit in ${bucketLimits[@]}
  do
    limit=${bucketLimits[$index]}
    result=`grep "Response" | grep -P "CMDC=${limit} ms" | wc -l` 
    finalResult=$finalResult","$result
    index=$(( $index + 1 ))
    fi
  done

1. Not sure how this works for you.
2. What input is the grep statement searching?
3. There is a "fi" without an "if". Are we missing a few lines of code here?

balajesuri


I've corrected my sample code; see the initial post above.

Thx

varu0612


Here's a solution using a bit of elementary mathematics instead of regular expressions.

I've assumed that file.txt contains only lines such as "Response time <time> ms".

Code:

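#! /bin/bash

i=0     # initial value
r=50    # range (width of each bucket)
f=1000  # final value

# Pre-fill one counter per bucket with 0 so that empty buckets still appear in the output
while [ "$i" -le "$f" ]
do
   final[$(( i / r ))]=0
   i=$(( i + r ))
done

# Each line is "Response time <time> ms"; its bucket index is time / r (integer division)
while read -r a b time unit
do
    index=$(( time / r ))
    final[$index]=$(( ${final[$index]:-0} + 1 ))
done < file.txt

# Join the counts with commas and append them as one CSV row
x=${final[*]}
echo "${x// /,}" >> stats_results.csv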
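A header row that lines up with these counts can be built from the same r and f. A sketch, assuming the initial value i is 0 as in the script (make_fixed_header is a hypothetical name): counter k holds times k*r through k*r+r-1, so with r=50 and f=1000 the 21 labels run from 0-49ms to 1000-1049ms.

```shell
#!/bin/bash
# make_fixed_header FINAL RANGE  (hypothetical helper)
# Prints one label per bucket index 0 .. FINAL/RANGE, comma-separated,
# matching the counters pre-filled by the script above (assumes its initial value is 0).
make_fixed_header() {
  local f=$1 r=$2 lo
  local -a labels=()
  for (( lo = 0; lo <= f; lo += r )); do
    labels+=("${lo}-$(( lo + r - 1 ))ms")   # bucket lo/r holds times lo .. lo+r-1
  done
  local IFS=,                               # "${labels[*]}" joins with the first character of IFS
  echo "${labels[*]}"
}

make_fixed_header 1000 50
```

Appending its output to stats_results.csv once, before the first counts row, gives the CSV a header line.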

balajesuri


Here's an AWK approach which uses an array of buckets, b, with n buckets of size s.

Code:

awk '/^Response time/ {++b[int($3/s)]} END {for(i=0; i<n; i++) print b[i]+0}' n=10 s=100 file

It assumes that the first bucket spans 0 to s-1. It could easily be modified to accept an initial starting point other than 0, but I'll leave that as an exercise. Also, values beyond the valid bucket ranges are ignored, though this too can be easily changed.

The output is one line per bucket, but paste -sd, - trivially converts it to the OP's comma-delimited format.
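Picking up the exercise mentioned above, here is a sketch that adds a starting point o and, instead of ignoring values past the last bucket, collects them in one extra overflow bucket. bucket_counts and the variable o are my own hypothetical names, and times below o are simply skipped.

```shell
#!/bin/bash
# bucket_counts N SIZE START FILE  (hypothetical helper)
# Counts "Response time <t> ms" lines into N buckets of width SIZE starting at START,
# plus one overflow bucket for every time >= START + N*SIZE. Times below START are skipped.
# Prints the N+1 counts as a single comma-separated line.
bucket_counts() {
  awk -v n="$1" -v s="$2" -v o="$3" '
    /^Response time/ {
      if ($3 + 0 < o + 0) next      # below the first bucket: ignore
      i = int(($3 - o) / s)         # 0-based bucket index
      if (i > n) i = n              # index n is the overflow bucket
      ++b[i]
    }
    END { for (i = 0; i <= n; i++) print b[i] + 0 }' "$4" | paste -sd, -
}

# Sample data from the first post
sample=$(mktemp)
printf 'Response time %s ms\n' 2 15 101 279 > "$sample"
bucket_counts 2 100 0 "$sample"     # buckets 0-99, 100-199, then 200 and above
rm -f "$sample"
```

With the sample data the call prints 2,1,1: two times in 0-99, one in 100-199 and one (279) in the overflow bucket. For the original layout, something like bucket_counts 10 100 0 file1.txt >> stats_results.csv would do.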

Regards,
Alister

Last edited by alister; 03-17-2013 at 03:20 PM.

alister


Alister,

Your method is very tidy (balajesuri, yours works OK as well, so thank you!).

Two more questions:

a) How can I add a header like the one below, which takes into account the n buckets of size s?

Buckets,0-5ms,5-10ms,10-20ms,20-30ms,30-40ms,40-50ms,50-60ms,60-70ms,70-80ms,80-90ms,90-100ms,100-150ms,150-200ms,> 200ms

b) If a value is beyond the valid range, how can I count it under > 200ms, for example?

Many thanks,

varu0612


How about this: we specify the upper limit of each bucket, and an extra bucket is implied for everything greater:

Code:

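$ awk -v buckets="5,10,20,30,40,50,60,70,80,90,100,150,200" '
 BEGIN {n=split(buckets,B,",")}                                 # B[1..n] = upper limit of each bucket
 /^Response time/{for(i=1;i<=n&&$3+0>B[i]+0;i++);v[i]++}        # i = first bucket whose limit >= $3, or n+1 if above every limit
 END{for(i=1;i<=n+1;i++) printf "%s%s",v[i]+0,(i<=n?OFS:ORS)}' OFS=, file1.txt >> stats_results.csv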
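For question (a), a header can be derived from the same buckets string. A sketch (make_header is a hypothetical helper): it labels each column from the previous limit to the current one and adds a final >limit column for the implied bucket. The leading Buckets label is copied from the requested header, so a count row would need a matching first field.

```shell
#!/bin/bash
# make_header LIMITS  (hypothetical helper)
# LIMITS is the same comma-separated list of upper limits passed to the awk above.
# Prints "Buckets,0-L1ms,L1-L2ms,...,>Lnms", one label per count column.
make_header() {
  awk -v buckets="$1" 'BEGIN {
    n = split(buckets, B, ",")
    line = "Buckets"
    lo = 0
    for (i = 1; i <= n; i++) {
      line = line "," lo "-" B[i] "ms"    # values above the previous limit, up to and including B[i]
      lo = B[i]
    }
    print line ",>" B[n] "ms"             # implied bucket for everything greater
  }'
}

make_header "5,10,20,30,40,50,60,70,80,90,100,150,200"
```

Run it once and append the result to stats_results.csv before the count rows. The labels follow the awk above, where a value equal to a limit is counted in that limit's bucket.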

Chubler_XL
