Generate Regex numeric range with specific sub-ranges


 
# 1  
03-17-2013

Hi all,

Say I have a range like 0 - 1000 and I need to split the lines that fall within each fixed sub-range into different files. I can do this manually, but it does not scale as the range grows.

E.g.:

Code:
cat file1.txt
Response time 2 ms
Response time 15 ms
Response time 101 ms
Response time 279 ms
etc

What I currently do is create an array of patterns and then grep for each one in a loop:

Code:
bucketLimits=(
 # 100 <> 150, 150 <> 200, 200 <> 250, 250 <> 300, 300 <> 350, 350 <> 400, 400 <> 450, 450 <> 500
 '[1][0-4][0-9]' '[1][5-9][0-9]' '[2][0-4][0-9]' '[2][5-9][0-9]' '[3][0-4][0-9]' '[3][5-9][0-9]' '[4][0-4][0-9]' '[4][5-9][0-9]'
 # 500 <> 550, 550 <> 600, 600 <> 650, 650 <> 700, 700 <> 750, 750 <> 800, 800 <> 850, 850 <> 900, 900 <> 950, 950 <> 1000
 '[5][0-4][0-9]' '[5][5-9][0-9]' '[6][0-4][0-9]' '[6][5-9][0-9]' '[7][0-4][0-9]' '[7][5-9][0-9]' '[8][0-4][0-9]' '[8][5-9][0-9]' '[9][0-4][0-9]' '[9][5-9][0-9]'
)

finalResult=""
# Quote the expansion so the bracket patterns are not glob-expanded against file names
for limit in "${bucketLimits[@]}"
do
  # grep -c counts the lines whose response time matches this bucket's pattern
  result=$(grep "Response" file1.txt | grep -cE "time ${limit} ms")
  finalResult="$finalResult,$result"
done
echo "$finalResult" >> ./stats_results.csv

Any idea how I can auto-generate the bucketLimits array from a given sub-range width? It could be 10, or 50 as it is now.

Thx!

Last edited by varu0612; 03-17-2013 at 09:29 AM. Reason: code typo
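A minimal sketch of generating the array, assuming bash 4 (for mapfile) and a hypothetical helper named make_bucket_patterns. Instead of bracket expressions, each bucket becomes an ERE alternation of every integer it contains, which works for any width (10, 50, ...) at the cost of longer patterns. The sample lines are inlined so the snippet runs on its own.

```shell
#!/bin/bash
# Hypothetical helper: make_bucket_patterns START END WIDTH
# Prints one ERE per bucket covering lo .. lo+WIDTH-1, written as an
# alternation of every integer in the bucket, e.g. (100|101|102|103|104).
make_bucket_patterns() {
  local start=$1 end=$2 width=$3 lo n pat
  for (( lo = start; lo < end; lo += width )); do
    pat=""
    for (( n = lo; n < lo + width; n++ )); do
      pat+="${pat:+|}$n"   # "|" separator before every number except the first
    done
    printf '(%s)\n' "$pat"
  done
}

# Build the array for 100 .. 999 in buckets of 50 (18 patterns).
mapfile -t bucketLimits < <(make_bucket_patterns 100 1000 50)

# Sample input inlined so the sketch is self-contained; read file1.txt in practice.
sample=$'Response time 2 ms\nResponse time 15 ms\nResponse time 101 ms\nResponse time 279 ms'

finalResult=""
for limit in "${bucketLimits[@]}"; do
  result=$(grep "Response" <<< "$sample" | grep -cE "time ${limit} ms")
  finalResult+=",$result"
done
echo "$finalResult"
```

Replacing the here-string with file1.txt gives the same loop as above. The bracket-expression form is more compact, but it gets awkward for widths that do not line up with decimal digits (30, for instance).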
# 2  
03-17-2013
Quote:
Originally Posted by varu0612
What i currently do is create an array and then grep for it in a loop
Code:
for bucketLimit in ${bucketLimits[@]}
  do
    limit=${bucketLimits[$index]}
    result=`grep "Response" | grep -P "CMDC=${limit} ms" | wc -l` 
    finalResult=$finalResult","$result
    index=$(( $index + 1 ))
    fi
  done

1. Not sure how this works for you.
2. What input is the grep statement searching?
3. There is a "fi" without an "if". Are we missing a few lines of code here?
# 3  
03-17-2013
I made the correction in my sample code; see the initial post above.

Thx
# 4  
03-17-2013
Here's a solution using a bit of elementary mathematics instead of regular expressions.

I've assumed that file.txt contains only lines such as "Response time <time> ms".
Code:
#! /bin/bash

i=0 # initial
r=50 # range
f=1000 # final

while [ $i -le $f ]
do
   final[$(( $i / $r ))]=0
   i=$(( $i + $r ))
done

while read a b time unit
do
    index=$(( $time / $r ))
    final[$index]=$(( ${final[$index]} + 1 ))
done < file.txt

x=${final[@]}
echo ${x// /,} >> stats_results.csv
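To make the arithmetic concrete, here is a small sketch (bucket_index is a hypothetical name) showing which bucket each sample time from file1.txt lands in with r=50. Integer division in $(( )) drops the remainder, so bucket k holds times k*r through k*r+r-1.

```shell
#!/bin/bash
# Bucket width, as in the script above.
r=50

# Hypothetical helper: integer division drops the remainder, so bucket k
# holds times k*r .. k*r+r-1.
bucket_index() {
  echo $(( $1 / r ))
}

for t in 2 15 101 279; do
  k=$(bucket_index "$t")
  printf '%s ms -> bucket %d (%d-%d ms)\n' "$t" "$k" $(( k * r )) $(( k * r + r - 1 ))
done
```

For the sample data this reports bucket 0 (0-49 ms) for 2 and 15 ms, bucket 2 (100-149 ms) for 101 ms and bucket 5 (250-299 ms) for 279 ms. One side effect worth knowing about in the script above: a time above f (say 2000 ms) lands in an index that was never initialised, and because ${final[@]} expands only the elements that are set, it shows up as one extra trailing column with no empty columns in between.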

# 5  
03-17-2013
Here's an AWK approach which uses an array of buckets, b, with n buckets of size s.
Code:
awk '/^Response time/ {++b[int($3/s)]} END {for(i=0; i<n; i++) print b[i]+0}' n=10 s=100 file

It assumes that the first bucket spans 0 to s-1. It could easily be modified to accept an initial starting point other than 0, but I'll leave that as an exercise. Also, values beyond the valid bucket ranges are ignored, though this too can be easily changed.

The output is one line per bucket, but paste -sd, - trivially converts it to the OP's comma-delimited format.

Regards,
Alister

Last edited by alister; 03-17-2013 at 03:20 PM.
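One way to do the two modifications left as an exercise, offered only as a sketch: the function name bucket_counts and the offset variable o are assumptions. Bucket i covers o+i*s through o+(i+1)*s-1; values below o are counted on a first underflow line and values at or above o+n*s on a last overflow line, so nothing is silently dropped.

```shell
#!/bin/bash
# Hypothetical helper: bucket_counts N S O
# n buckets of size s starting at offset o; bucket i covers o+i*s .. o+(i+1)*s-1.
# Prints one count per line: underflow (< o), the n buckets, overflow (>= o+n*s).
bucket_counts() {
  awk -v n="$1" -v s="$2" -v o="$3" '
    /^Response time/ {
      t = $3 + 0
      if (t < o) { lo++; next }
      i = int((t - o) / s)
      if (i >= n) hi++
      else b[i]++
    }
    END {
      print lo + 0
      for (i = 0; i < n; i++) print b[i] + 0
      print hi + 0
    }'
}

# Same bucketing as above (n=10, s=100, starting at 0), joined into one CSV line.
printf 'Response time %s ms\n' 2 15 101 279 1500 | bucket_counts 10 100 0 | paste -sd, -
```

With these inputs the last line prints 0,2,1,1,0,0,0,0,0,0,0,1: no underflow, two values in 0-99, one each in 100-199 and 200-299, and 1500 in the overflow column.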
# 6  
03-17-2013
Quote:
Originally Posted by alister
Here's an AWK approach which uses an array of buckets, b, with n buckets of size s.

Alister,

Your method is a very tidy one (balajesuri, yours works OK as well, so thank you!).

Two more questions:

a) How can I add a header like the one below, which takes the n buckets of size s into account?

Buckets,0-5ms,5-10ms,10-20ms,20-30ms,30-40ms,40-50ms,50-60ms,60-70ms,70-80ms,80-90ms,90-100ms,100-150ms,150-200ms,> 200ms

b) If a value is beyond the valid range, how can I count it under > 200ms, for example?

Many thanks,
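For question (a) with uniform buckets, a sketch of the header line alone (uniform_header is a hypothetical name). It assumes buckets start at 0, as alister's one-liner does, and labels each bucket with its inclusive range (0-99ms rather than 0-100ms). The final ">=" column only lines up if the counts also include an overflow bucket, which the one-liner as posted does not print.

```shell
#!/bin/bash
# Hypothetical helper: uniform_header N S
# Prints a CSV header for n buckets of size s starting at 0, followed by an
# overflow column for values >= n*s.
uniform_header() {
  awk -v n="$1" -v s="$2" 'BEGIN {
    hdr = "Buckets"
    for (i = 0; i < n; i++) hdr = hdr "," (i * s) "-" ((i + 1) * s - 1) "ms"
    print hdr ",>= " (n * s) "ms"
  }'
}

uniform_header 10 100
```

Since the awk program has only a BEGIN block, it reads no input; the header can be written once to stats_results.csv before appending count rows.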
# 7  
03-17-2013
How about this: we specify the upper limit of each bucket, and an extra bucket is implied automatically for everything greater:

Code:
$ awk -v buckets="5,10,20,30,40,50,60,70,80,90,100,150,200" '
 BEGIN {n=split(buckets,B,",")}
 # v[i] counts values in (B[i-1],B[i]]; v[n+1] counts everything > B[n]
 /^Response time/{for(i=1;i<=n&&($3>B[i]);i++);v[i]++}
 END{$0="";for(i=1;i<=n+1;i++) $i=v[i]+0; print}' OFS=, file1.txt >> stats_results.csv
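To cover question (a) as well, here is a sketch that builds the header from the same list of limits in one awk pass. bucket_report and the "Count" row label are assumptions, not part of the post above. A value exactly on a limit (e.g. 5) is counted in the bucket ending at that limit, because a value only moves on to the next bucket when it is strictly greater than B[i].

```shell
#!/bin/bash
# Hypothetical helper: bucket_report LIMITS
# LIMITS is a comma-separated list of upper bucket limits. Bucket 1 counts
# values up to B[1], bucket i counts values in (B[i-1], B[i]], and one extra
# bucket counts values > B[n]. Prints a header row, then a count row.
bucket_report() {
  awk -v buckets="$1" '
    BEGIN { n = split(buckets, B, ",") }
    /^Response time/ {
      for (i = 1; i <= n && $3 + 0 > B[i] + 0; i++) ;
      v[i]++
    }
    END {
      hdr = "Buckets"; prev = 0
      for (i = 1; i <= n; i++) { hdr = hdr "," prev "-" B[i] "ms"; prev = B[i] }
      print hdr ",> " B[n] "ms"
      row = "Count"
      for (i = 1; i <= n + 1; i++) row = row "," (v[i] + 0)
      print row
    }'
}

printf 'Response time %s ms\n' 2 15 101 279 |
  bucket_report "5,10,20,30,40,50,60,70,80,90,100,150,200"
```

With the limits from the post, the header row matches the one asked for in (a) exactly, and 279 ms is counted in the final "> 200ms" column as asked in (b).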
