alister's proposal assumes a fixed bucket size (in this case 100 ms per bucket) and a fixed number of buckets (10). Your header has neither (5 ms, 5 ms, 10 ms, 8 x 10 ms, 50 ms, 50 ms, infinity) and is thus incompatible with that nice, simple, linear solution. You would need to pass the buckets to awk explicitly; it would then also be easy both to print the header and to check for "out of range".
EDIT: Chubler_XL just beat me to it; his proposal comes close to what I had in mind. It just doesn't put the 279 ms from the sample file into the right bin.
EDIT 2: massaging Chubler_XL's proposal slightly, this might be acceptable to the requestor:
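The code that belonged here is not reproduced, so the following is only a minimal sketch of the "pass the buckets explicitly" idea, not the version originally posted. The function name bucket_hist, the column labels, and the rule that a value equal to a bound lands in the lower bin are all assumptions; the only things taken from the thread are that the value sits in field 3 and that anything above the last bound counts as out of range.

```sh
# Hypothetical helper: histogram of field 3 using an explicit list of upper bounds.
bucket_hist() {
    # $1 = space-separated upper bounds (assumed ascending); data on stdin
    awk -v bounds="$1" '
    BEGIN {
        nb = split(bounds, B, " ")
        for (i = 1; i <= nb; i++) B[i] += 0      # store the bounds as numbers
    }
    {
        v = $3 + 0
        for (i = 1; i <= nb && v > B[i]; i++) ;  # stop at first bound >= v
        count[i]++                               # i == nb + 1 means out of range
    }
    END {
        # Header row derived from the same bounds that drive the counting
        for (i = 1; i <= nb; i++) printf "<=%s ", B[i]
        printf ">%s\n", B[nb]
        for (i = 1; i <= nb; i++) printf "%d ", count[i]
        printf "%d\n", count[nb + 1]
    }'
}

printf 'a b 3\na b 7\na b 279\n' | bucket_hist "5 10 200"
```

With the three sample lines above, the 279 lands in the final ">200" column rather than in a regular bin.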
The problem with your/Chubler_XL's suggestion is that I'll have to define the upper limit of every bucket, and this is the main reason why I'm moving away from my current solution: for a range of 0 - 1000 with a bucket size of 10 ms it would take me ages to define them all.
Alister's solution is very simple, and I have to define only 2 values.
With regard to the header: I only gave an example, but as I said, to keep the solution nice and tidy, the header should be generated from the n and s values.
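One way to derive the header from just those two values can be sketched as below. The function name bucket_header and the label format are assumptions, as is the reading that buckets 0 to n-1 each cover s integer milliseconds and bucket n collects everything from n*s upwards.

```sh
# Hypothetical helper: print column labels for n buckets of width s
# plus one final overflow column.
bucket_header() {
    # $1 = bucket size s, $2 = number of regular buckets n
    awk -v s="$1" -v n="$2" 'BEGIN {
        for (i = 0; i < n; i++)
            printf "%d-%d ", i * s, (i + 1) * s - 1   # e.g. 0-99, 100-199
        printf ">=%d\n", n * s                        # overflow column
    }'
}

bucket_header 100 3
```

For s=100 and n=3 this prints the labels 0-99, 100-199, 200-299 and >=300 on one line, so changing either value regenerates the whole header.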
If $3>B[i] works with your awk implementation (I know it works with at least some mawk versions, if not all) then it's because it's violating POSIX. That should be performing a string comparison for all iterations of the loop, even when both B[i] and $3 are numeric strings. A compliant implementation can yield an incorrect result (such as when "9" is treated as greater than "10").
Comparisons (with the '<' , "<=" , "!=" , "==" , '>' , and ">=" operators) shall be made numerically if both operands are numeric, if one is numeric and the other has a string value that is a numeric string, or if one is numeric and the other has the uninitialized value. Otherwise, operands shall be converted to strings as required and a string comparison shall be made using the locale-specific collation sequence. The value of the comparison expression shall be 1 if the relation is true, or 0 if the relation is false.
Except for the final value in B, every member of B that results from split() is a numeric string. Every "number" assigned from the input data to a field variable (such as $3) is also a numeric string. Note that comparing a numeric string with a numeric string should be handled as a string comparison; at least one operand must be numeric for a numeric comparison to occur (which means "casting" with +0, using the result of a function that returns a number, or using a numeric literal).
Another issue is that the terminating condition is locale dependent. The only reason the loop terminates is because a string comparison is used to compare the value of $3 against ">200" (in this instance). If a locale-aware implementation were run under a locale that did not place the ">" after all of the digits, an infinite loop would result upon encountering a value that should land in the last bucket.
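Both concerns can be sidestepped as in the sketch below: adding 0 to each operand makes it numeric, and an explicit i <= nb test ends the loop without relying on how ">" collates. The function name find_bucket is hypothetical, and the bounds list is assumed to hold only numbers (no ">200" sentinel).

```sh
# Hypothetical helper: index of the first bound >= value, or nb + 1 for overflow.
find_bucket() {
    # $1 = value, $2 = space-separated numeric upper bounds (ascending)
    LC_ALL=C awk -v v="$1" -v bounds="$2" 'BEGIN {
        nb = split(bounds, B, " ")
        # +0 on both sides makes each operand numeric, so the comparison is
        # numeric however numeric strings are treated; i <= nb bounds the loop.
        for (i = 1; i <= nb && (v + 0) > (B[i] + 0); i++) ;
        print i
    }'
}

find_bucket 9 "5 10 200"      # prints 2
find_bucket 279 "5 10 200"    # prints 4 (past the last bound)

# String constants compare as strings, numeric constants as numbers:
LC_ALL=C awk 'BEGIN { print ("9" > "10"), (9 > 10) }'    # prints "1 0"
```

The last line shows the difference directly: as strings, "9" sorts after "10", which is exactly the wrong answer for bucketing.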
Regards,
Alister
---------- Post updated at 08:04 PM ---------- Previous update was at 07:18 PM ----------
Quote:
Originally Posted by varu0612
Alister,
Your method is a very tidy/nice one (balajesuri, yours works OK as well, so thank you!).
Two more questions:
a) How can I add a header like this, which should take into account the n buckets of size s?
Input file
Output/ result
Just for my own knowledge: should I understand that it is very hard to implement this using regular expressions? Has anyone done it?
It's part of the ternary operator, e1 ? e2 : e3, which involves three expressions, e1, e2, and e3. If the first expression, e1, evaluates to true, then the result is e2. If e1 is instead false, the result is e3.
In the quoted code fragment:
e1: (i=int($3/s)) > n
e2: n
e3: i
e1 calculates the bucket index to which $3 belongs, stores that value in i, and then compares the value of the assignment (which is the value stored in i) to n. If i is greater than n, which would indicate a bucket beyond the final bucket, then e1 is true and the result is e2, which is n. This is the logic which folds all values that would fall into a bucket beyond the final bucket into that final bucket. If, however, i is not greater than n, then e1 is false, i is a valid bucket index, and the ternary operator returns e3 (i).
I don't recommend this type of coding, as it's difficult to decipher. Even an expert programmer has to give it a close look to be certain of what's going on. My only defense is that it makes it more fun for me to contribute here, as I attempt to be as concise as possible. A possible beneficial side effect is that it may help others learn more about the language in question.
A much more readable, maintainable, and professional version:
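The code that followed is not reproduced here; the sketch below only illustrates what a more explicit form of the same logic could look like. The function name fixed_hist and the output layout are assumptions; s, n, i and field 3 are used as in the explanation above, with bucket n acting as the final catch-all bucket.

```sh
# Hypothetical helper: count field 3 into buckets 0..n of width s.
fixed_hist() {
    # $1 = bucket size s, $2 = index n of the final bucket; data on stdin
    awk -v s="$1" -v n="$2" '
    {
        i = int($3 / s)        # bucket index this value belongs to
        if (i > n)
            i = n              # fold anything past the final bucket into it
        count[i]++
    }
    END {
        for (i = 0; i < n; i++) printf "%d ", count[i]
        printf "%d\n", count[n]
    }'
}

printf 'x y 42\nx y 279\nx y 5000\n' | fixed_hist 100 3
```

Here 42 goes to bucket 0, 279 to bucket 2, and 5000 (index 50) is folded into bucket 3, which is what the ternary expression does in a single step.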
Regards,
Alister
The fact that you took the time to explain in detail how it works, in a way that even a five-year-old could understand, is very much appreciated.
I've seen many smart users reply with solutions but fail to explain the logic ... in my view that is a useless answer, since it doesn't help the requester understand or learn how it works.