Streamline script to search for numbers in a certain range


 
# 8  
Old 12-03-2013
They are pretty small numbers so you're right, double precision won't be enough. I didn't put the actual input on here because it is quite wide and I wasn't sure it would fit, but here are the first 29 lines of the input (some are in range, some not):
Code:
 1001  -1.800000000000000000000000E+001   4.000000000000000222044600E-001  -3.200000000000000177635700E+000   7.357049247944539628087100E-005   0.000000000000000000000000E+000
  1002  -1.800000000000000000000000E+001   4.000000000000000222044600E-001  -3.000000000000000000000000E+000   8.933437123800284495461800E-005   0.000000000000000000000000E+000
  1003  -1.800000000000000000000000E+001   4.000000000000000222044600E-001  -2.800000000000000266453500E+000   1.032677570213522605163300E-004   0.000000000000000000000000E+000
  1004  -1.800000000000000000000000E+001   4.000000000000000222044600E-001  -2.600000000000000088817800E+000   1.137837787257466108355500E-004   0.000000000000000000000000E+000
  1005  -1.800000000000000000000000E+001   4.000000000000000222044600E-001  -2.400000000000000355271400E+000   1.197177265076853838613200E-004   0.000000000000000000000000E+000
  1006  -1.800000000000000000000000E+001   4.000000000000000222044600E-001  -2.200000000000000177635700E+000   1.206698428240725160963100E-004   0.000000000000000000000000E+000
  1007  -1.800000000000000000000000E+001   4.000000000000000222044600E-001  -2.000000000000000000000000E+000   1.171816551509077017172200E-004   0.000000000000000000000000E+000
  1008  -1.800000000000000000000000E+001   4.000000000000000222044600E-001  -1.800000000000000044408900E+000   1.106617028785276686097400E-004   0.000000000000000000000000E+000
  1009  -1.800000000000000000000000E+001   4.000000000000000222044600E-001  -1.600000000000000088817800E+000   1.030868470071896416838000E-004   0.000000000000000000000000E+000
  1010  -1.800000000000000000000000E+001   4.000000000000000222044600E-001  -1.400000000000000133226800E+000   9.657131022304918821930900E-005   0.000000000000000000000000E+000
  1011  -1.800000000000000000000000E+001   4.000000000000000222044600E-001  -1.200000000000000177635700E+000   9.291012747074769203420300E-005   0.000000000000000000000000E+000
  1012  -1.800000000000000000000000E+001   4.000000000000000222044600E-001  -1.000000000000000000000000E+000   9.320556587067969998156000E-005   0.000000000000000000000000E+000
  1013  -1.800000000000000000000000E+001   4.000000000000000222044600E-001  -8.000000000000000444089200E-001   9.765379370441768219074400E-005   0.000000000000000000000000E+000
  1014  -1.800000000000000000000000E+001   4.000000000000000222044600E-001  -6.000000000000000888178400E-001   1.054920789110816203891900E-004   0.000000000000000000000000E+000
  1015  -1.800000000000000000000000E+001   4.000000000000000222044600E-001  -4.000000000000000222044600E-001   1.151001670541533262190700E-004   0.000000000000000000000000E+000
  1016  -1.800000000000000000000000E+001   4.000000000000000222044600E-001  -2.000000000000000111022300E-001   1.242840753253212076213200E-004   0.000000000000000000000000E+000
  1017  -1.800000000000000000000000E+001   4.000000000000000222044600E-001   0.000000000000000000000000E+000   1.307383448433430703315000E-004   0.000000000000000000000000E+000
  1018  -1.800000000000000000000000E+001   4.000000000000000222044600E-001   2.000000000000000111022300E-001   1.325734149393867572844200E-004   0.000000000000000000000000E+000
  1019  -1.800000000000000000000000E+001   4.000000000000000222044600E-001   4.000000000000000222044600E-001   1.287542993446382638949400E-004   0.000000000000000000000000E+000
  1020  -1.800000000000000000000000E+001   4.000000000000000222044600E-001   6.000000000000000888178400E-001   1.193173666157540543063300E-004   0.000000000000000000000000E+000
  1021  -1.800000000000000000000000E+001   4.000000000000000222044600E-001   8.000000000000000444089200E-001   1.052970288781878370996900E-004   0.000000000000000000000000E+000
  1022  -1.800000000000000000000000E+001   4.000000000000000222044600E-001   1.000000000000000000000000E+000   8.839730377206444264081400E-005   0.000000000000000000000000E+000
  1023  -1.800000000000000000000000E+001   4.000000000000000222044600E-001   1.200000000000000177635700E+000   7.053477083601010662588400E-005   0.000000000000000000000000E+000
  1024  -1.800000000000000000000000E+001   4.000000000000000222044600E-001   1.400000000000000133226800E+000   5.341860224471713846023200E-005   0.000000000000000000000000E+000
  1025  -1.800000000000000000000000E+001   4.000000000000000222044600E-001   1.600000000000000088817800E+000   3.829782783518725079079500E-005   0.000000000000000000000000E+000
  1026  -1.800000000000000000000000E+001   4.000000000000000222044600E-001   1.800000000000000044408900E+000   2.585526288395301514511300E-005   0.000000000000000000000000E+000
  1027  -1.800000000000000000000000E+001   4.000000000000000222044600E-001   2.000000000000000000000000E+000   1.623299231409858035877900E-005   0.000000000000000000000000E+000
  1028  -1.800000000000000000000000E+001   4.000000000000000222044600E-001   2.200000000000000177635700E+000   9.208050322421728367170400E-006   0.000000000000000000000000E+000
  1029  -1.800000000000000000000000E+001   4.000000000000000222044600E-001   2.400000000000000355271400E+000   4.405340835827515605220000E-006   0.000000000000000000000000E+000

So what would be the fastest way to feed into bc? Would I be able to do something like:
Code:
#!/bin/bash
totden=0
for (( i = 3; i <= 1008163; i++ ))
do
      wf=$(awk 'NR == i && $2 <= xmax && $2 >= xmin && $3 <= ymax && $3 >= ymin && $4 <= zmax && $4 >= zmin {print $5}' i="$i" xmin=-18 ymin=-2 zmin=-6 xmax=-8 ymax=8 zmax=4 NRmin=1 NRmax=3 density.mesh_index)
den=$( echo "scale=20; $wf^2" | bc )
totden=$( echo "scale=20; $totden + $den" | bc )
done
echo $totden

I taught myself the little I know about scripting so I'm sure my bad habits are shining through.
# 9  
Old 12-03-2013
In this given case only $5 needs to be "outsourced" to bc:
Code:
awk '
$1 >= NRmin && $1 <= NRmax && $2 <= xmax && $2 >= xmin && $3 <= ymax && $3 >= ymin && $4 <= zmax && $4 >= zmin {print "x += "$5" ^ "2}
END {print "x"}
' xmin=-18 ymin=-2 zmin=-6 xmax=-8 ymax=8 zmax=4 NRmin=3 NRmax=1008163 density.mesh_index |
bc
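
The range test itself can stay in awk, because the coordinates in fields 2 to 4 only need ordinary double-precision comparisons. A minimal sketch of just that filter follows. The function name in_range_ids and the two shortened sample lines are made up for illustration; the bounds are the ones used above.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): print field 1 of every input
# line whose x, y, z coordinates (fields 2-4) lie inside the box used above.
in_range_ids() {
    awk -v xmin=-18 -v xmax=-8 -v ymin=-2 -v ymax=8 -v zmin=-6 -v zmax=4 '
    $2 >= xmin && $2 <= xmax && $3 >= ymin && $3 <= ymax && $4 >= zmin && $4 <= zmax { print $1 }'
}

# Two made-up sample lines: the first is inside the box, the second has x = -20.
printf '%s\n' \
    '1001 -1.8E+001 4.0E-001 -3.2E+000 7.357E-005 0.0E+000' \
    '2001 -2.0E+001 4.0E-001 -3.2E+000 7.357E-005 0.0E+000' | in_range_ids
```

With these two sample lines only 1001 should be printed, since awk treats fields written in E notation as ordinary numbers when comparing them.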


Last edited by MadeInGermany; 12-03-2013 at 06:44 PM..
# 10  
Old 12-03-2013
Oh okay. I get it now.
Code:
{print "x += "$5" ^ "2}

will basically write $5^2 + $5^2 ... and so on, where each $5 comes from a different line that meets the criteria. Awk is not actually doing the arithmetic, just passing one big equation to bc. So, if I wanted precision out to the 20th decimal place, would I make the following change?
Code:
{print "scale=20;x += "$5" ^ "2}

---------- Post updated at 10:41 PM ---------- Previous update was at 10:33 PM ----------

I'm a bit confused. Is this doing the arithmetic in awk? Something isn't working out, it is giving a negative number.

Last edited by butson; 12-03-2013 at 10:34 PM.. Reason: code tag didn't work because I removed a bracket
# 11  
Old 12-04-2013
Getting the precision you want is a little more complicated than that: we still need to split the mantissa and exponent out of $5 and have bc perform the arithmetic that you were doing in your script. For example, if we expand on MadeInGermany's awk script, adding some error checking and using bc to get the desired precision, we get something like:
Code:
#!/bin/ksh
IAm=${0##*/}
lf="$IAm.$$"
xmin=-18
ymin=-2
zmin=-6
xmax=-8
ymax=8
zmax=4

totden=$(awk -v xmin="$xmin" -v xmax="$xmax" -v ymin="$ymin" -v ymax="$ymax" \
             -v zmin="$zmin" -v zmax="$zmax" -v lf="$lf" '
BEGIN { printf("scale=20\nx=0\n") # Initialize bc
}
$5!="" && $2<=xmax && $2>=xmin && $3<=ymax && $3>=ymin && $4<=zmax && $4>=zmin {
        if(match($5, /E([+])?/))
                printf("x+=(%s*10^%s)^2\n", substr($5, 1, RSTART - 1),
                        substr($5, RSTART + RLENGTH))
        else    printf("NR=%d: $5 bad format: %s\n", NR, $5) > lf
}
END {   printf("x\nquit\n")
}' "${1:-density.mesh_index}" | bc)
es=$?
if [ -s "$lf" ]
then    printf "%s: Log file (%s) is not empty\n" "$IAm" "$lf" >&2
        cat "$lf" >&2
        if [ "$es" -eq 0 ]
        then    es=1
        fi
fi
printf "totden: %s\n" "$totden"
exit $es

When density.mesh_index is expanded to 800 lines by duplicating the lines you showed us in message #8 in this thread (with 40 empty lines added for fun), and a few of the exponents in field 5 are changed from E-004 and E-005 to E+004 and E+005 (to test the logic that converts values with positive and negative exponents), this script produces the output:
Code:
totden: 29418651008124.442882561190769208458247

I also tried a slightly modified version of your script (to get rid of the diagnostics from bc induced by processing empty lines):
Code:
#!/bin/bash
xmin=-18
ymin=-2
zmin=-6

xmax=-8
ymax=8
zmax=4

totden=0

o=$2
s=$3

for (( e = $o; e <= $s; e++ ))
do
        x=$( sed -n ${e}p $1 | awk '{print $2}' | sed 's/E/\*10\^/' | sed 's/\+//' )
        if [ "$x" == "" ]
        then    continue
        fi
        xbool=$( echo "$x <= $xmax && $x >= $xmin" | bc )
        if [ "$xbool" == 1 ]
        then
                y=$( sed -n ${e}p $1 | awk '{print $3}' | sed 's/E/\*10\^/' | sed 's/\+//' )
                ybool=$( echo "$y <= $ymax && $y >= $ymin" | bc )
                if [ "$ybool" == 1 ]
                then
                        z=$( sed -n ${e}p $1 | awk '{print $4}' | sed 's/E/\*10\^/' | sed 's/\+//' )
                        zbool=$( echo "$z <= $zmax && $z >= $zmin" | bc )
                        if [ "$zbool" == 1 ]
                        then
                                psi=$( sed -n ${e}p $1 | awk '{print $5}' | sed 's/E/\*10\^/' | sed 's/\+//' )
                                dens=$( echo "scale=20; ($psi)^2" | bc )
                                totden=$( echo "scale=20;$totden + $dens" | bc )
                        fi
                fi
        fi
done
echo $totden

which when invoked with the operands density.mesh_index 1 840 to process all of the lines in the same file produces the output:
Code:
29418651008124.442882561190769208458247

Running the modified copy of your script with bash as the interpreter 10 times took from 22.94 to 23.76 seconds of wall clock time each. Changing the modified copy of your script to use ksh instead of bash produced the same results with execution times from 22.25 to 23.27 seconds each.

Using ksh to process the same data with the script I based on MadeInGermany's code, 10 runs took from .02 to .03 seconds each. Modifying that script to use bash instead of ksh took .02 to .04 seconds each. All of these tests were run on a 4-year-old MacBook Pro running OS X version 10.7.5.

Hopefully, this streamlined script will cut the processing time for your 160 files from the 1.13 years you estimated to something closer to 1 day.

In case you're having trouble following what the awk script is doing to $5 as it passes work to bc, an abbreviated version of what awk writes through the pipe to bc when given the input you showed us in message #8 in this thread follows:
Code:
scale=20
x=0
x+=(7.357049247944539628087100*10^-005)^2
x+=(8.933437123800284495461800*10^-005)^2
x+=(1.032677570213522605163300*10^-004)^2
 ... ... ...
x+=(1.623299231409858035877900*10^-005)^2
x+=(9.208050322421728367170400*10^-006)^2
x+=(4.405340835827515605220000*10^-006)^2
x
quit
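
The conversion of a single value can also be tried on its own. The sketch below reuses the match()/substr() idea from the awk script above; the function name to_bc_term is made up for illustration, and the second sample value is one of the field 5 numbers from message #8 with its exponent changed to E+004, in the spirit of the test described earlier.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): turn one number in E notation
# into the bc statement that the awk script above would emit for it.
to_bc_term() {
    printf '%s\n' "$1" | awk '{
        # RSTART/RLENGTH mark the "E" and an optional "+" that follows it.
        if (match($1, /E[+]?/))
            printf("x+=(%s*10^%s)^2\n", substr($1, 1, RSTART - 1),
                   substr($1, RSTART + RLENGTH))
        else
            printf("bad format: %s\n", $1)
    }'
}

to_bc_term 7.357049247944539628087100E-005   # x+=(7.357049247944539628087100*10^-005)^2
to_bc_term 1.137837787257466108355500E+004   # x+=(1.137837787257466108355500*10^004)^2
```

A minus sign after the E is kept as part of the exponent, while a plus sign is dropped.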

# 12  
Old 12-04-2013
Sorry, I only saw that bc "eats" a
Code:
7.357049247944539628087100E-005

but now I see that it reads the E as a digit rather than an exponent marker, so it converts that to
Code:
7.3570492479445396280871014-005

and, with the trailing -005 then taken as a subtraction, that results in
Code:
2.3570492479445396280871014

Thanks Don for working out the format conversion!
--
Yes, awk can run
Code:
{print $5 ^ 2}

which means that awk does the arithmetic itself and prints the resulting number.
While
Code:
{print $5" ^ "2}

is a concatenation of 3 strings that is printed as text and can be passed to bc.
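
The difference is easy to see with a small integer instead of $5. This is only an illustrative sketch with a made-up input value of 3:

```shell
#!/bin/bash
# awk evaluates the expression itself (in double precision) and prints the result.
echo 3 | awk '{print $1 ^ 2}'      # prints: 9

# awk only glues three strings together and prints the text,
# which is what bc would then be asked to evaluate.
echo 3 | awk '{print $1" ^ "2}'    # prints: 3 ^ 2
```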

Last edited by MadeInGermany; 12-04-2013 at 07:47 AM..
# 13  
Old 12-04-2013
Thank you guys so much for all the help. I think I will spend my winter break learning how to use awk properly. Maybe I will try to learn FORTRAN as well. I know it would help with this kind of thing, but it seems way more complicated than shell scripting. I think it will help me in my future endeavors though. Again, thanks guys!