Awk getting statistics of a grid file,


 
# 1  
05-31-2011

Hi,
I have the following file, which is basically a grid (more than 100000 rows):
LLL1 PPP1
LLL1 PPP2
LLL1 PPP3
...............
LLL1 5500
.....
LLL2 PPP1
LLL2 PPP2
LLL2 PPP3
...............
LLL1 5500
.....
L100 PPP1
L100 PPP2
L100 PPP3
...............
2100 5500
..........

Using a formula, I have to assign each point a grid value. Let's say the formula is grid = int((PPP1+7)/8) - int(PPP1/8). The input file then becomes:
Line Point Grid
LLL1 PPP1 1
LLL1 PPP2 1
LLL1 PPP3 2
...............
LLL1 5500 150
.....
LLL2 PPP1 1
LLL2 PPP2 1
LLL2 PPP3 2
...............
LLL2 5300 150
.....
L100 PPP1 1
L100 PPP2 1
L100 PPP3 2
...............
2100 5500 150
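A minimal sketch of adding the Grid column with awk, assuming whitespace-separated columns (line in $1, numeric point in $2) and the formula as it is clarified later in this thread, grid = int((point+7)/8) - int(origin/8), with the origin point held fixed. The function name `add_grid` and the default origin of 1100 are illustrative assumptions, not part of the original question.

```shell
# Hypothetical helper: append a Grid column to "line point" records.
# Assumes $1 = line, $2 = numeric point, and
# grid = int((point+7)/8) - int(origin/8); origin defaults to 1100 here.
add_grid() {
  awk -v origin="${1:-1100}" '
    BEGIN { print "Line", "Point", "Grid" }                  # header row
    { print $1, $2, int(($2 + 7) / 8) - int(origin / 8) }    # one output row per input row
  '
}

# Usage (file name is an example): add_grid 1100 < grid.test
```

Passing the origin with `-v` keeps the reference point out of the awk program text, so changing the grid origin does not require editing the formula.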


I need a summary of the unique values of the grid and some statistics as follows:


Grid #1
LLLL #   FROM PPPP   TO PPPP   Total
LLL1     PPP1        5550      XXXX
LLL2     PPP1        5300      XXXX
LLL3     PPP1        5100      XXXX
...
L100     PPP1        5000      XXXX
...
2100     PPP1        5500      XXX

Grid #2
LLLL #   FROM PPPP   TO PPPP   Total
LLL2     PPP1        5300      XXXX
LLL3     PPP1        5100      XXXX
...
L100     PPP1        5000      XXXX
...
2100     PPP1        5500      XXX

Grid #3
LLLL #   FROM PPPP   TO PPPP   Total
LLL1     PPP1        5550      XXXX
LLL2     PPP1        5300      XXXX
LLL3     PPP1        5100      XXXX
...
L100     PPP1        5000      XXXX
...
2100     PPP1        5500      XXX
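The per-grid summary requested above can be sketched in one awk pass plus a sort. This is a sketch under several assumptions: whitespace-separated `line point` columns, the grid formula int((point+7)/8) - int(origin/8) discussed later in the thread, FROM/TO taken as the smallest and largest point of each line within a grid value, and "Total" taken to mean the number of points of that line in the grid value (the question leaves XXXX unspecified). The function name `grid_summary` and the default origin 1100 are hypothetical.

```shell
# Hypothetical helper: per-grid summary (line, first point, last point, count).
# Stage 1 (awk): aggregate per (grid, line) pair.
# Stage 2 (sort): order by grid value, then by line name.
# Stage 3 (awk): print one "Grid #" block per grid value.
grid_summary() {
  awk -v origin="${1:-1100}" '
  {
    point = $2 + 0
    g = int((point + 7) / 8) - int(origin / 8)
    key = g SUBSEP $1                      # one entry per (grid, line) pair
    if (!(key in total)) { from[key] = point; to[key] = point }
    if (point < from[key]) from[key] = point
    if (point > to[key])   to[key] = point
    total[key]++                           # assumed meaning of "Total": point count
  }
  END {
    for (key in total) {
      split(key, k, SUBSEP)
      print k[1], k[2], from[key], to[key], total[key]
    }
  }' |
  LC_ALL=C sort -k1,1n -k2,2 |
  awk '
  NR == 1 || $1 != prev {                  # first row, or a new grid value
    if (NR > 1) print ""
    print "Grid #" $1
    print "LLLL", "FROM", "TO", "Total"
    prev = $1
  }
  { print $2, $3, $4, $5 }'
}

# Usage (file name is an example): grid_summary 1100 < grid.test
```

Memory use grows with the number of distinct (grid, line) pairs rather than with the number of rows, which should be manageable for a file of around 100000 rows.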



Thanks in advance for any help.
# 2  
05-31-2011
Are PPP1, PPP2 numbers or something else in your real data?

I can't get 150 from 5500 with your formula, int(5500+7)/8-int(5500/8). Can you explain how you get 150 from 5500?
# 3  
06-01-2011
First of all, thanks for your reply. To get the grid value, we keep the point in the second term fixed as an absolute reference and change only the first one. With 1100 as the reference, the formula becomes int((5500+7)/8) - int(1100/8) = 551 (my mistake with 150). For another point, say 2100, the grid value is int((2100+7)/8) - int(1100/8) = 126.
Hope this helps
Many Thanks
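As a quick check of this arithmetic, the formula can be evaluated directly in awk. `grid_of` is a hypothetical helper name; it takes the point and the fixed reference point. Both reference values that come up in this thread (1100 and 1001) are shown, since the result depends on which one is held fixed.

```shell
# Hypothetical helper: grid value of point p for fixed reference point ref,
# using grid = int((p+7)/8) - int(ref/8).
grid_of() {
  awk -v p="$1" -v ref="$2" 'BEGIN { print int((p + 7) / 8) - int(ref / 8) }'
}

grid_of 5500 1100   # prints 551
grid_of 2100 1100   # prints 126
grid_of 5500 1001   # prints 563
grid_of 2100 1001   # prints 138
```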
# 4  
06-01-2011
Do you mean the second variable will always be 1100?
How do you get from

Code:
LLL1 PPP1
LLL1 PPP2
LLL1 PPP3

to

Code:
LLL1 PPP1 1
LLL1 PPP2 1
LLL1 PPP3 2

Please post some real data (replace PPP1, PPP2 with real numbers) and provide the expected output.
# 5  
06-01-2011
Hi,
Please find attached an Excel file with a real example of the grid.
The PPPP value will not always be 1100; only in the second part of the formula do I need to keep 1100 fixed as the reference for the grid.
Many thanks
# 6  
06-02-2011
I still don't understand where the 1100 comes from, but let's start with this.

Are you asking for something like this?

Code:
awk '{print $0, int(($1+7)/8)-int(1100/8)}' infile

5001    1001 489
5001    1002 489
5001    1003 489
5001    1004 489
5001    1005 489
5001    1006 489
5001    1007 489
5001    1008 489
5001    1009 489
5001    1010 489

or

Code:
awk '{print $0, int(($1+7)/8)-int($2/8)}' infile

5001    1001 501
5001    1002 501
5001    1003 501
5001    1004 501
5001    1005 501
5001    1006 501
5001    1007 501
5001    1008 500
5001    1009 500

# 7  
06-02-2011
Hello,
Please find below the formula that gives the grid values:
Code:
$ awk '{print $0, int(($2+7)/8)-int(1100/8)}' grid.test | more
5001                1001 -11
5001                1002 -11
5001                1003 -11
5001                1004 -11
5001                1005 -11
5001                1006 -11
5001                1007 -11
5001                1008 -11
5001                1009 -10
5001                1010 -10
5001                1011 -10
5001                1012 -10
5001                1013 -10
5001                1014 -10
5001                1015 -10
5001                1016 -10
5001                1017 -9
5001                1018 -9
5001                1019 -9
5001                1020 -9
5001                1021 -9
5001                1022 -9
 
------------------------
5001                1076 -2
5001                1077 -2
5001                1078 -2
5001                1079 -2
5001                1080 -2
5001                1081 -1
5001                1082 -1
5001                1083 -1
5001                1084 -1
5001                1085 -1
5001                1086 -1
5001                1087 -1
5001                1088 -1
5001                1089 0
5001                1090 0
5001                1091 0
5001                1092 0
5001                1093 0
5001                1094 0
5001                1095 0
5001                1096 0
5001                1097 1
5001                1098 1
5001                1099 1
5001                1100 1
5001                1101 1
5001                1102 1
5001                1103 1
5001                1104 1
5001                1105 2
5001                1106 2
5001                1107 2
5001                1108 2
5001                1109 2
5001                1110 2
5001                1111 2
5001                1112 2
5001                1113 3
5001                1114 3
5001                1115 3
5001                1116 3
5001                1117 3
5001                1118 3
5001                1119 3
5001                1120 3
5001                1121 4
5001                1122 4
5001                1123 4
5001                1124 4
5001                1125 4

1100 defines the grid origin (if you put 1001 instead, then 1001 will be the grid origin).
Thanks
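Because int((p+7)/8) only changes value every 8 points, each grid value covers a bin of 8 consecutive point numbers: all points from 8k-7 to 8k give int((p+7)/8) = k. Under this formula the bin edges stay at multiples of 8 whatever the reference is; the reference point only shifts which bin gets which number. The hypothetical helper below inverts the formula to print the point range of one grid value, assuming positive point numbers. With reference 1100 it reproduces the bins visible in the output above (for example, grid 1 covers points 1097 to 1104).

```shell
# Hypothetical helper: first and last point that map to grid value g
# for reference point ref, under grid = int((p+7)/8) - int(ref/8).
# Assumes positive point numbers.
grid_range() {
  awk -v g="$1" -v ref="$2" 'BEGIN {
    k = g + int(ref / 8)       # required value of int((p+7)/8)
    print 8 * k - 7, 8 * k     # points 8k-7 .. 8k all give int((p+7)/8) == k
  }'
}

grid_range 1 1100     # prints 1097 1104
grid_range -11 1100   # prints 1001 1008
grid_range 1 1001     # prints 1001 1008
```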

Last edited by Franklin52; 06-02-2011 at 10:35 AM.. Reason: Please use code tags for code and data examples