awk array problem


 
# 8  
Old 07-22-2011
stats don't seem accurate

Hi,

Here are the stats for 4 days; they do not seem right. The "in" and "out" numbers should be quite similar, but they are not.

Code:
Date @2011-07-17 COUNT -323 IN 7 OUT 334 Estimate 323 bats(peak was at 22:01:32)
Date @2011-07-18 COUNT +107 IN 401 OUT 1 Estimate 107 bats(peak was at 08:07:31)
Date @2011-07-19 COUNT +158 IN 463 OUT 1 Estimate 158 bats(peak was at 22:20:57)
Date @2011-07-20 COUNT +451 IN 453 OUT 0 Estimate 451 bats(peak was at 08:37:30)
A 6737
B 6344
T 13081
H  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
A 533 479 539 412 1179 314 82 49 29 27 20 16 12 11  9  3  7 43 83 141 278 1414 385
B 464 535 525 455 419 130 68 49 33 31 26 17 11 10  5  3 11 51 87 152 265 2136 535

I have attached the relevant files, if you wouldn't mind having a look?
Many thanks
# 9  
Old 07-22-2011
looks ok

Hi, this looks much better, since the ins are roughly equal to the outs, as one would expect.
I'm driving to Germany on holiday with wife and kids tomorrow so can't look at it in detail for a few days, but like I said, it seems OK.
Thanks so much for the help.
Will let you know more as soon as I can.
# 10  
Old 07-22-2011
Quote:
Originally Posted by cmp260
I was getting errors
Getting what errors? Do you mean the inaccuracy you mentioned earlier or something else?

Your zip didn't include most of the files this needs to run; I'm hunting for them.

---------- Post updated at 01:46 PM ---------- Previous update was at 01:32 PM ----------

I converted your data into one huge, long line and did a rough-and-ready count with grep and wc:

Code:
$ awk -v FS=, '{ printf(" %s,%s", $9, $10); }' < output.txt > output2.txt
$ grep -o "1,1 0,1 1,0 0,0" output3.txt  | wc -l
2603
$ grep -o "0,1 1,1 0,0 1,0" output3.txt  | wc -l
746

Do these numbers look reasonable? If not, these patterns aren't working.
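
For anyone wanting to repeat the rough count above on their own log, here is a minimal sketch that wraps the two steps in one shell function. `count_pattern` is a made-up name, and it assumes, as in the data in this thread, that fields 9 and 10 hold the two sensor states.

```shell
# Hypothetical helper: count occurrences of a 4-event pattern in a CSV log.
# Assumes fields 9 and 10 are the two sensor states (as in this thread).
count_pattern() {
    # $1 = pattern such as "1,1 0,1 1,0 0,0", $2 = input file
    # Flatten the log into one long line, then count the matches.
    awk -F, '{ printf(" %s,%s", $9, $10) } END { print "" }' "$2" |
        grep -o "$1" | wc -l | tr -d ' '
}
```

Like the grep above, this ignores timestamps and counts non-overlapping matches only, so it is a sanity check rather than a real event count.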

---------- Post updated at 02:00 PM ---------- Previous update was at 01:46 PM ----------

I think I see inconsistencies in the patterns.

---------- Post updated at 02:20 PM ---------- Previous update was at 02:00 PM ----------

I think the A pattern in the program you attached was subtly wrong. You'd also set MAX to 0, which may have made it too picky. I think the problem with two events in the same second was due to problems with the A pattern and not because of MAX -- trying it with MAX=1 on your sample data it sees an A event immediately followed by a B event in the same second.

Code:
BEGIN { OT=0 # Time of previous measurement
		MAX=1	# Max num of seconds between valid events
		DAY="";	# Current day
		CA=0		;	CB=0         ; CX=0 ; CD=0
		# Running total of bats leaving and entering
		TOTALBATS=0;
		# The highest TOTALBATS has ever been
		MAXBATS=0;
		# Length of the patterns
		L=4
		# Patterns to check against
		# Block 1	unBlock 1	block 0	        Unblock 0
		A[0]="1,1";	A[1]="1,0";	A[2]="0,1";	A[3]="0,0";

		# Block 1	Block 0   	unblock 1	 Unblock 0
		X[0]="1,1";	X[1]="0,1";	X[2]="1,0";	X[3]="0,0";

		# Block 0 	Unblock 0       Block 1		Unblock 1
		B[0]="0,1";	B[1]="0,0";	B[2]="1,1";	B[3]="1,0";

		# Block 0 	block 1        unBlock 0	unblock 1
		D[0]="0,1";	D[1]="1,1";	D[2]="0,0";	D[3]="1,0";
 }
            
function print_daily(day,total,max,min,maxtime)
{
	I=total;	if(I<0)	I=-I;
	MX="no maximum"
	if(maxtime > 0)
		MX=sprintf("peak was at %s", strftime("%H:%M:%S",maxtime));

#	printf("COUNT@%s COUNT %+d RET %d LEFT %d GUESS %d (%s)\n",
	printf("Date @%s COUNT %+d IN %d OUT %d Estimate %d bats(%s)\n",
		day, total, max, -min, I, MX) > "/dev/stderr";

	# Reset daily counts
	TOTALBATS=0;	MAXBATS=0;	MINBATS=0;	MAXTIME=0;
	MINTIME=0;
}

{	# Calculate timestamp from date string
	T=mktime($1 " " $2 " " $3 " " $5 " " $6 " " $7);
	T+=(60*60*16);	# Add sixteen hours
	$1=strftime("%Y", T);	# Put these back in the strings
	$2=strftime("%m", T);
	$3=strftime("%d", T);
	$5=strftime("%H", T);
	$6=strftime("%M", T);
	$7=strftime("%S", T);

	# When the year, month, and/or day changes, time to print daily counts
	if((DAY != $1 "-" $2 "-" $3) && (DAY != ""))
		print_daily(DAY,TOTALBATS,MAXBATS,MINBATS,MAXTIME);

	DAY=$1 "-" $2 "-" $3;

	if($8 == "pv")	# Ignore anything but PV lines.
	{
		# If too much time has passed since the last event, start over.
		if((T-OT) > MAX)	# Blank the array
			for(N=0; N<(L-1); N++)	C[N]="";
		else	# Shift elements toward the front
			for(N=0; N<(L-1); N++)	C[N]=C[N+1];

		OT=T	# Set prev time to this one.

		C[L-1]=$9 "," $10;	# Set the latest event in the array

		# Search for events in the array.
		FOUNDA=1;	FOUNDB=1;
		FOUNDX=1;	FOUNDD=1;
		for(N=0; N<L; N++)
		{
			if(A[N] != C[N]) FOUNDA=0;
			if(B[N] != C[N]) FOUNDB=0;
			if(X[N] != C[N]) FOUNDX=0;
			if(D[N] != C[N]) FOUNDD=0;
		}

		# Count the events and mark the hour they occurred in
		if(FOUNDA || FOUNDX)
		{
			if(FOUNDX)	CX++;
			else		CA++;
			printf("A@%s-%s-%s %s:%s:%s\n",$1,$2,$3,$5,$6,$7);
			AH[$5]++;
			TOTALBATS++;
		}

		if(FOUNDB || FOUNDD)
		{
			if(FOUNDD)	CD++;
			else		CB++;

			printf("B@%s-%s-%s %s:%s:%s\n",$1,$2,$3,$5,$6,$7);
			BH[$5]++;
			TOTALBATS--;
		}

		# Update our maximum daily counts
		if(MAXBATS < TOTALBATS)
		{
			MAXBATS=TOTALBATS;
			MAXTIME=T;
		}

		if(MINBATS > TOTALBATS)
		{
			MINBATS=TOTALBATS;
			MINTIME=T;
		}
	}
}
END {	# The final statistics will be printed to stderr, to easily
	# separate them from the event times printed to stdout.

	# The last daily count
	print_daily(DAY,TOTALBATS,MAXBATS,MINBATS,MAXTIME);

	# Print the event counts
	printf("A %2d\nB %2d\nX %2d\nD %2d\nT %2d\n", CA, CB, CX, CD, CA+CB+CX+CD) > "/dev/stderr";

	# Print a list of hours from 1-23
	STR="H";
	for(N=1; N<=23; N++)	STR=STR sprintf(" %2d", N);
	print STR > "/dev/stderr";

	# Print hourly counts for event A
	STR="A";
	for(N=1; N<=23; N++)
		STR=STR sprintf(" %2d", AH[sprintf("%02d", N)]);
	print STR > "/dev/stderr";

	# Hourly counts for event B
	STR="B";
	for(N=1; N<=23; N++)
		STR=STR sprintf(" %2d", BH[sprintf("%02d",N)]);
	print STR > "/dev/stderr";
	}

Does this look reasonable?
Code:
Date @2011-07-17 COUNT -261 IN 9 OUT 273 Estimate 261 bats(peak was at 22:01:32)
Date @2011-07-18 COUNT +217 IN 380 OUT 1 Estimate 217 bats(peak was at 08:07:31)
Date @2011-07-19 COUNT +266 IN 461 OUT 2 Estimate 266 bats(peak was at 22:20:49)
Date @2011-07-20 COUNT +428 IN 430 OUT 0 Estimate 428 bats(peak was at 08:37:30)
A 4956
B 6163
X 2603
D 746
T 14468
H  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
A 573 536 582 449 1225 321 84 50 29 27 21 16 12 11  9  3  7 46 85 152 300 1865 442
B 499 562 555 471 579 155 69 49 33 31 26 17 12 10  6  3 11 53 90 160 273 2313 561

It is matching the X and D patterns correctly, if grep is to be believed.
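
The window matching the script relies on can be tried in isolation. The sketch below is a cut-down, hypothetical version and not the script above: it assumes a simplified input where field 1 is already a numeric timestamp in seconds and fields 2 and 3 are the two sensor states, it only knows the A and B patterns, and `match_events` is a made-up name.

```shell
# Hypothetical, simplified matcher (not the full script from this thread).
# Input: timestamp_in_seconds,state1,state2   Usage: match_events FILE [MAX]
match_events() {
    awk -F, -v MAX="${2:-1}" '
    BEGIN {
        L = 4                                # length of the patterns
        split("1,1 1,0 0,1 0,0", A, " ")     # pattern A, indexed 1..4
        split("0,1 0,0 1,1 1,0", B, " ")     # pattern B, indexed 1..4
        OT = 0                               # time of previous event
    }
    {
        if (($1 - OT) > MAX)                 # gap too long: forget old events
            for (N = 1; N < L; N++) C[N] = ""
        else                                 # otherwise slide the window
            for (N = 1; N < L; N++) C[N] = C[N+1]
        OT = $1
        C[L] = $2 "," $3                     # newest event goes last

        FA = 1; FB = 1
        for (N = 1; N <= L; N++) {
            if (A[N] != C[N]) FA = 0
            if (B[N] != C[N]) FB = 0
        }
        if (FA) print "A@" $1
        if (FB) print "B@" $1
    }' "$1"
}
```

With MAX=1, a pattern interrupted by a gap of more than one second is thrown away; passing a larger MAX as the second argument lets such a pattern complete, which is a quick way to see how picky a given MAX makes the matcher.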

---------- Post updated at 02:34 PM ---------- Previous update was at 02:20 PM ----------

You haven't detailed most of your requirements at all. If I wrote a CSV export it'd be almost guaranteed to not be the layout or even the data you wanted.

And your data about the bird rejection times is still too vague to use. Is that 8am to 5pm raw datalogger time, or 8am to 5pm in the "corrected" time? What about spring and fall?

Last edited by Corona688; 07-22-2011 at 05:53 PM..
# 11  
Old 07-22-2011
hi,
The bird rejection time is not raw datalogger time, since that is 16 hours behind the actual time; it is 8am to 5pm corrected time. I suppose we could get really fancy and reference an array or lookup table of daily sunrise/sunset at a particular latitude, but it is probably easier to change the times manually for every 4-month data run...
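
A rough sketch of what a fixed rejection window could look like, applied to the event lines the script prints to stdout (which are already in corrected time). Everything here is an assumption for illustration: the name `drop_daytime`, the default hours 8 and 17, and the choice to treat 5pm itself as outside the window.

```shell
# Hypothetical filter over event lines such as "A@2011-07-19 16:10:28".
# Usage: drop_daytime FILE [LO_HOUR] [HI_HOUR]; defaults are assumptions.
drop_daytime() {
    awk -v LO="${2:-8}" -v HI="${3:-17}" '
    {
        split($2, HMS, ":")                  # HMS[1] is the hour of the event
        H = HMS[1] + 0
        if (H >= LO && H < HI) next          # inside the bird window: reject
        print                                # keep everything else
    }' "$1"
}
```

Changing the times for a new data run would then just mean passing different hours, for example `drop_daytime events.txt 7 18`.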

something like this for csv would be sufficent for graphing:
2011-07-17, 261, 22:01:32
2011-07-18, 217, 08:07:31

Presumably that is just a modification to the format of the output, no?

However, how do I go about adding A+X and B+D for the final in/out daily stats?

Last edited by cmp260; 07-28-2011 at 06:28 PM.. Reason: more info
# 12  
Old 07-28-2011
hi,

Just had a chance to look at this now. The stats look OK for each of the individual letters, but how would I go about adding them, to get total in, total out and total max bats out by adding A+X and B+D?
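
One possible way to get the combined figures without touching the counters: the script already prints an "A@" line for both A and X matches and a "B@" line for both B and D matches, so counting those lines per day gives A+X and B+D directly. The sketch below does that and writes CSV; `daily_csv` is a made-up name and the column layout (date, A-type count, B-type count) is only an assumption about what would be useful for graphing.

```shell
# Hypothetical post-processor for the event lines printed to stdout,
# e.g. "A@2011-07-19 16:10:28". A-lines cover A and X, B-lines cover B and D.
daily_csv() {
    awk '
    {
        split($1, P, "@")                    # P[1] = event letter, P[2] = date
        if (!(P[2] in SEEN)) {               # remember days in order of arrival
            SEEN[P[2]] = 1
            DAYS[++ND] = P[2]
        }
        if (P[1] == "A")      CIN[P[2]]++
        else if (P[1] == "B") COUT[P[2]]++
    }
    END {
        for (I = 1; I <= ND; I++)
            printf("%s, %d, %d\n", DAYS[I], CIN[DAYS[I]], COUT[DAYS[I]])
    }' "$1"
}
```

Inside the script itself the equivalent would be to use CA+CX and CB+CD wherever a combined total is wanted.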
# 13  
Old 07-29-2011
still problem with maximum estimated daily count

Hi,
I have modified the code as follows; however, I cannot work out how to get the correct stats. The in and out counts are correct, but the maximum/estimate should be 3, since there are 3 out at one time before they start to return.

any help appreciated... thanks

Code:
BEGIN {	OT=0	# Time of previous measurement
	MAX=1	# Max num of seconds between valid events
	DAY="";	# Current day
	CA=0;	CB=0;	CX=0;	CD=0	# Event counters: A, X are one direction; B, D are the other
	# Running total of bats leaving and entering
	TOTALBATS=0;
	# The highest TOTALBATS has ever been
	MAXBATS=0;
	# Length of the patterns
	L=4
	# Patterns to check against
	# Block 1	unBlock 1	block 0	Unblock 0
	#A[0]="1,1";	A[1]="1,0";	A[2]="0,1";	A[3]="0,0";
	A[0]="1,1";	A[1]="1,0";	A[2]="0,1";	A[3]="0,0";
	# Block 1	Block 0	unblock 1	Unblock 0
	X[0]="1,1";	X[1]="0,1";	X[2]="1,0";	X[3]="0,0";
	# Block 0	Unblock 0	Block 1	Unblock 1
	B[0]="0,1";	B[1]="0,0";	B[2]="1,1";	B[3]="1,0";
	# Block 0	block 1	unBlock 0	unblock 1
	D[0]="0,1";	D[1]="1,1";	D[2]="0,0";	D[3]="1,0";
}
function print_daily(day,total,max,min,maxtime)
{
	I=total;	if(I<0)	I=-I;
	MX="no maximum"
	if(maxtime > 0)
		MX=sprintf("peak was at %s", strftime("%H:%M:%S",maxtime));
#	printf("COUNT@%s COUNT %+d RET %d LEFT %d GUESS %d (%s)\n",
	printf("Date @%s IN %+d OUT %d Max %d Estimate %d bats(%s)\n",
#		day, total, max, -min, I, MX) > "/dev/stderr";
		day, CA, CB, MAXBATS, I, MX) > "/dev/stderr";

	# Reset daily counts
	TOTALBATS=0;	MAXBATS=0;	MINBATS=0;	MAXTIME=0;
	MINTIME=0;	CA=0;	CB=0;
}
{	# Calculate timestamp from date string
	T=mktime($1 " " $2 " " $3 " " $5 " " $6 " " $7);
	T+=(60*60*16);	# Add sixteen hours
	$1=strftime("%Y", T);	# Put these back in the strings
	$2=strftime("%m", T);
	$3=strftime("%d", T);
	$5=strftime("%H", T);
	$6=strftime("%M", T);
	$7=strftime("%S", T);
	# When the year, month, and/or day changes, time to print daily counts
	if((DAY != $1 "-" $2 "-" $3) && (DAY != ""))
		print_daily(DAY,TOTALBATS,MAXBATS,MINBATS,MAXTIME);
	DAY=$1 "-" $2 "-" $3;
	if($8 == "pv")	# Ignore anything but PV lines.
	{
		# If too much time has passed since the last event, start over.
		if((T-OT) > MAX)	# Blank the array
			for(N=0; N<(L-1); N++)	C[N]="";
		else	# Shift elements toward the front
			for(N=0; N<(L-1); N++)	C[N]=C[N+1];
		OT=T	# Set prev time to this one.
		C[L-1]=$9 "," $10;	# Set the latest event in the array
		# Search for events in the array.
		FOUNDA=1;	FOUNDB=1;
		FOUNDX=1;	FOUNDD=1;
		for(N=0; N<L; N++)
		{
			if(A[N] != C[N]) FOUNDA=0;
			if(B[N] != C[N]) FOUNDB=0;
			if(X[N] != C[N]) FOUNDX=0;
			if(D[N] != C[N]) FOUNDD=0;
		}
		# Count the events and mark the hour they occurred in
		if(FOUNDA || FOUNDX)
		{
			# if(FOUNDX) CX++;
			# else
			CA++;
			printf("A@%s-%s-%s %s:%s:%s\n",$1,$2,$3,$5,$6,$7);
			AH[$5]++;
			TOTALBATS++;
		}
		if(FOUNDB || FOUNDD)
		{
			# if(FOUNDD) CD++;
			# else
			CB++;
			printf("B@%s-%s-%s %s:%s:%s\n",$1,$2,$3,$5,$6,$7);
			BH[$5]++;
			TOTALBATS--;
		}
		# Update our maximum daily counts
		if(MAXBATS < TOTALBATS)
		{
			MAXBATS=TOTALBATS;
			MAXTIME=T;
		}
		#if(MINBATS > TOTALBATS)
		#{
		#	MINBATS=TOTALBATS;
		#	MINTIME=T;
		#}
	}
}
END {	# The final statistics will be printed to stderr, to easily
	# separate them from the event times printed to stdout.
	# The last daily count
	print_daily(DAY,TOTALBATS,MAXBATS,MINBATS,MAXTIME);
	# Print the event counts
	printf("A %2d\nB %2d\nX %2d\nD %2d\nT %2d\n", CA, CB, CX, CD, CA+CB+CX+CD) > "/dev/stderr";
	# Print a list of hours from 1-23
	STR="H";
	for(N=1; N<=23; N++)	STR=STR sprintf(" %2d", N);
	print STR > "/dev/stderr";
	# Print hourly counts for event A
	STR="A";
	for(N=1; N<=23; N++)
		STR=STR sprintf(" %2d", AH[sprintf("%02d", N)]);
	print STR > "/dev/stderr";
	# Hourly counts for event B
	STR="B";
	for(N=1; N<=23; N++)
		STR=STR sprintf(" %2d", BH[sprintf("%02d",N)]);
	print STR > "/dev/stderr";
}

The result I get is this:
Code:
Date @2011-07-19 IN +6 OUT 6 Max 1 Estimate 0 bats(peak was at 16:10:28)
A  0
B  0
X  0
D  0
T  0
H  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
A  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  6  0  0  0  0  0  0  0
B  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  6  0  0  0  0  0  0  0

Using this as input data:
Code:
2011,07,19,Rx,00,10,28,pv,1,1
2011,07,19,Rx,00,10,28,pv,1,0
2011,07,19,Rx,00,10,28,pv,0,1
2011,07,19,Rx,00,10,28,pv,0,0
2011,07,19,Rx,00,10,28,pv,0,1
2011,07,19,Rx,00,10,28,pv,1,1
2011,07,19,Rx,00,10,28,pv,0,0
2011,07,19,Rx,00,10,28,pv,1,0
2011,07,19,Rx,00,10,29,pv,1,1
2011,07,19,Rx,00,10,29,pv,0,1
2011,07,19,Rx,00,10,29,pv,1,0
2011,07,19,Rx,00,10,29,pv,0,0
2011,07,19,Rx,00,36,57,pv,0,1
2011,07,19,Rx,00,36,57,pv,0,0
2011,07,19,Rx,00,36,57,pv,1,1
2011,07,19,Rx,00,36,57,pv,1,0
2011,07,19,Rx,00,37,10,pv,1,1
2011,07,19,Rx,00,37,10,pv,0,1
2011,07,19,Rx,00,37,10,pv,1,0
2011,07,19,Rx,00,37,10,pv,0,0
2011,07,19,Rx,00,41,30,pv,0,1
2011,07,19,Rx,00,41,30,pv,1,1
2011,07,19,Rx,00,41,30,pv,0,0
2011,07,19,Rx,00,41,30,pv,1,0
2011,07,19,Rx,00,41,31,pv,0,1
2011,07,19,Rx,00,41,31,pv,1,1
2011,07,19,Rx,00,41,31,pv,0,0
2011,07,19,Rx,00,41,31,pv,1,0
2011,07,19,Rx,00,41,27,pv,0,1
2011,07,19,Rx,00,41,27,pv,1,1
2011,07,19,Rx,00,41,27,pv,0,0
2011,07,19,Rx,00,41,27,pv,1,0
2011,07,19,Rx,00,41,28,pv,0,1
2011,07,19,Rx,00,41,28,pv,1,1
2011,07,19,Rx,00,41,28,pv,0,0
2011,07,19,Rx,00,41,28,pv,1,0
2011,07,19,Rx,00,41,29,pv,1,1
2011,07,19,Rx,00,41,29,pv,0,1
2011,07,19,Rx,00,41,29,pv,1,0
2011,07,19,Rx,00,41,29,pv,0,0
2011,07,19,Rx,00,41,31,pv,1,1
2011,07,19,Rx,00,41,31,pv,0,1
2011,07,19,Rx,00,41,31,pv,1,0
2011,07,19,Rx,00,41,31,pv,0,0
2011,07,19,Rx,00,41,32,pv,1,1
2011,07,19,Rx,00,41,32,pv,0,1
2011,07,19,Rx,00,41,32,pv,1,0
2011,07,19,Rx,00,41,32,pv,0,0

thanks
# 14  
Old 08-19-2011
hi, nudging the post in hope the last question gets answered

Just nudging the post in the hope that the last question gets answered.

---------- Post updated 19-08-11 at 09:11 AM ---------- Previous update was 18-08-11 at 07:08 PM ----------

For debugging, I added this at lines 86 and 96:
Code:
TOTALBATS++;
  print TOTALBATS > "/dev/stderr"; 

TOTALBATS--;
  print TOTALBATS > "/dev/stderr";

The resulting stats output shows what is happening:
Code:
-1
0
-1
0
-1
0
-1
-2
-3
-2
-3
-2
-3
-2
-3
-2
-1
-2
-1
-2
-1
-2
-3
-4
-5
-6
-5
-4
-3
-2
Date: @2011-07-19 IN: +14 OUT: 16 Maximum count: 0 mismatch= 2 (no maximum)
-1

If there is a negative maximum (this would happen if the logger was installed in reverse, which is easily possible in the field), no maximum count is recorded; it should be 6 above.
Any idea how I can achieve this?
Thanks
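
One way this might be handled, as a sketch under assumptions rather than a tested fix for the script: keep track of the lowest running total as well as the highest, and report whichever is further from zero. The helper below works on a file with one running-total value per line, like the debug output above; `peak_count` is a made-up name.

```shell
# Hypothetical helper: reads one running total per line and prints the
# largest excursion from zero in either direction.
peak_count() {
    awk '
    BEGIN { MAXBATS = 0; MINBATS = 0 }
    {
        T = $1 + 0
        if (T > MAXBATS) MAXBATS = T         # highest running total seen
        if (T < MINBATS) MINBATS = T         # lowest (most negative) seen
    }
    END {
        PEAK = MAXBATS
        if (-MINBATS > PEAK) PEAK = -MINBATS # negative side was larger
        print PEAK
    }' "$1"
}
```

In the script itself the same idea would mean restoring the commented-out MINBATS/MINTIME update and printing the larger of MAXBATS and -MINBATS (with the matching MAXTIME or MINTIME) in print_daily. Whether the larger of the two or the spread MAXBATS-MINBATS is the better estimate depends on the colony, so treat that choice as an assumption.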