Full Discussion: awk array problem
Post 302541151 by Corona688 on Friday 22nd of July 2011 04:34:03 PM
Quote:
Originally Posted by cmp260
I was getting errors

Getting what errors? Do you mean the inaccuracy you mentioned earlier, or something else?

Your zip didn't include most of the files this needs to run, so I'm hunting for them.

---------- Post updated at 01:46 PM ---------- Previous update was at 01:32 PM ----------

I converted your data into one huge, long line and did a rough-and-ready count with grep and wc:

Code:
$ awk -v FS=, '{ printf(" %s,%s", $9, $10); }' < output.txt > output2.txt
$ grep -o "1,1 0,1 1,0 0,0" output3.txt  | wc -l
2603
$ grep -o "0,1 1,1 0,0 1,0" output3.txt  | wc -l
746

Do these numbers look reasonable? If not, these patterns aren't working.
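
The same rough count can be tried on a tiny input first. This is only a sketch: the `count_pattern` and `sample` helpers and the sample records are made up for illustration, and it assumes, like the one-liner above, that fields 9 and 10 hold the two values. It ignores timestamps and does not filter on field 8.

```shell
# Count how often a four-event pattern occurs in CSV records read from stdin.
# Hypothetical helper; fields 9 and 10 are assumed to hold the two values.
count_pattern() {
    awk -F, '{ printf(" %s,%s", $9, $10) } END { print "" }' |
        grep -o -- "$1" |
        wc -l
}

# Made-up sample: two X-style sequences followed by one D-style sequence.
sample() {
    for pair in 1,1 0,1 1,0 0,0  1,1 0,1 1,0 0,0  0,1 1,1 0,0 1,0; do
        echo "2011,07,17,x,22,01,32,pv,$pair"
    done
}

sample | count_pattern "1,1 0,1 1,0 0,0"    # X pattern, 2 matches in this sample
sample | count_pattern "0,1 1,1 0,0 1,0"    # D pattern, 1 match in this sample
```

grep -o prints each non-overlapping match on its own line, so wc -l gives the number of matches.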

---------- Post updated at 02:00 PM ---------- Previous update was at 01:46 PM ----------

I think I see inconsistencies in the patterns.

---------- Post updated at 02:20 PM ---------- Previous update was at 02:00 PM ----------

I think the A pattern in the program you attached was subtly wrong. You'd also set MAX to 0, which may have made it too picky. I think the problem with two events in the same second was caused by the A pattern and not by MAX: trying it with MAX=1 on your sample data, it sees an A event immediately followed by a B event in the same second.
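
To see why MAX=0 is picky, here is a stripped-down sketch of the sliding-window idea. It is not the full program: the "epoch,sensor,state" input layout and the `match_a` and `events` helpers are hypothetical, and the arrays here are 1-based because they are filled with split().

```shell
# Sliding-window match for the A pattern over "epoch,sensor,state" lines.
# Hypothetical, simplified layout; the real data needs mktime() to get epoch seconds.
match_a() {
    awk -F, -v MAX="$1" '
    BEGIN { L = 4; split("1,1 1,0 0,1 0,0", A, " ") }   # A[1..4] holds the pattern
    {
        if (($1 - OT) > MAX)                # gap too long: forget older events
            for (N = 1; N < L; N++) C[N] = ""
        else                                # otherwise shift toward the front
            for (N = 1; N < L; N++) C[N] = C[N + 1]
        OT = $1
        C[L] = $2 "," $3                    # newest event goes in the last slot

        FOUND = 1
        for (N = 1; N <= L; N++) if (A[N] != C[N]) FOUND = 0
        if (FOUND) COUNT++
    }
    END { print COUNT + 0 }'
}

# Made-up events: the sequence crosses a one-second boundary halfway through.
events() {
    printf "%s\n" "100,1,1" "100,1,0" "101,0,1" "101,0,0"
}

events | match_a 1    # the one-second gap is tolerated, pattern is found
events | match_a 0    # the one-second gap resets the window, pattern is missed
```

With MAX=0, any sequence that straddles a change of second is thrown away, because T-OT is 1 there and 1 > 0.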

Code:
BEGIN { OT=0 # Time of previous measurement
		MAX=1	# Max num of seconds between valid events
		DAY="";	# Current day
		CA=0;	CB=0;	CX=0;	CD=0	# Counts of A, B, X and D events
		# Running total of bats leaving and entering
		TOTALBATS=0;
		# The highest TOTALBATS has ever been
		MAXBATS=0;
		# Length of the patterns
		L=4
		# Patterns to check against
		# Block 1	unBlock 1	block 0	        Unblock 0
		A[0]="1,1";	A[1]="1,0";	A[2]="0,1";	A[3]="0,0";

		# Block 1	Block 0   	unblock 1	 Unblock 0
		X[0]="1,1";	X[1]="0,1";	X[2]="1,0";	X[3]="0,0";

		# Block 0 	Unblock 0       Block 1		Unblock 1
		B[0]="0,1";	B[1]="0,0";	B[2]="1,1";	B[3]="1,0";

		# Block 0 	block 1        unBlock 0	unblock 1
		D[0]="0,1";	D[1]="1,1";	D[2]="0,0";	D[3]="1,0";
 }
            
function print_daily(day,total,max,min,maxtime)
{
	I=total;	if(I<0)	I=-I;
	MX="no maximum"
	if(maxtime > 0)
		MX=sprintf("peak was at %s", strftime("%H:%M:%S",maxtime));

#	printf("COUNT@%s COUNT %+d RET %d LEFT %d GUESS %d (%s)\n",
	printf("Date @%s COUNT %+d IN %d OUT %d Estimate %d bats(%s)\n",
		day, total, max, -min, I, MX) > "/dev/stderr";

	# Reset daily counts
	TOTALBATS=0;	MAXBATS=0;	MINBATS=0;	MAXTIME=0;
	MINTIME=0;
}

{	# Calculate timestamp from date string
	T=mktime($1 " " $2 " " $3 " " $5 " " $6 " " $7);
	T+=(60*60*16);	# Add sixteen hours
	$1=strftime("%Y", T);	# Put these back in the strings
	$2=strftime("%m", T);
	$3=strftime("%d", T);
	$5=strftime("%H", T);
	$6=strftime("%M", T);
	$7=strftime("%S", T);

	# When the year, month, and/or day changes, time to print daily counts
	if((DAY != $1 "-" $2 "-" $3) && (DAY != ""))
		print_daily(DAY,TOTALBATS,MAXBATS,MINBATS,MAXTIME);

	DAY=$1 "-" $2 "-" $3;

	if($8 == "pv")	# Ignore anything but PV lines.
	{
		# If too much time has passed since the last event, start over.
		if((T-OT) > MAX)	# Blank the array
			for(N=0; N<(L-1); N++)	C[N]="";
		else	# Shift elements toward the front
			for(N=0; N<(L-1); N++)	C[N]=C[N+1];

		OT=T	# Set prev time to this one.

		C[L-1]=$9 "," $10;	# Set the latest event in the array

		# Search for events in the array.
		FOUNDA=1;	FOUNDB=1;
		FOUNDX=1;	FOUNDD=1;
		for(N=0; N<L; N++)
		{
			if(A[N] != C[N]) FOUNDA=0;
			if(B[N] != C[N]) FOUNDB=0;
			if(X[N] != C[N]) FOUNDX=0;
			if(D[N] != C[N]) FOUNDD=0;
		}

		# Count the events and mark the hour they occurred in
		if(FOUNDA || FOUNDX)
		{
			if(FOUNDX)	CX++;
			else		CA++;
			printf("A@%s-%s-%s %s:%s:%s\n",$1,$2,$3,$5,$6,$7);
			AH[$5]++;
			TOTALBATS++;
		}

		if(FOUNDB || FOUNDD)
		{
			if(FOUNDD)	CD++;
			else		CB++;

			printf("B@%s-%s-%s %s:%s:%s\n",$1,$2,$3,$5,$6,$7);
			BH[$5]++;
			TOTALBATS--;
		}

		# Update our maximum daily counts
		if(MAXBATS < TOTALBATS)
		{
			MAXBATS=TOTALBATS;
			MAXTIME=T;
		}

		if(MINBATS > TOTALBATS)
		{
			MINBATS=TOTALBATS;
			MINTIME=T;
		}
	}
}
END {	# The final statistics will be printed to stderr, to easily
	# separate them from the event times printed to stdout.

	# The last daily count
	print_daily(DAY,TOTALBATS,MAXBATS,MINBATS,MAXTIME);

	# Print the event counts
	printf("A %2d\nB %2d\nX %2d\nD %2d\nT %2d\n", CA, CB, CX, CD, CA+CB+CX+CD) > "/dev/stderr";

	# Print a list of hours from 1-23
	STR="H";
	for(N=1; N<=23; N++)	STR=STR sprintf(" %2d", N);
	print STR > "/dev/stderr";

	# Print hourly counts for event A
	STR="A";
	for(N=1; N<=23; N++)
		STR=STR sprintf(" %2d", AH[sprintf("%02d", N)]);
	print STR > "/dev/stderr";

	# Hourly counts for event B
	STR="B";
	for(N=1; N<=23; N++)
		STR=STR sprintf(" %2d", BH[sprintf("%02d",N)]);
	print STR > "/dev/stderr";
	}

Does this look reasonable?
Code:
Date @2011-07-17 COUNT -261 IN 9 OUT 273 Estimate 261 bats(peak was at 22:01:32)
Date @2011-07-18 COUNT +217 IN 380 OUT 1 Estimate 217 bats(peak was at 08:07:31)
Date @2011-07-19 COUNT +266 IN 461 OUT 2 Estimate 266 bats(peak was at 22:20:49)
Date @2011-07-20 COUNT +428 IN 430 OUT 0 Estimate 428 bats(peak was at 08:37:30)
A 4956
B 6163
X 2603
D 746
T 14468
H  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
A 573 536 582 449 1225 321 84 50 29 27 21 16 12 11  9  3  7 46 85 152 300 1865 442
B 499 562 555 471 579 155 69 49 33 31 26 17 12 10  6  3 11 53 90 160 273 2313 561

The X and D counts (2603 and 746) agree with the grep counts above, so it is matching the X and D patterns correctly, if grep is to be believed.
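
A similar sanity check can be run on the hourly rows in the output above. This is a sketch only; `row_sum` is a hypothetical helper that adds up every number after the row label.

```shell
# Sum the hourly counts on an "A ..." or "B ..." row (the label is field 1).
row_sum() {
    awk '{ S = 0; for (N = 2; N <= NF; N++) S += $N; print S }'
}

echo "A 573 536 582 449 1225 321 84 50 29 27 21 16 12 11 9 3 7 46 85 152 300 1865 442" | row_sum
echo "B 499 562 555 471 579 155 69 49 33 31 26 17 12 10 6 3 11 53 90 160 273 2313 561" | row_sum
```

The rows add up to 6845 and 6538, while the event totals are A+X = 4956+2603 = 7559 and B+D = 6163+746 = 6909. The differences (714 and 371) may simply be events in hour 00, which the 1-23 loop does not print, but that is a guess from reading the script and not something checked against the data.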

---------- Post updated at 02:34 PM ---------- Previous update was at 02:20 PM ----------

You haven't detailed most of your requirements at all. If I wrote a CSV export now, it would almost certainly not have the layout, or even the data, you wanted.

And your data about the bird rejection times is still too vague to use. Is that 8am to 5pm raw datalogger time, or 8am to 5pm in the "corrected" time? What about spring and fall?
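
To show why the distinction matters, here is a small sketch of what the sixteen-hour correction from the script above (T+=(60*60*16)) does to an 8am to 5pm window. `shift_hour` is a hypothetical helper, and it applies a fixed offset only, with no seasonal adjustment.

```shell
# Map a raw datalogger hour to the corrected hour, assuming the same
# fixed sixteen-hour offset the script adds. Hypothetical helper.
shift_hour() {
    awk -v H="$1" 'BEGIN {
        S = H + 16
        printf("%02d:00 raw -> %02d:00 corrected%s\n", H, S % 24, (S >= 24) ? " (next day)" : "")
    }'
}

shift_hour 8     # 08:00 raw -> 00:00 corrected (next day)
shift_hour 17    # 17:00 raw -> 09:00 corrected (next day)
```

So an 8am to 5pm window in raw time is midnight to 9am in corrected time, which is a completely different set of records to reject.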

Last edited by Corona688; 07-22-2011 at 05:53 PM.
All times are GMT -4.