09-11-2013
Strange behaviour of arrays in awk

Imagine 2 files f1 f2:

file1_l1_c1 code_to_find file1_l1_c3
file1_l2_c1 file1_code2 file1_l2_c3
file1_l3_c1 file1_code3 file1_l3_c3

file2_l1_c1 file2_l1_c2 code_to_find
file2_l2_c1 file2_l2_c2 file2_code5
file2_l3_c1 file2_l3_c2 file2_code3

Say we want to print the lines from f2 that have "code_to_find" as $3. I go the classical way:
FNR == NR && /file1_l1/ {
	code[$2] = 1
	next
}

code[$3] {
	print
}

As expected the output is: file2_l1_c1 file2_l1_c2 code_to_find

Now, if I print the code[] array in the END block:
END{ for (i in code) print "code[" i "]=" code[i]}

I would have expected that block to print only the one code[index] that has a value, i.e. code[code_to_find]=1

But to my great surprise, it returns this:

How come awk creates array elements with a NULL value, indexed by $3 from all the files? Kind of weird to me.
09-11-2013
The array was loaded in the FNR==NR block.

{ if (cde[$3]) {print } else  { delete cde[$3] }; }

END{ for (i in cde) if (cde[i]) print i, "code[" i "]=" cde[i]}

09-11-2013
(If I understand your question correctly:)
When the condition of your first block is false, your second block is executed. It is like this:
$ cat te_ak 
$ awk -F_ 'code[$1_$2]{ print }; END{ for (i in code) print "code[" i "]=" code[i]} ' te_ak 

09-11-2013
Hi Ripat,

IMO the problem is in this section:
code[$3] {

This is not only a condition, but it also creates an array element code[$3] with an empty value

If you use:
$3 in code {

Then it should work as expected...
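
A minimal sketch of the difference, assuming the two sample files from the first post are saved as f1 and f2 (they are recreated here in a scratch directory; the variable names ref_keys and in_keys are only for illustration). It counts how many elements the array holds at END for each form of the condition:

```shell
#!/bin/sh
# Recreate the sample files in a scratch directory (assumed names: f1 and f2).
dir=$(mktemp -d)
cd "$dir" || exit 1
cat > f1 <<'EOF'
file1_l1_c1 code_to_find file1_l1_c3
file1_l2_c1 file1_code2 file1_l2_c3
file1_l3_c1 file1_code3 file1_l3_c3
EOF
cat > f2 <<'EOF'
file2_l1_c1 file2_l1_c2 code_to_find
file2_l2_c1 file2_l2_c2 file2_code5
file2_l3_c1 file2_l3_c2 file2_code3
EOF

# Condition written as a reference: every tested $3 becomes an empty element.
ref_keys=$(awk 'FNR == NR && /file1_l1/ { code[$2] = 1; next }
                code[$3] { }
                END { for (i in code) n++; print n }' f1 f2)

# Condition written with "in": only the explicitly assigned element exists.
in_keys=$(awk 'FNR == NR && /file1_l1/ { code[$2] = 1; next }
               $3 in code { }
               END { for (i in code) n++; print n }' f1 f2)

echo "elements with code[\$3]:   $ref_keys"
echo "elements with \$3 in code: $in_keys"
```

With the sample data the first form leaves five elements in the array and the second leaves only one.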
09-11-2013
Originally Posted by Scrutinizer
This is not only a condition, but it also creates an array element code[$3] with an empty value
Indeed, and that's exactly what I find weird. With code[$3] in the second block I was expecting awk to *evaluate* the value of code[$3], *not* to assign any value to it, not even NULL.

awk 'foo="bar"{print "block 1"} END{print foo}' f1

Prints "block 1" once per line of f1, then bar.

foo="bar" assigns "bar" to foo and evaluates to TRUE. No problem with that. But in the condition of the second block, code[$3], there is no assignment operator and it still assigns a value. I can't stop finding it weird.

Furthermore, look at my code above and its output:

FNR == NR && /file1_l1/ {
	code[$2] = 1
	next
}

code[$3] {
	print
}


The next instruction should make the program loop over the first file until it reaches the end of file1. Then it continues with the second file, right? I understand that the condition of the second block assigns a value while evaluating code[$3], but how come it assigns values from the first file when NR is already in the second file? See my point?

09-11-2013
That is standard awk behaviour: arrays are not declared. If you refer to a non-existing array element, awk automatically creates it. It does not assign an empty value, but rather it creates an uninitialized array element, which has an empty value. To test for the presence of an array element without creating it, you need the "index in array" expression.
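
A small sketch of that behaviour, runnable without any input file. The array name arr and the output labels are made up for the illustration:

```shell
out=$(awk 'BEGIN {
    # Membership test with "in": the element is not created.
    if ("x" in arr) print "unexpected"
    n = 0; for (k in arr) n++
    print "in_test=" n

    # Merely referencing arr["x"] creates it, uninitialized.
    if (arr["x"]) print "unexpected"
    n = 0; for (k in arr) n++
    print "reference=" n

    # An uninitialized element compares equal to both "" and 0.
    state = (arr["x"] == "" && arr["x"] == 0) ? "uninitialized" : "has a value"
    print state
}')
echo "$out"
```

The element count stays at 0 after the "in" test and becomes 1 after the plain reference, even though nothing was ever assigned.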

As for the second part: no, not exactly. The first condition also contains the /file1_l1/ test, so it is false for the lines of file1 that do not match, and the second block gets executed for those lines of file1 as well. Try:

FNR==NR {
  if (/file1_l1/) code[$2] = 1
  next
}
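
A sketch of the complete run with this change, assuming the second block and the END block stay as in the first post and that the sample files are named f1 and f2 (recreated here in a scratch directory). The variable file1_keys is only for illustration:

```shell
#!/bin/sh
# Recreate the sample files in a scratch directory (assumed names: f1 and f2).
dir=$(mktemp -d)
cd "$dir" || exit 1
cat > f1 <<'EOF'
file1_l1_c1 code_to_find file1_l1_c3
file1_l2_c1 file1_code2 file1_l2_c3
file1_l3_c1 file1_code3 file1_l3_c3
EOF
cat > f2 <<'EOF'
file2_l1_c1 file2_l1_c2 code_to_find
file2_l2_c1 file2_l2_c2 file2_code5
file2_l3_c1 file2_l3_c2 file2_code3
EOF

# "next" now runs for every line of f1, so the second block only sees f2.
out=$(awk 'FNR == NR { if (/file1_l1/) code[$2] = 1; next }
           code[$3] { print }
           END { for (i in code) print "code[" i "]=" code[i] }' f1 f2)
echo "$out"

# Count the array keys that come from f1 (grep -c prints 0 when none match).
file1_keys=$(printf '%s\n' "$out" | grep -c '^code\[file1')
echo "keys created from f1: $file1_keys"
```

No key from f1 is created any more, but the plain reference code[$3] still creates empty elements for the non-matching lines of f2; only the "in" form avoids those as well.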

