rvan,
Can you try with optimization turned off in your compiler flags? (What are your OS and machine?)
On most platforms, the word size is the size of a pointer (an address), i.e. the size of a processor register that can hold an address. (It is often sizeof(long), but not everywhere: on 64-bit Windows, long is 4 bytes while pointers are 8.)
It is 4 bytes on a 32-bit machine, so the compiler aligns your variables on 4-byte boundaries, i.e. it stores them at addresses that are multiples of 4.
This lets the processor access data efficiently. Each read from main memory fetches a word (4 bytes), so if your data is not aligned it can straddle two words, and the processor needs more reads than it would with proper alignment.
In the case of an array, the compiler cannot insert padding after each element (so that the next one begins at an address divisible by 4); how would it find the next element in the array otherwise? It would then have to keep track of the size of the padding, which could differ for each array variable since their addresses may differ, and that would defeat the whole point of the efficiency gain.
Try reading up on the SIGBUS signal and also on structure padding/alignment.
~Thanks