No, it is not a bug, but I did not explain it well enough at the beginning.
Still, the block before NR==FNR is hard for me. When and how is this block executed at reading the stream of the two files file1 & file2?
I think I understand the syntax each row, but not the orders as compared with read in A[i] and B[i] after NR==FNR.
Thanks a lot again!
Your stated requirements and your examples don't match!
You talk about wanting to create a list for S01 through S96, but your samples only show data for S01 through S12.
As Scrutinizer said, your sample file1 contains the line:
but your file1_padded contains the line:
and your file_joined contains the line:
I don't understand why the 1290 in your input file became 0 in your output files???
I can almost produce the output you said you want with:
I say almost because the output from the above is with the difference noted above marked in red:
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk.
If you want output for S01 through S096 instead of limiting output to the highest SXX value found in your input files, delete the two lines of code in the awk script shown in red.
This User Gave Thanks to Don Cragun For This Post:
This is related to my understanding how awk works on the streams. I have difficulty to catch the order of the blocks executed when reading file1 & file2.
The BEGIN{} and END{} blocks are obvious.
Is the second block (i.e. i=$1; $1="") saved in memory when going through the two files? $1="" is to used to empty $1, if not, the IDs in A[i] and B[i] will be concatenated IDs when that ID exists in both files.
But I do not get how $1="" works here.
Output is:
1) How does $1="" works behind in awk?
2) It is confusing to me with i in A?A[i]:zero which seems to me as comparing the value with the key of the associate array? What did I miss?
Can you explain for me? Thanks!
----------------------------------------------------------------------------------------------------------------
Thanks Don!
Your reply was posted while I was preparing mine!
There are 96 rows in my real file, but I only used 12 for the post to cut the space.
My apology for the S05 confusion, due to many copy-paste.
If we take Scrutinizer's code:
and duplicate the code you seem to be having problems understanding into a longer script that does exactly the same thing:
does it make it any easier for you to understand?
If the first field is not cleared before $0 is saved in A[i] and B[i], printing i and A[i] or B[i] in the END clause prints the contents of field one from the input lines twice (as you see in your output when you removed the following line:
).
The expression print i (i in A?A[i]:zero) prints the contents of i followed by:
the contents of A[i] if i is an index in A[], or
the contents of the variable zero if i is NOT an index in A[].
Factoring out the common code (as Scrutinizer did) makes it easier to make updates. If one of those three lines needs to be updated at some point in the future, you don't have to notice that the same code has to be changed twice if you only have one copy of the code in your script in the first place.
This User Gave Thanks to Don Cragun For This Post:
Thanks Don!
Now the topic has deviated from the original title!
I was trying to catch the two points of awk in more fundamental details these days, not only merely remember the syntax, then swallow and use it; Point 1: how awk process the two files line by line in this very example? The block
gave me a very good example to understand it, especially when I tried your expanded code from Scruntinizer's, the output is not sorted as I am expecting as my two original files are sorted in ascending order respectively:
It seems to me stack/heap is related, but not quite sure here. Point 2: I understand the ternary comparison i in A?A[i]:zero), but not quite well with why the associate array in awk, i in A means "ifAhas an indexi", especially when you compare with similar perl hash, python dictionary, or C-style array. There must be a good reason for this.
Thanks Don!
Now the topic has deviated from the original title!
I was trying to catch the two points of awk in more fundamental details these days, not only merely remember the syntax, then swallow and use it; Point 1: how awk process the two files line by line in this very example? The block
gave me a very good example to understand it, especially when I tried your expanded code from Scruntinizer's, the output is not sorted as I am expecting as my two original files are sorted in ascending order respectively:
It seems to me stack/heap is related, but not quite sure here. Point 2: I understand the ternary comparison i in A?A[i]:zero), but not quite well with why the associate array in awk, i in A means "ifAhas an indexi", especially when you compare with similar perl hash, python dictionary, or C-style array. There must be a good reason for this.
Did you try the code I suggested? If so, did it do what you want? Do you understand how it works?
I wrote it explicitly because Scrutinizer's code does not guarantee ordered output (which you did not explicitly, originally request), does not produce any output for SXX values that are not present in either of your input files (which was not clear in your original request), and uses tabs in places where your input files use spaces . (I find the output with the tabs used by Scrutinzer easier to read, but that is not what you asked for; I tried to produce the output you said you wanted.)
I will claim to be a reasonably proficient writer of programs using C, shell, awk, and most other POSIX standard utilities. I used to also be a capable writer of FORTRAN, COBOL, ALGOL, Lisp, Snobol, Java, PL/I, and a few dozen various assembler languages; but if I wanted to use these languages again now, I'd need to take some time to familiarize myself with them again. I don't claim to have ever been a passable author of perl or python code. (So I won't attempt to compare C arrays nor awk arrays to a perl hash nor a python library.)
In C you declare an array and its dimensions before you use it. Your C compiler will determine how many dimensions an array can have. In C, an index into an array is always an integral type and indices (for each dimension in the array) start at 0 and run up to one less than the declared size for that dimension.
In awk, you never declare an array. Whenever you reference an array element, an array is created (if the array didn't already exist) and the index used in that reference is added to the array. Array indices in awk can be integral values, floating point values (although floating point types are really dangerous unless you really know what you're doing), or strings. Integral array indices in awk need not be contiguous. The awk index in array operator returns true (1) if there is the element array[index] and returns false (0) otherwise. In awk, there is no inherent order in array elements and the awk for loop that walks through all of the elements of an array:
does so in what should be assumed to be a random order. (Although with any awk I've used, if you walk through the elements of an array twice without adding or removing elements to or from the array between those walks, you'll get elements listed in the same order on both walks.) Nonetheless, if you use contiguous integral values when you create array elements, you can walk through that array in order using a normal awk for loop:
Note also that awk arrays only have one dimension. You can simulate multidimensional arrays in awk by separating dimension indices with a comma (or, equivalently, the awk variable SUBSEP). (Note that SUBSEP has a default value assigned by awk, but the value used varies from system to system. On many systems, the SUBSEP string is defined to be the ASCII FS character (octal value 034).) And, whereas the commands:
and:
are equivalent, the array index specified by:
and by:
are also equivalent.
Hi,
I have a data sample as shown below. I want to fill in the left column so that the line will be continuous. For example, between 1 and 5 should be 2,3,4. And corresponding values in the right column will be 0. Thus the expected data should look like that:
1 1
1 10
1 2
1 3
1 5
1 6
2 0... (6 Replies)
I have a time series data like this
40754,35.6931,51.3092,201610160700,21.0
40754,35.6931,51.3092,201610160800,23.0
40754,35.6931,51.3092,201610160900,24.0
40754,35.6931,51.3092,201610161000,24.0
40754,35.6931,51.3092,201610161300,25.0
40754,35.6931,51.3092,201610161400,23.0... (6 Replies)
Hi Am using unix Aix Ksh
Have Created table called vv
and i have inserted two date
Select * from vv;
Output :-
New_date
21/02/2013
24/02/2013
I have tried Using One query but Unsuccessful so far..
SELECT l.new_date + '1 day' as miss
from vv as l (7 Replies)
hello everyone,
I have a task to input missing data into a file. example of my data below:
Wed Feb 01 09:00:02 EST 2012,,,0.4,0.3,,0.3,,0.3,,0.5,,0.3,,,0.4,0.3,
Wed Feb 01 09:00:11 EST 2012,,,,,,,0.2,,,,,,,,,,
Wed Feb 01 09:00:22 EST... (23 Replies)
Hi All,
I have 100 .txt files which look like this:
3
4
5
6
7
Now, some files have some numbers missing in them and they look like this:
4
5
6 (6 Replies)
Hi All,
I have 100 files with names like this:
1.dat, 2.dat, 3.dat until 100.dat.
My dat files look like this:
42323 0
438939 1
434 0
0.9383
3434
120.23 3
234
As you can see in the second column, some numbers are missing. I want to fill those missing places with 0's in all... (3 Replies)
Hi,
I'm working on a script that will take the contents of a file, that is in a row and column format, and compare it to a arrangment file. Such that if there is any or all blanks in my content file, the blank will be filled with a flag and will retain the row and column configuration.
Ex.
... (2 Replies)
It can't be that hard, but I just can't figure it out:
I have file like:
File Sub-brick M_1 S_1 M_2 S_2 M_4 S_4 ...
xxx 1 214 731 228 621 132 578 ...
and would like to get 0 0 where M_3 S_3 is missing
xxx 1 214 731 228 621 0 0 132 578 ...
I wrote following script, but can't figure out... (3 Replies)
I need to edit a list of numbers on the following form:
1 1.0
2 1.4
5 2.1
7 1.9
I want:
1 1.0
2 1.4
3 0.0
4 0.0
5 2.1
6 0.0
7 1.9
(i want to add the missing number in column 1 together with 0.0 in column 2).
I guess it is rather trivial but i didn't even manage to read column... (5 Replies)