single output of awk script processing multiple files

# 1  
Old 08-08-2009

Hello UNIX Forum,

Since I am posting on this board, yes, I am new to UNIX!
I read a copy of "UNIX made easy" from 1990, which felt like making a "computer-science time jump" backwards.
So I have at least a rough understanding of the basic concepts.

Problem Description:
I am currently trying to write an awk script.
The script should repeat the same task on multiple input text files (extracting numerical values from specified columns) and write the results to one single output file.

The output should be formatted in such a way, that it appears in 2 columns:
1st: index, 2nd: extracted value

I got to the point of extracting the information from multiple files and writing it into one output file. My problem is that the index starts over every time a new input file is read. I would like it to keep increasing, regardless of whether a new file has started or not.

My code looks like this:

for i 
awk '{if ($1=="string") 
          print i++ " " $2  >> output_file             # index blank value
}' $i          # reads in the i-th input file

I guess that each time the loop completes one cycle the awk script exits, effectively resetting the index-value i.

Own thoughts:
1a) Is there some sort of "save-attribute" so the awk-script doesn't "forget" the index-value?

1b) Alternatively could the index i of the awk-script get saved as a "global" variable in the shell-script and locally in the awk-script?

2) The other solution I considered was to use wc -l or some awk command to see how many lines output_file already consists of. I think that would create a problem when the file to be analyzed does not exist yet (which would happen in the very first run, I presume). That could probably be fixed by creating an empty output_file before any output is appended to it. Then again, if no output was written (because no value matched the specified one), I would need a control structure that checks output_file for content and removes the file if there is none.

3) The other idea is to create one temporary file containing all the input files and give it to awk. After output is written the temporary file gets deleted again.

I doubt that solutions 1a) or 1b) are possible (are they?) and I don't really like solutions 2) (too complicated) and 3) (use of a temporary file).
My actual goal was to use awk to keep the code as smooth and simple as possible...

Which solution would you try (if any of the above), or do you have any hints for solving the problem?

Thanks in advance,

# 2  
Old 08-08-2009

Instead of
for i 
awk '{if ($1=="string") 
          print i++ " " $2  >> output_file             # index blank value
}' $i

you can pass multiple files to awk directly:
awk '$1 == "string" { print i++, $2  > "output_file" }
' file1 file2 file3 file[456] file7*

Then the counter will not reset. Note that output_file must be quoted unless it is an awk variable.
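To see the effect, here is a minimal sketch that builds two small sample files and runs the one-liner over both. The file contents and the temporary directory are assumptions made up for illustration; only the awk program itself comes from the post above.

```shell
#!/bin/sh
# Hypothetical sample data; names and contents are invented for this sketch.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'string 10\nother 99\nstring 20\n' > file1
printf 'string 30\nother 98\n' > file2

# A single awk process reads both files, so the counter i is never reset.
# i is unset at first, so the first index printed is 0 (use ++i to start at 1).
awk '$1 == "string" { print i++, $2 > "output_file" }' file1 file2

result=$(cat output_file)
printf '%s\n' "$result"
```

With this sample data the output file should contain the lines "0 10", "1 20" and "2 30". Inside awk, > truncates the file only when it is first opened and then keeps writing to it for the rest of the run, so >> is only needed if earlier contents of output_file should be kept.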

# 3  
Old 08-09-2009
Thank you very much for your help. I think this solved my problem.
What do the quotation marks around output_file actually mean, though? Why are they necessary, and what happens without them?
# 4  
Old 08-09-2009
It changes output_file from an awk variable into a string.

If you don't use the quotes, depending on your awk version either no output file will be written, or you'll get an error. If you declare an awk variable called output_file, then it would be fine:

/root/tmp # echo x | awk '{print > output_file}'
awk: (FILENAME=- FNR=1) fatal: expression for `>' redirection has null string value
/root/tmp # cat output_file
cat: output_file: cannot open [No such file or directory]

/root/tmp # echo x | awk '{print > "output_file"}'
/root/tmp # cat output_file
x

/root/tmp # echo y | awk '{output_file="output_file"; print > output_file}'  
/root/tmp # cat output_file
y
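If the file name should come from the shell, one common way is to hand it to awk as a variable with -v. This is only a sketch: the variable name out and the file name result.txt are made up for illustration.

```shell
#!/bin/sh
# Work in a scratch directory so nothing existing gets overwritten.
dir=$(mktemp -d)
cd "$dir" || exit 1

# -v assigns the awk variable "out" (hypothetical name) before input is read,
# so the redirection target is a real variable and needs no quotes inside awk.
echo z | awk -v out="result.txt" '{ print > out }'

content=$(cat result.txt)
printf '%s\n' "$content"
```

Here out holds the string result.txt, so the unquoted name after > works the same way as the explicitly assigned output_file variable in the last example above.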

# 5  
Old 08-09-2009
Thanks for your reply!

I tested it with a couple of test files and it worked just fine (under the latest Cygwin). To finally close the case, I am looking forward to testing the code at work tomorrow.
# 6  
Old 08-10-2009
There is just one more problem:

The input files should not be read completely, only up to the first appearance of a certain string. Once this string is read, awk should stop processing the current file and start on the next one, until all input files are processed.

My idea was something like this:

if ($1=="exit_string")
"go to the next input_file"

If all input files are read, exit.

My problem is that I don't know how to implement the "go to the next input_file" command. If I just used the exit command instead, awk would quit after reading the first input file.

Does anybody have an idea?
# 7  
Old 08-10-2009

If you're using Linux (where awk is usually gawk), you can use the nextfile statement, written without parentheses (I don't know which other UNIX awks support this):

awk '$1 == "exit_string" { nextfile }
     $1 == "string" { print i++, $2  > "output_file" }
' file1 file2 file3 file[456] file7*
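If your awk does not have a skip-to-next-file feature, a flag that is reset on the first line of every file gives the same output. This is a sketch under assumptions: the variable name skip and the sample files are invented for illustration, and unlike a real skip awk still reads the remaining lines of each file, it just ignores them.

```shell
#!/bin/sh
# Hypothetical sample data; names and contents are invented for this sketch.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'string 1\nexit_string\nstring 2\n' > file1
printf 'string 3\nstring 4\nexit_string\nstring 5\n' > file2

awk '
FNR == 1            { skip = 0 }   # FNR restarts at 1 for each new input file
$1 == "exit_string" { skip = 1 }   # from here on, ignore the rest of this file
skip                { next }       # drop ignored lines without further checks
$1 == "string"      { print i++, $2 > "output_file" }
' file1 file2

result=$(cat output_file)
printf '%s\n' "$result"
```

With this sample data the values 2 and 5 are left out, because they come after exit_string in their files, while the index keeps counting across both files.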

