single output of awk script processing multiple files

08-08-2009

Registered User

4, 0

Join Date: Aug 2009

Last Activity: 10 August 2009, 9:27 AM EDT

Posts: 4

Thanks Given: 0

Thanked 0 Times in 0 Posts

Hello UNIX Forum,

Since I am posting on this board: yes, I am new to UNIX!
I read a copy of "UNIX made easy" from 1990, which felt like making a "computer-science time jump" backwards.

So I have a basic understanding of the concepts.

Problem Description:
I am currently trying to write an awk script.
This awk script should repeat the same task on multiple input text files (extracting numerical values from specified columns) and write the output to a single output file.

The output should be formatted in two columns:
1st: index, 2nd: extracted value

I have managed to extract the information from multiple files and write it into one output file. My problem is that the index starts over every time a new input file is read. I would like it to keep increasing, regardless of whether a new file has started or not.

Approach:
My code looks like this:

Code:

for i
do
    awk '{ if ($1 == "string")
               print i++ " " $2 >> output_file   # index blank value
         }' $i                                    # reads in the i-th input file
done

I guess that each time the loop completes one cycle, the awk process exits, which effectively resets the index i.

My own thoughts:
1a) Is there some sort of "save" attribute, so that the awk script doesn't "forget" the index value?

1b) Alternatively, could the index i be kept as a "global" variable in the shell script and used locally in the awk script?

2) Another solution I considered was to use wc -l or some awk command to see how many lines output_file already contains. But I think that would create a problem when the file does not exist yet (which I presume would happen on the very first run). That could probably be fixed by creating an empty output_file before any output is written (appended) to it. Then again, if no output was written (because no value matched the specified one), I would need a control structure that checks output_file for content and removes it if it is empty.

3) Another idea is to concatenate all the input files into one temporary file and give that to awk. After the output is written, the temporary file gets deleted again.
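Idea 1b is possible with standard tools. A minimal sketch, assuming a POSIX shell and any standard awk: the running index lives in the shell variable `n`, is handed to each awk run with `-v`, and awk reports the next index back on standard output from its `END` block. (In the loop above, the shell variable `i` holds a file name and has nothing to do with the awk variable `i`.) The function name `number_matches_loop` is hypothetical, and `string` is the placeholder pattern from the question.

```shell
# Hypothetical helper: number_matches_loop OUTFILE FILE...
# Keeps the running index in the shell variable n and passes it into each
# awk run with -v. awk appends its matches to OUTFILE and prints the next
# index on stdout, which the shell captures for the following file.
number_matches_loop() {
    out=$1
    shift
    : > "$out"                    # create/truncate the output file once
    n=0
    for f in "$@"
    do
        n=$(awk -v start="$n" -v out="$out" '
            BEGIN { i = start }                      # resume the counter
            $1 == "string" { print i++, $2 >> out }  # index blank value
            END { print i }                          # hand next index back
        ' "$f")
    done
}
```

Usage: `number_matches_loop output_file file1 file2 file3`. This works, but it still starts one awk process per input file.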

I doubt that solutions 1a) or 1b) are possible (are they?), and I don't really like solutions 2) (too complicated) and 3) (use of a temporary file).
My actual goal was to use the awk script to write code that is as smooth and simple as possible...

Question:
Which solution would you try (if any of the ones mentioned above), or do you have any hints for solving the problem?

Thanks in advance,
Kasimir

Last edited by DukeNuke2; 08-08-2009 at 01:02 PM.. Reason: added code tags

Kasimir

08-08-2009

Administrator Emeritus

9,179, 1,331

Join Date: Jun 2009

Last Activity: 26 February 2019, 5:57 PM EST

Posts: 9,179

Thanks Given: 430

Thanked 1,331 Times in 1,120 Posts

Hi.

Instead of

Code:

for i
do
    awk '{ if ($1 == "string")
               print i++ " " $2 >> output_file   # index blank value
         }' $i
done

you can pass multiple files to awk directly:

Code:

awk '$1 == "string" { print i++, $2  > "output_file" }
' file1 file2 file3 file[456] file7*

Then the counter will not reset, because a single awk process reads all the files in turn. Note that output_file must be quoted if it's not an awk variable.
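A minimal sketch of the same one-process approach wrapped in a shell function, assuming a POSIX shell; the function name `number_matches` is hypothetical, and the output file name is passed in with `-v` instead of being hard-coded. In awk, `>` truncates the file only when it is first opened during a run, so later `print`s in that run append, while a fresh run starts the file over.

```shell
# Hypothetical helper: number_matches OUTFILE FILE...
# A single awk process reads every FILE in turn, so the counter i is never
# reset between files. The first "print ... > out" in a run truncates
# OUTFILE; subsequent prints in the same run append to it.
number_matches() {
    out=$1
    shift
    awk -v out="$out" '
        $1 == "string" { print i++, $2 > out }   # index blank value
    ' "$@"
}
```

Usage: `number_matches output_file file1 file2 file3`. awk's built-in `NR` behaves the same way (it keeps counting across files), whereas `FNR` restarts at 1 for each file.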

Last edited by Scott; 08-08-2009 at 03:21 PM.. Reason: Quoted output_file variable

Scott

08-09-2009

Thank you very much for your help. I think this solved my problem.
What do the quotation marks around output_file actually mean, though? Why were they necessary, and what happened without them?

Kasimir

08-09-2009

It turns output_file from an awk variable name into a string literal.

If you don't use the quotes, then depending on your awk version either no output file is written or you get an error, because an unset awk variable holds the empty string, which is not a usable file name. If you assign a value to an awk variable called output_file, it works fine:

Code:

/root/tmp # echo x | awk '{print > output_file}'
awk: (FILENAME=- FNR=1) fatal: expression for `>' redirection has null string value
/root/tmp # cat output_file
cat: output_file: cannot open [No such file or directory]

Code:

/root/tmp # echo x | awk '{print > "output_file"}'
/root/tmp # cat output_file 
x

Code:

/root/tmp # echo y | awk '{output_file="output_file"; print > output_file}'  
/root/tmp # cat output_file 
y
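When the file name comes from a shell variable, a common alternative is to hand it to awk with `-v`, so that inside awk the name is a variable holding a string, just as in the last example. A minimal sketch; the variable names `outname` and `out` are illustrative.

```shell
# The shell variable outname holds the file name; -v copies its value into
# the awk variable out, so "print > out" writes to that file.
outname=output_file
echo z | awk -v out="$outname" '{ print > out }'
cat "$outname"     # prints: z
```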

Scott

08-09-2009

Thanks for your reply!

I tested it with a couple of test files and it worked just fine (under the latest Cygwin). To finally close the case, I will test the code at work tomorrow.

Kasimir

08-10-2009

There is just one more problem:

The input files should not be read completely, only up to the first appearance of a certain string. Once this string is read, awk should stop processing the current file and start on the next one, until all input files are processed.

My idea was something like this:

if ($1 == "exit_string")
    "go to the next input_file"

Once all input files have been read, exit.

My problem is that I don't know how to implement the "go to the next input_file" command. If I just used the exit command instead, awk would quit after reading the first input file.

Does anybody have an idea?

Kasimir

08-10-2009

Hi.

If you're using Linux (GNU awk), you can use the nextfile statement. It is a statement, not a function, so it takes no parentheses. (I don't know which other UNIX awks support it.)

Code:

awk '$1 == "exit_string" { nextfile }
     $1 == "string"      { print i++, $2 > "output_file" }
' file1 file2 file3 file[456] file7*
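For awks that lack `nextfile`, a common portable workaround is a per-file flag that is cleared on each file's first record (`FNR == 1`) and set when the exit string appears. A minimal sketch; the function name `number_until_exit` is hypothetical, and `exit_string`/`string` are the placeholders used in this thread. Unlike `nextfile`, this still reads the rest of each file; it only ignores it.

```shell
# Hypothetical helper: number_until_exit OUTFILE FILE...
# skip is reset at the first record of every file and set once
# exit_string is seen; while it is set, remaining records are ignored.
# The counter i is shared by all files because one awk process reads them all.
number_until_exit() {
    out=$1
    shift
    awk -v out="$out" '
        FNR == 1 { skip = 0 }                        # new file: start reading again
        skip { next }                                # past exit_string: ignore line
        $1 == "exit_string" { skip = 1; next }       # stop using this file
        $1 == "string" { print i++, $2 > out }       # index blank value
    ' "$@"
}
```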

Last edited by Scott; 08-10-2009 at 10:45 AM.. Reason: conditions were wrong way round

Scott