AWK looping over 2 variables

09-12-2012

Moderator

Join Date: Oct 2010

Last Activity: 1 August 2020, 1:38 AM EDT

Posts: 3,791

Thanks Given: 183

Thanked 1,452 Times in 1,302 Posts

Quote:

Originally Posted by Don Cragun

The script Chubler_XL provided in the message before this should work fine as long as you don't care about the order in which lines appear in the output files and don't want to append to existing files.

My script keeps lines in the same order as the original file. To append to existing files, change ">" to ">>" on the two print lines.

Chubler_XL

09-13-2012

Registered User

Join Date: Oct 2010

Last Activity: 19 July 2017, 8:11 AM EDT

Posts: 89

Thanks Given: 18

Thanked 1 Time in 1 Post

Quote:

Originally Posted by Don Cragun

Unfortunately, the script above is functionally equivalent to:

Code:

END {print $2 > "junk" }

which only writes the 2nd field from the last line in your input file into a file named junk. But I think I understand what you're trying to do now.
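The behaviour described in the quote can be checked quickly. In POSIX awk, $0 and its fields keep the values of the last record inside END, so the one-liner writes a single value. The input lines and the temporary directory below are made up for illustration; only the name junk comes from the quoted script.

```shell
# Illustrative input only; "junk" is the file name from the quoted one-liner.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '%s\n' '-6.650 10' '0.000 20' '6.650 30' > file

# END runs once, after the last record has been read.
awk 'END { print $2 > "junk" }' file

cat junk    # only the 2nd field of the last input line
```

With this sample input the file junk holds one line, the 30 from the last record.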

To be sure that I do understand what you want, please confirm or correct the following statements:

You want files named file_x for 1 <= x <= 173 that contain copies of the lines from the file named file where the value in the first field of the line is in the corresponding range.
Then for each file named file_x you want a file named file_x_f that contains the same number of lines as the file_x file, but only contains the contents of the 2nd field of each line instead of the entire line.
In your description above you sometimes talk about files named file_x and at other times talk about files named output_x. Am I correct in assuming that "output_" was a typo and you meant "file_"?

Is this correct?

Note that since you're creating up to 346 output files from this script, the script is going to have to open and close files while it is running rather than opening everything and letting awk automatically close them when the script terminates.

Please also answer the following questions:

Do you want empty files created for files that don't have any lines that will be directed to those files?
Do existing file_x and file_x_f files need to be removed when this script starts?
If not, should lines to be written by this script replace the contents of existing files or append lines to them?

I'm hoping that you want all existing files to be either removed or overwritten by the script rather than appended to. The file handling logic is much more difficult in an awk script if you want to portably append to existing files. Given the script print >> file_x, some systems will create file_x if it doesn't already exist; others will only create a file when using print > file_x and will give an error if you try print >> file_x when file_x doesn't already exist.

The script Chubler_XL provided in the message before this should work fine as long as you don't care about the order in which lines appear in the output files and don't want to append to existing files. If you want to append rather than replace, or if you want all entries in the output files to be in the same order that they appeared in the input file, the script will be more complex.

================
I apologize. Chubler_XL's script does indeed maintain order, and (as he said) you can just replace > with >> if you want to append rather than overwrite. (It is w >> file in ex that may fail if file doesn't already exist. In awk >> file is guaranteed to create the file if it didn't exist and append to it if it did exist.)
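A minimal sketch of the corrected point about >> in awk: the first run creates the file, the second appends to it. The file name and the two input lines are hypothetical, chosen only for the demonstration.

```shell
tmp=$(mktemp -d)
out="$tmp/file_1"          # hypothetical output name; does not exist yet

# First run: ">>" creates the file because it is missing.
printf '%s\n' 'first' | awk -v out="$out" '{ print >> out }'

# Second run: ">>" appends; ">" here would have truncated the file instead.
printf '%s\n' 'second' | awk -v out="$out" '{ print >> out }'

cat "$out"
```

After both runs the file contains the two lines in the order they were written.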

The script Chubler_XL wrote works great. I am not in need of specific ordering of lines, so it's all sorted! I was impressed by your use of arrays for the problem.

Do I understand correctly that this is a 2D array comprised of the nearest integer defined by the bucket function and v[bucket]++?

Code:

l[bucket,v[bucket]]=$0

Also, could you explain what the purpose of

Code:

 v[bucket]++

is?

chrisjorg

09-13-2012

Bucket is the integer file number you requested, for example:

Code:

-6.650 --> 1
-1.203 --> 55
 0.000 --> 68
 1.293 --> 80
 6.650 --> 134
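The bucket calculation itself is not shown in this part of the thread. As an assumption, one formula that reproduces the five sample values above is floor(10 * x) + 68. The sketch below only checks that fit; the helper name bucket_of is hypothetical, and the real script may compute the bucket differently.

```shell
# bucket_of is a hypothetical helper fitted to the five sample values only.
result=$(printf '%s\n' -6.650 -1.203 0.000 1.293 6.650 | awk '
function bucket_of(x,    y) {
    y = int(x * 10)          # int() truncates toward zero
    if (y > x * 10) y--      # step down for negative values to get a floor
    return y + 68
}
{ printf "%6.3f --> %d\n", $1, bucket_of($1) }
')

printf '%s\n' "$result"
```

The explicit step down is needed because awk's int() truncates toward zero, which would put -6.650 in bucket 2 rather than bucket 1.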

v[bucket]++ counts how many lines have been found for each bucket; each line is then stored as l[bucket, bucket_line].

This allows two things:

Each line gets a unique array index.
Output order can be made to match input order.
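The two points above can be sketched as follows. This is not the script from the thread: the input data is made up, int($1) is only a stand-in for the real bucket calculation, and the array f is a hypothetical addition that keeps the 2nd field for the file_x_f output.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
# Made-up input: field 1 is the value to bucket, field 2 is the payload.
printf '%s\n' '1.2 a' '3.7 b' '1.9 c' '3.1 d' > file

awk '
{
    bucket = int($1)              # stand-in for the real bucket calculation
    n = ++v[bucket]               # v[bucket]: lines seen so far in this bucket
    l[bucket, n] = $0             # a single key: bucket SUBSEP n
    f[bucket, n] = $2             # 2nd field, kept for the file_x_f output
}
END {
    for (bucket in v) {
        out = "file_" bucket
        for (n = 1; n <= v[bucket]; n++) {   # replay in input order
            print l[bucket, n] > out
            print f[bucket, n] > (out "_f")
        }
        close(out)                # keep the number of open files small
        close(out "_f")
    }
}' file

cat file_1 file_1_f
```

awk has no true 2D arrays: l[bucket, n] is an ordinary associative array whose key is the two subscripts joined by SUBSEP. The per-bucket counter is what makes each key unique and lets the END block walk the lines of a bucket from 1 to v[bucket] in the order they were read.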

The array solution is much more efficient than writing output files as each line is processed, because awk isn't constantly opening and closing output files. It works well for small to mid-sized files (say less than 2GB).
If you have a huge input file, this solution is likely to fail (by running out of memory or exceeding awk's internal array size limits). The slower technique of opening each output file as a line is processed and closing it again would then become necessary.
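A minimal sketch of that slower technique, under the same kind of assumptions: the input is made up and int($1) stands in for the real bucket calculation. Nothing is buffered, so memory use stays flat, at the cost of one open and one close per output file for every input line.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '%s\n' '1.2 a' '3.7 b' '1.9 c' > file

rm -f file_[0-9]*                 # start clean, since ">>" appends

awk '
{
    out = "file_" int($1)         # int($1) is a stand-in bucket calculation
    print >> out                  # ">>" because the file is reopened every line
    close(out)
    print $2 >> (out "_f")
    close(out "_f")
}' file

cat file_1 file_1_f
```

The >> matters here: once a file has been closed, reopening it with > truncates it, so only the last line written to each bucket would survive.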

Last edited by Chubler_XL; 09-13-2012 at 05:48 PM..

Chubler_XL
