combining columns from different files

09-09-2005

Moderator


Quote:

Originally Posted by iomaire

Thanks. I really appreciate all your help.

Here are the commands that get me the result I need:

gawk -v c=2 -f io.awk ./result*.dat >unsorted.dat
gawk ' { print | "sort" }' unsorted.dat >final.dat

Code:

gawk -v c=2 -f io.awk ./result*.dat | sort -n > final.dat

vgersh99


03-26-2006

Registered User


some documentation added

I had a similar data-sorting task and found the program above extremely useful as a starting point and a learning tool. Here is the same program with the comments I wrote while figuring out how it works, in case they are useful to anyone else.

# This regex quietly ignores all lines in the data file starting with #.
# If for some reason you don't want that, delete the pattern and start with the {

!/^[#]/ {

  # Check that the requested column c is between 1 and NF, the number
  # of fields on the current line:

  if ( c > 0 && c <= NF ) {

    # idx is a string made of $1, which is the data index, and c, which
    # is the column number of the data we want to extract. They are
    # separated by SUBSEP, which you can set in a BEGIN{} block if you
    # want. See the section on multidimensional arrays and SUBSEP in the
    # gawk manual.

    idx = $1 SUBSEP c

    # a is a 1-dimensional (associative) array whose index is the string
    # idx. While scanning the first file, the (idx in a) test is false,
    # so a[idx] = $c. In subsequent files (idx in a) is true, so a[idx]
    # becomes a[idx] OFS $c. OFS is the output field separator (a single
    # space by default) and $c is the data column. So each element a[idx]
    # is a string holding one row of data, which grows by an OFS and a
    # data value for each file scanned.

    a[idx] = ( idx in a ) ? a[idx] OFS $c : $c

  }
}
END {

  # idx is as above, except that it is now the loop variable running over
  # the indices of a. It is still a string. I found it clearer to call it
  # idx again instead of rec. Note that for (idx in a) visits the indices
  # in no particular order, which is why the output is piped through sort.

  for( idx in a ) {

    # This creates the array idxA by splitting idx at every SUBSEP.

    split(idx, idxA, SUBSEP)

    # idxA[1] is the row index, idxA[2] is the column number.
    # a[idx] is the string of data values for the same row collected from
    # each data file. "%d%s%s\n" formats the printed line as a decimal
    # integer followed by two strings and a newline (see the printf section
    # of the gawk manual). %d truncates a non-integer index to an integer.

    printf("%d%s%s\n", idxA[1], OFS, a[idx])

  }
}
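One caveat about the `%d` in the final printf: it converts the index to an integer, so an index such as 5.00000007E-11 (like the first column of the data in the next post) prints as 0. The minimal sketch below assumes you want the index printed exactly as it appears in the file. It compares `%d` with `%s` on a hypothetical two-line file, data1.dat. Swapping `%d` for `%s` in the program's printf would have the same effect.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical file whose index column is in scientific notation.
printf '5.00000007E-11 0.0810279995\n1.00000001E-10 0.254880995\n' > data1.dat

# With %d the index is converted to an integer: both rows print as 0.
awk '{ printf("%d%s%s\n", $1, OFS, $2) }' data1.dat > with_d.txt

# With %s the index is printed unchanged.
awk '{ printf("%s%s%s\n", $1, OFS, $2) }' data1.dat > with_s.txt

cat with_d.txt with_s.txt
```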

lwaldron


03-26-2007

Registered User


modifying the above script

Hi, I have some questions about modifying this script. I should add that I'm an awk newbie.

Currently I have many files with 3 columns.

The awk script is similar to the one above, slightly modified because I am not interested in printing the index.

Code:

!/^[#]/ {
  if ( c > 0 && c <= NF ) {
    idx = $1 SUBSEP c
    a[idx] = ( idx in a ) ? a[idx] OFS $c : $c
  }
}
END {
  for( idx in a ) {
    split(idx, idxA, SUBSEP)
    # modified -->
    printf("%s%s\n", OFS, a[idx])
  }
}

I'd like to combine the 2nd columns of all files into one file and the 3rd columns into another, so I use commands like these:

gawk -v c=2 -f io.awk ?E+0.final | sort -n > file2.dat
gawk -v c=3 -f io.awk ?E+0.final | sort -n > file3.dat

The script works fine for column 2 but not for column 3.

The files look like this:

Quote:

5.00000007E-11 0.0810279995 2.52286541E+09
1.00000001E-10 0.254880995 4.57596416E+09
1.49999999E-10 0.519167006 6.65693082E+09
2.00000003E-10 0.864251971 7.69276518E+09
2.49999993E-10 1.28940797 7.19983002E+09
2.99999997E-10 1.75356197 5.53754522E+09
3.50000001E-10 1.96022499 3.2696681E+09
4.00000005E-10 1.94632304 1.06016013E+09
4.50000009E-10 1.91220903 -492845248.
4.99999986E-10 1.91022301 -897060992.

As I see it, the problem could be either the scientific notation OR the fact that the script above was written for column 2.

Any suggestions??
Thanks again for this helpful script

phil

Last edited by psny18; 03-26-2007 at 03:50 PM.

psny18


02-14-2009

Registered User


Printing specified columns from all files to a new file side by side...

Hi All,

I am also looking for this kind of script, but my application is a little different. I also don't need any index column(s), but I must print $1 and $11 from all files, plus any one of the other columns at a time.
In a nutshell, I need to print $1, $11 and one other column, out of 34 columns in total, to a new file.
E.g.: $1, $11 and $2 (or $3, and so on up to $34, excluding $1 and $11 since they are already printed once) of all files should be printed to a new text file. Can anybody modify the script below to match my requirement?

!/^[#]/ {
  if ( c > 0 && c <= NF ) {
    idx = $1 SUBSEP c
    a[idx] = ( idx in a ) ? a[idx] OFS $c : $c
  }
}
END {
  for( idx in a ) {
    split(idx, idxA, SUBSEP)
    # modified -->
    printf("%s%s\n", OFS, a[idx])
  }
}

This script works very well for the one column given in the variable 'c' at run time.

Thanks ....
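The thread leaves this request unanswered. The sketch below shows one possible modification. It rests on assumptions that the post does not settle:

- Rows are matched across files by the value in $1, as in the original script.
- $1 and $11 are printed once, taken from the first file in which that key appears.
- Column c is then collected from every file.

The sample files in1.txt and in2.txt are hypothetical and have only 11 columns, to keep the example short.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 11 columns per line, key in $1, fixed field in $11.
printf 'k1 a2 a3 a4 a5 a6 a7 a8 a9 a10 X1\nk2 b2 b3 b4 b5 b6 b7 b8 b9 b10 X2\n' > in1.txt
printf 'k1 c2 c3 c4 c5 c6 c7 c8 c9 c10 X1\nk2 d2 d3 d4 d5 d6 d7 d8 d9 d10 X2\n' > in2.txt

awk -v c=2 '
!/^[#]/ {
  # Only use lines with at least 11 fields and a valid column c
  # other than 1 and 11 (those two are printed separately).
  if (NF >= 11 && c > 0 && c <= NF && c != 1 && c != 11) {
    if (!($1 in a)) {
      # First time this key is seen: store $1 and $11 once,
      # and remember the order in which keys first appear.
      order[++n] = $1
      a[$1] = $1 OFS $11 OFS $c
    } else {
      # Later files: append only the chosen column.
      a[$1] = a[$1] OFS $c
    }
  }
}
END {
  # Print rows in first-appearance order, so no sort is needed.
  for (i = 1; i <= n; i++)
    print a[order[i]]
}' in1.txt in2.txt > combined.txt

cat combined.txt
```

To get each other column, rerun the command with a different -v c= value, from 2 up to 34 but skipping 11. Keeping the keys in an `order` array avoids depending on the unspecified order of `for (k in a)`.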

ks_reddy
