combining columns from different files


 
# 1  
Old 09-07-2005

Hi all,
I would be very grateful for some advice on the following.

I have several text files. The files are experiment results with columns of data separated by white space.

The files begin with several header lines, each preceded by the comment character '#'.

Each file has a record number column as its first column and then several columns of values relating to that record. The record numbers run from 1 to N in each experiment file.

I would like to compare a particular column across several experiment files.

How do I take two of these experiment files and create a new file such that the new file contains:
- none of the header information
- the first column is the record column
- the next column is, say, column 2 from the first file
- the next column is the same column 2 from the second file

I've looked into awk and join but am not sure how to use them in combination to achieve what I want.

Thanks,
Enda

Sample file
File1.dat:
################
# Header info
################
1 10 20 30
2 13 50 13
3 ....
etc
# 2  
Old 09-07-2005
You can work out the desired details for the printout - I just made it: [record#:column#]
nawk -f io.awk file1 file2 fileN

io.awk:
Code:
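# Skip header lines, which start with '#'
!/^[#]/ {
  # Key every value by record number ($1) and column number (i)
  for(i=2; i <= NF; i++) {
    idx = $1 SUBSEP i
    # Append this file's value to whatever earlier files contributed
    a[idx] = ( idx in a ) ? a[idx] OFS $i : $i
  }
}
END {
  # The order of "for (rec in a)" is unspecified; sort the output if order matters
  for( rec in a ) {
     split(rec, idxA, SUBSEP)
     printf("[%d:%d]%s%s\n", idxA[1], idxA[2], OFS, a[rec])
  }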
}

# 3  
Old 09-08-2005

Thanks very much for your help. I ran the script with gawk and it worked fine. It produces an output with a record for each row:column where the values of the fields are the values of the corresponding row:column in each of the input files.

This is a good starting point from which to combine the input files in many different ways.

How would I post-process the output file, to leave me with files of the following format:

record # column_value_(File1) column_value_(File2) column_value_(FileN)
1
2
3

where there is one file for each column. That is, column 1 across all input files, column 2 across all input files, etc.

Thanks very much.
# 4  
Old 09-08-2005
something like this [if I understand your formatting correctly]:
Code:
!/^[#]/ {
  for(i=2; i <= NF; i++) {
    idx = $1 SUBSEP i
    val = i "_" $i "_(" FILENAME ")"
    a[idx] = ( idx in a ) ? a[idx] OFS val : val
  }
}
END {
  for( rec in a ) {
     split(rec, idxA, SUBSEP)
     #printf("[%d:%d]%s%s\n", idxA[1], idxA[2], OFS, a[rec])
     printf("%d%s%s\n", idxA[1], OFS, a[rec])
  }
}

# 5  
Old 09-09-2005
Hi again,

I'm sorry I'm not being clear. I've attached a bash script that does what I need but it's messy.
- it assumes that all files with a certain name pattern in a directory are to be included
- the column to extract is hard-coded as I couldn't pass it to awk within the script

Scenario: each file is the result of an experiment. Each column is a particular value measured in the experiment. Each row is the index of the measurement taken.
I want to create a summary file across all experiment result files such that the summary file joins a given column from each experiment file.

Example problem:
File_A.dat
#################
# Comments
# Header
#################
1 val_A12 val_A13
2 val_A22 val_A23
3 val_A32 val_A33

File_B.dat
#################
# Comments
# Header
#################
1 val_B12 val_B13
2 val_B22 val_B23
3 val_B32 val_B33

Files might have up to 2500 rows and 30 columns.

I want to be able to tell a script: 'Give me column 2 from each of File_A.dat File_B.dat ... File_30.dat'

Desired result file (for column 2's):
Result.dat:
1 val_A12 val_B12
2 val_A22 val_B22
3 val_A32 val_B32
etc

Like I say, my script does the job but hopefully you can think of a neater way to do this.

Regards,
Éanna
# 6  
Old 09-09-2005
To get the second column, pass c=2; change 'c=2' to whichever column you'd like.

nawk -v c=2 -f io.awk file1 file2 fileN
Code:
!/^[#]/ {
  if ( c > 0 && c <= NF ) {
    idx = $1 SUBSEP c
    a[idx] = ( idx in a ) ? a[idx] OFS $c : $c
  }
}
END {
  for( rec in a ) {
     split(rec, idxA, SUBSEP)
     printf("%d%s%s\n", idxA[1], OFS, a[rec])
  }
}

# 7  
Old 09-09-2005
Thanks. I really appreciate all your help.

Here are the commands that get me the result I need:

gawk -v c=2 -f io.awk ./result*.dat >unsorted.dat
gawk '{ print | "sort -n" }' unsorted.dat >final.dat
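
The two steps above could also be wrapped in a small shell function so that the column number is an ordinary argument. This is only a sketch: `combine_col` is a hypothetical name, plain awk is used in place of gawk, and `sort -n` is chosen so that record 10 sorts after record 9 rather than after record 1.

```shell
# combine_col COLUMN FILE...   (hypothetical helper, not part of the thread)
# Prints: record number, then COLUMN from each FILE, ordered by record number.
combine_col() {
  col=$1
  shift
  awk -v c="$col" '
  # Skip "#" header lines and lines that lack the requested column
  !/^#/ && c > 0 && c <= NF {
    a[$1] = ($1 in a) ? a[$1] OFS $c : $c
  }
  END {
    for (rec in a) print rec, a[rec]
  }' "$@" | sort -n
}
```

Usage would then look like `combine_col 2 ./result*.dat >final.dat`, with no intermediate unsorted file.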