Combine columns from 100 files with same structure


 
# 1  
Old 09-02-2012

Hi,
I have a bunch of files with the following format.
Code:
PUR.1.9 
30910 0.024 0.926 0.050
36587 0.024 0.927 0.049
91857 0.023 0.928 0.049
105797 0.024 0.927 0.049
146659 0.024 0.927 0.049
152695 0.024 0.927 0.049
192118 0.022 0.930 0.048
193310 0.018 0.936 0.046

PUR.2.9  
30910 0.028 0.652 0.320
36587 0.027 0.652 0.320
105797 0.027 0.652 0.321
146659 0.027 0.652 0.321
152695 0.027 0.652 0.321
181843 0.027 0.653 0.321
192118 0.026 0.653 0.321
193310 0.023 0.655 0.322

PUR.3.9  
30910 0.001 0.088 0.911
36587 0.001 0.087 0.912
91857 0.001 0.086 0.913
105797 0.001 0.086 0.913
152695 0.001 0.087 0.913
181843 0.001 0.086 0.914
192118 0.001 0.082 0.917
193310 0 0.076 0.923

I would like columns 2-4 from all files output side by side, with column 1 as the unique identifier across all files and "-" wherever a file has no entry for a key.
result.txt
Code:
30910	0.024	0.926	0.05	0.028	0.652	0.32	0.001	0.088	0.911
36587	0.024	0.927	0.049	0.027	0.652	0.32	0.001	0.087	0.912
91857	0.023	0.928	0.049	-	-	-	0.001	0.086	0.913
105797	0.024	0.927	0.049	0.027	0.652	0.321	0.001	0.086	0.913
146659	0.024	0.927	0.049	0.027	0.652	0.321	-	-	-
152695	0.024	0.927	0.049	0.027	0.652	0.321	0.001	0.087	0.913
181843	-	-	-	0.027	0.653	0.321	0.001	0.086	0.914
192118	0.022	0.93	0.048	0.026	0.653	0.321	0.001	0.082	0.917
193310	0.018	0.936	0.046	0.023	0.655	0.322	0	0.076	0.923

Please help
# 2  
Old 09-02-2012
Try this:
Code:
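#!/bin/bash

# Collect every key (column 1) once, in numeric order.
awk '{print $1}' PUR.* | sort -n -u > keys
while read -r k ; do
  out="$k"
  for f in PUR.* ; do
     # Format columns 2-4 of the line in this file that starts with the key.
     a=$(grep "^$k " "$f" | awk '{printf("%-4.3f %-4.3f %-4.3f", $2,$3,$4)}')
     # No match in this file: use placeholders instead.
     [[ -z "$a" ]] && a="   -     -     - "
     out="$out $a "
  done
  echo "$out"
done < keys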

It assumes you can glob all inputs with PUR.*

Last edited by mirni; 09-03-2012 at 03:23 AM.. Reason: fixed to print the key
# 3  
Old 09-02-2012
The solution provided by mirni seems to work OK except that it doesn't print the key at the start of each output line. (It also prints the 0 value as 0.000, but I don't see why that really matters.) It does, however, invoke awk once for each input file and once for each key, and it invokes grep on each input file once for each key.

I believe that the following provides all of the requested features only invoking awk once and sort once (with no need for grep). However, this script requires that all of the data in the input files be accumulated in awk's address space at once while mirni's script just requires that the entries for a single key be kept in memory at once.

To run this, save the following script in a file (e.g., combine):
Code:
#!/bin/ksh
if [ $# -lt 2 ]
then
    printf "Usage: %s input_file...\n%s\n" "$(basename "$0")" \
        "    At least two input_file operands must be supplied."
    exit 1
fi
awk 'BEGIN {
    fc = -1
}
FNR==1 {fc++
}
  { if(outc[$1] < fc) {
        # Put in "-" entries for files that did not have a match for $1 in the
        # previous input_file operand.
        for(i = outc[$1]; i < fc; i++)
            out[$1] = out[$1] sprintf(" %-5s %-5s %-5s", "-", "-", "-")
    }
    # Add the entries for this file.
    outc[$1] = fc + 1
    out[$1] = out[$1] sprintf(" %-5.3g %-5.3g %-5.3g", $2, $3, $4)
    key[$1] = $1
    if(length($1) > maxl) maxl = length($1)
}
END{for(i in key) {
        if (outc[key[i]] <= fc) {
            # Put in "-" entries for files that did not have a match in the
            # last input_file operand.
            for(j = outc[i]; j <= fc; j++)
                out[i] = out[i] sprintf(" %-5s %-5s %-5s", "-", "-", "-")
        }
        printf("%-*s%s\n", maxl, key[i], out[i])
    }
}' "$@" | sort -n

Make it executable (chmod +x combine), and run it with something like the following:
Code:
combine PUR.* > output_file

Although the script above specifies ksh, it will also work with at least bash and sh.
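
The requested result.txt is tab-separated, while the script above pads its columns with spaces. As a rough sketch of the same single-pass idea (not a replacement for the script above), the following stores each file's values under a (key, file number) subscript and fills the gaps with "-" at the end. The function name combine_tsv and the array names are made up for illustration, and it assumes every data line has at least four whitespace-separated fields.

```shell
#!/bin/sh
# combine_tsv: hypothetical helper, not part of the scripts posted in this thread.
# Usage: combine_tsv file...
combine_tsv() {
    awk '
    FNR == 1 { nfiles++ }             # count input files as each one starts
    NF >= 4 {
        seen[$1] = 1                  # remember every key
        # %.3g drops trailing zeros (0.050 -> 0.05), as in the requested output
        row[$1, nfiles] = sprintf("%.3g\t%.3g\t%.3g", $2, $3, $4)
    }
    END {
        for (k in seen) {
            line = k
            # one group of three cells per file, "-" where the key is missing
            for (i = 1; i <= nfiles; i++)
                line = line "\t" (((k, i) in row) ? row[k, i] : "-\t-\t-")
            print line
        }
    }' "$@" | sort -n
}
```

It could be called as combine_tsv PUR.* > result.txt. Like the script above, it holds all of the input in awk's memory at once.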
# 4  
Old 09-03-2012
You are right, there are more efficient ways to do it. I fixed printing the key:
Code:
...
out="$k"
...

# 5  
Old 09-04-2012
Both scripts worked very well. Thank you.
Is it possible to have the filenames as the first row in the results file?
Thanks
!GH
# 6  
Old 09-04-2012
Just print them at the beginning:

Code:
#!/bin/bash
 
ls PUR.*
awk '{print $1}' PUR.* | sort -n -u > keys  # save all keys, in numeric order
while read k ; do 
  ...

Use printf for prettier formatting.
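
A minimal sketch of what the printf suggestion might look like, assuming the column widths of the while-loop script earlier in this thread (each file contributes a 19-character block). The helper name print_header and the key width of 8 are assumptions chosen for illustration.

```shell
#!/bin/sh
# print_header: hypothetical helper; prints one header row naming each input file.
# Assumes each file's data occupies a 19-character block (a space, three
# 5-character values separated by spaces, and a trailing space).
print_header() {
    printf '%-8s' 'key'               # key column, padded to 8 characters
    for f in "$@"; do
        printf ' %-18s' "$f"          # filename left-justified over its block
    done
    printf '\n'
}
```

Calling print_header PUR.* before the loop would emit the header row. For the data rows to line up under it, the key would also need padding to the same width, for example out=$(printf '%-8s' "$k") instead of out="$k".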

Last edited by mirni; 09-04-2012 at 04:18 AM..
# 7  
Old 09-04-2012
Quote:
Originally Posted by genehunter
Both scripts worked very well. Thank you.
Is it possible to have the filenames as the first row in the results file?
Thanks
!GH
Since the script I gave doesn't know what files will be passed in as arguments, the ls PUR.* won't work in the general case. But you can change:
Code:
awk 'BEGIN {
    fc = -1
}
FNR==1 {fc++
}

in the script I provided to:
Code:
ls -1 -- "$@"
awk 'BEGIN {
    fc = -1
}
FNR==1 {fc++
}

(Note that the first argument to ls (-1) is the digit one, not the letter ell. It prints only the filenames, in a single column, which is what you requested. With -l (the letter ell), you would get a long-format listing of the files to be processed instead.
The -1 (digit one) only matters when the script's output goes to a terminal; single-column output is already the default when output is not directed to a terminal device file.)
Or you could use:
Code:
awk 'BEGIN {
    fc = -1
}
FNR==1 {fc++
    printf("Processing file %d: %s\n", fc + 1, FILENAME)
}

to print the filename of each file as it is processed. Note that in the script above awk's output is piped through sort -n, so these lines do not show up as each file starts; sort emits them at the top of its output, because lines without a leading number compare as zero. You can change the printf format string to display the filenames in whatever manner you wish.
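
If you would rather see progress messages immediately and keep them out of the data that flows through sort -n, one option is to write them to standard error. A minimal sketch, where report_files is a hypothetical stand-in for the real script and the per-line action simply prints the key:

```shell
#!/bin/sh
# report_files: hypothetical demo; progress goes to stderr, data to stdout.
report_files() {
    awk '
    BEGIN { err = "cat 1>&2" }        # portable way to reach stderr from awk
    FNR == 1 {
        fc++
        printf "Processing file %d: %s\n", fc, FILENAME | err
        close(err)                    # flush the message right away
    }
    { print $1 }                      # stand-in for the real per-line work
    ' "$@" | sort -n
}
```

With this arrangement, report_files PUR.* > output_file leaves only data in output_file while the progress lines still appear on the terminal.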