compare columns from seven files and print the output


 
# 1  
Old 06-02-2008
Question compare columns from seven files and print the output

Hi guys,
I need some help coming up with a solution. I have seven such files, but I am showing only three for convenience.

filea
a5 20
a8 16

fileb
a3 42
a7 14

filec
a5 23
a3 07

The output file should contain the data in table form, showing the first field (ID) from each file together with its second field (score) in each file.

ID filea fileb filec
a5 20 00 23
a8 16 00 00
a3 00 42 07
a7 00 14 00

Your help is highly appreciated.

-Smriti

Last edited by smriti_shridhar; 06-03-2008 at 01:33 AM.. Reason: formating not proper
# 2  
Old 06-03-2008
Perhaps the simplest solution would be to write a small script to canonicalize each of those files, so they all have the same labels. Then it's easy to, e.g., paste them all side by side and cut the columns you actually want.
# 3  
Old 06-03-2008
Not sure if this is the expected output.

Code:
$ cat filea
a5 20
a8 16

$ cat fileb
a3 42
a7 14

$ cat filec
a5 23
a3 07

$ cat filea fileb filec > filex

$ cat filex
a5 20
a8 16
a3 42
a7 14
a5 23
a3 07

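$ awk '
!arr[$1] {arr[$1] = $0; next}       # first time this ID is seen: store the whole line
{arr[$1] = arr[$1] " " $2}          # later occurrences: append the score
END {for(i in arr) {print arr[i]}}  # the order of "for (i in arr)" is unspecified
' filex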

a3 42 07
a5 20 23
a7 14
a8 16

or 

$ awk '{Arr[$1]=sprintf("%s %s",Arr[$1],$2)} END {for ( i in Arr) {printf("%s %s\n",i,Arr[i])}}' filex
a3  42 07
a5  20 23
a7  14
a8  16

//Jadu
# 4  
Old 06-05-2008
Quote:
Originally Posted by era
Perhaps the simplest solution would be to write a small script to canonicalize each of those files, so they all have the same labels. Then it's easy to, e.g., paste them all side by side and cut the columns you actually want.
Thanks era,

I'm afraid I don't understand what you mean; please clarify. I also want to stress that it's important for me to know, in the final output, which file each score (the second field) comes from. That's why I want a -- or 00 wherever a particular file has no score for an ID.

Last edited by smriti_shridhar; 06-05-2008 at 01:30 AM.. Reason: change in address
# 5  
Old 06-05-2008
Quote:
Originally Posted by jaduks
Not sure if this is the expected output.

Thanks for replying,

This won't solve my problem, because I need an ordered layout that makes clear which score belongs to which file. If I cat the files together, that identity is lost, and I wouldn't know whether '42' came from file a, b or c in the following output.

a3 42 07
a5 20 23
a7 14

Your help is really appreciated.

Last edited by smriti_shridhar; 06-05-2008 at 01:40 AM.. Reason: change footer
# 6  
Old 06-05-2008
What I was trying to suggest was that you change the input files so that each one has an explicit value for every possible label. So, for example, filec would become

Code:
a3 07
a5 23
a7 00
a8 00

(Note also that the a3 and a5 lines have been reordered, so every file lists the labels in the same order.)

Once you have that, the rest should be trivial. But maybe modifying the files (or maintaining a modified duplicate for each input file) isn't a very elegant solution.
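For what it's worth, the canonicalize-then-paste idea could be sketched roughly like this, writing the padded copies to a temporary directory instead of touching the originals. This is only a sketch under assumptions: `merge_scores` is a made-up helper name, `mktemp -d` is assumed to be available, the fields are assumed to be separated by whitespace, and the rows come out in sorted ID order rather than in the order of the requested table.

```shell
# merge_scores FILE...  (hypothetical helper name, not from the thread)
# The parenthesised body runs in a subshell, so its variables do not leak.
merge_scores() (
  tmp=$(mktemp -d) || exit 1

  # 1. Every ID seen in any input, sorted and de-duplicated.
  awk 'NF { print $1 }' "$@" | sort -u > "$tmp/ids"

  # 2. One canonical copy per input: a line for every ID, 00 where absent.
  header="ID"; fields="1"; n=0
  for f in "$@"; do
    n=$((n + 1))
    # Zero-padded name so the glob below keeps the argument order.
    canon=$(printf '%s/canon.%03d' "$tmp" "$n")
    awk -v idfile="$tmp/ids" '
      NF { score[$1] = $2 }                 # scores present in this file
      END {
        while ((getline id < idfile) > 0)   # walk the full ID list
          print id, ((id in score) ? score[id] : "00")
      }' "$f" > "$canon"
    header="$header ${f##*/}"
    fields="$fields,$((2 * n))"             # keep only the score column of each copy
  done

  # 3. Paste the canonical copies side by side and cut the wanted columns.
  echo "$header"
  paste -d' ' "$tmp"/canon.* | cut -d' ' -f"$fields"
  rm -rf "$tmp"
)
```

Called as `merge_scores filea fileb filec`, it prints a header line followed by one space-separated row per ID. Because every canonical copy lists the same IDs in the same order, `paste` lines them up row by row, and `cut` keeps the ID once plus each copy's score column.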
# 7  
Old 06-05-2008
Use nawk or /usr/xpg4/bin/awk on Solaris.

Code:
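awk '{
  if (!_[$1]++) id[++n] = $1          # remember each ID once, in first-seen order
  fid[FILENAME,$1] = $2               # score keyed by (file name, ID)
  if (FNR == 1) fn[++c] = FILENAME    # remember each file name, in argument order
} END {
  printf "id\t"                       # header row
  for (i=1; i<=c; i++)
    printf "%s\t", fn[i]
  print ""                            # end the row; a bare print would emit $0, which some awks keep in END
  for (j=1; j<=n; j++) {              # one row per ID
    printf "%s\t", id[j]
    for (i=1; i<=c; i++)              # "00" when this file has no score for the ID
      printf "%s\t", ((fn[i] SUBSEP id[j]) in fid) ? fid[fn[i] SUBSEP id[j]] : "00"
    print ""
  }
}' file*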

With your files:

Code:
$ head file*
==> filea <==
a5 20
a8 16

==> fileb <==
a3 42
a7 14

==> filec <==
a5 23
a3 07
$ nawk '{
> if (!_[$1]++) id[++n] = $1
> fid[FILENAME,$1] = $2
> if (FNR == 1) fn[++c] = FILENAME
> } END {
>   printf "id\t"
>   for (i=1; i<=c; i++)
>     printf "%s\t", fn[i]
>   print ""
>   for (j=1; j<=n; j++) {
>     printf "%s\t", id[j]
>     for (i=1; i<=c; i++)
>       printf "%s\t", ((fn[i] SUBSEP id[j]) in fid) ? fid[fn[i] SUBSEP id[j]] : "00"
>     print ""
>   }
> }' file*
id      filea   fileb   filec
a5      20      00      23
a8      16      00      00
a3      00      42      07
a7      00      14      00
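
Since the original poster asked for "-- or 00", here is a possible variant that takes the placeholder as a parameter and builds each row as a string, so lines do not end in a trailing tab. It is only a sketch: `score_table` and the awk variable `missing` are names invented for this example, and the `(i, j) in array` test needs a POSIX awk (nawk, gawk, mawk), not the old Solaris awk.

```shell
# score_table PLACEHOLDER FILE...  (hypothetical wrapper, not from the thread)
score_table() (
  missing=$1; shift
  awk -v missing="$missing" '
    FNR == 1 { fn[++c] = FILENAME }        # file names, in argument order
    NF {
      if (!seen[$1]++) id[++n] = $1        # IDs, in first-seen order
      score[FILENAME, $1] = $2             # score keyed by (file name, ID)
    }
    END {
      line = "ID"                          # header row
      for (i = 1; i <= c; i++) line = line "\t" fn[i]
      print line
      for (j = 1; j <= n; j++) {           # one row per ID
        line = id[j]
        for (i = 1; i <= c; i++)
          line = line "\t" (((fn[i], id[j]) in score) ? score[fn[i], id[j]] : missing)
        print line
      }
    }' "$@"
)
```

For example, `score_table -- filea fileb filec` would fill the gaps with `--`, and `score_table 00 file*` would reproduce the table above with an `ID` header.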
