The UNIX and Linux Forums  
  #1 (permalink)  
06-02-2008
smriti_shridhar (Registered User)
compare columns from seven files and print the output

Hi guys,
I need some help coming up with a solution. I have seven files like the ones below, but I am showing only three for convenience.

filea
a5 20
a8 16

fileb
a3 42
a7 14

filec
a5 23
a3 07

The output file should contain the data in table form: one row per ID (the first field), with its score (the second field) from each file in that file's column, and 00 where the ID does not appear in a file.

ID filea fileb filec
a5 20 00 23
a8 16 00 00
a3 00 42 07
a7 00 14 00

Your help is highly appreciated.

-Smriti

  #2 (permalink)  
06-03-2008
era (Forum Advisor)
Perhaps the simple solution would be to write a simple script to canonicalize each of those files, so they all have the same labels. Then it's easy to e.g. paste them all side by side, and cut the columns you actually want.
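
The paste-and-cut step could look like the sketch below. It assumes the three files have already been canonicalized (same labels, same order, 00 for missing scores). The `.canon` file names, the temporary directory and the use of `mktemp -d` are made up for illustration and are not part of the thread.

```shell
#!/bin/sh
# Hypothetical canonicalized copies: every file lists the same IDs in the same order.
dir=$(mktemp -d)
printf 'a3 00\na5 20\na7 00\na8 16\n' > "$dir/filea.canon"
printf 'a3 42\na5 00\na7 14\na8 00\n' > "$dir/fileb.canon"
printf 'a3 07\na5 23\na7 00\na8 00\n' > "$dir/filec.canon"

# paste joins line N of each file with tabs; tr turns the space inside each
# original line into a tab as well, so every line has six tab-separated fields:
#   ID score ID score ID score
# cut then keeps the first ID and the three score fields.
table=$(paste "$dir/filea.canon" "$dir/fileb.canon" "$dir/filec.canon" |
        tr ' ' '\t' | cut -f 1,2,4,6)

printf 'ID\tfilea\tfileb\tfilec\n%s\n' "$table"
rm -rf "$dir"
```

With seven files the cut list would grow to fields 1,2,4,6,8,10,12,14. The rows come out in the canonical (sorted) label order, not in order of first appearance.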
  #3 (permalink)  
Old 06-03-2008
jaduks's Avatar
jaduks jaduks is offline
Registered User
  
 

Join Date: Aug 2007
Location: Assam,India
Posts: 166
Not sure if this is the expected output.

Code:
$ cat filea
a5 20
a8 16

$ cat fileb
a3 42
a7 14

$ cat filec
a5 23
a3 07

$ cat filea fileb filec > filex

$ cat filex
a5 20
a8 16
a3 42
a7 14
a5 23
a3 07

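$ awk '
!arr[$1] {arr[$1] = $0; next}        # first time this ID is seen: store the whole line
{arr[$1] = arr[$1] " " $2}           # later occurrences: append only the score
END {for(i in arr) {print arr[i]}}   # note: the order of "for (i in arr)" is unspecified
' filex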

a3 42 07
a5 20 23
a7 14
a8 16

or 

$ awk '{Arr[$1]=sprintf("%s %s",Arr[$1],$2)} END {for ( i in Arr) {printf("%s %s\n",i,Arr[i])}}' filex
a3  42 07
a5  20 23
a7  14
a8  16
//Jadu
  #4 (permalink)  
06-05-2008
smriti_shridhar (Registered User)
Quote:
Originally Posted by era
Perhaps the simple solution would be to write a simple script to canonicalize each of those files, so they all have the same labels. Then it's easy to e.g. paste them all side by side, and cut the columns you actually want.
Thanks era,

I don't think I understand what you mean. Could you please clarify? To repeat: it is important for me to know which file each score in the final output comes from. That is why I want a -- or 00 to mark the files in which an ID has no score.

  #5 (permalink)  
06-05-2008
smriti_shridhar (Registered User)
Quote:
Originally Posted by jaduks
Not sure if this is the expected output.
Thanks for replying,

This won't solve my problem. I need the output ordered so that it is clear which score belongs to which file. If I cat the files together, that identity is lost, and I wouldn't know whether '42' came from file a, b or c in the following output.

a3 42 07
a5 20 23
a7 14

Your help is really appreciated.
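
One way to keep track of where each score comes from is to let awk read the files directly, since it sets FILENAME to the file currently being read. A minimal sketch, assuming the three sample files from the first post; the temporary directory and the `tagged` variable are illustration details and not part of the thread.

```shell
#!/bin/sh
# Recreate the sample files in a scratch directory (assumes mktemp -d is available).
dir=$(mktemp -d)
printf 'a5 20\na8 16\n' > "$dir/filea"
printf 'a3 42\na7 14\n' > "$dir/fileb"
printf 'a5 23\na3 07\n' > "$dir/filec"

# Reading the files by name (instead of a concatenated filex) lets awk
# print the source file next to every ID and score.
tagged=$(cd "$dir" && awk '{ print $1, FILENAME, $2 }' filea fileb filec)

printf '%s\n' "$tagged"
rm -rf "$dir"
```

Each output line then has the form "ID file score", for example "a3 fileb 42", so the origin of the 42 is no longer ambiguous.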

  #6 (permalink)  
06-05-2008
era (Forum Advisor)
What I was trying to suggest was that you would change the input files so they have an explicit value for each possible label. So for example filec would become

Code:
a3 07
a5 23
a7 00
a8 00
(Note also that the a3 and a5 lines have been reordered, so that every file lists the labels in the same order.)

Once you have that, the rest should be trivial. But maybe modifying the files (or maintaining a modified duplicate for each input file) isn't a very elegant solution.
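
The canonicalization step described above might be sketched as follows. It writes to standard output instead of modifying the input files. Assumptions not in the thread: the function name `canonicalize`, the use of `sort -u` to build the label list, a single space as the field separator in the input, and the temporary directory used for the demonstration.

```shell
#!/bin/sh
# canonicalize TARGET FILE...
# Print TARGET with one line per label found in any FILE, sorted by label,
# using 00 where TARGET has no score for that label.
canonicalize() {
    target=$1
    shift
    # Sorted, de-duplicated list of every label in every input file.
    cut -d ' ' -f 1 "$@" | sort -u |
    awk -v target="$target" '
        BEGIN {
            # Load the scores that TARGET actually has.
            while ((getline line < target) > 0) {
                split(line, f, " ")
                score[f[1]] = f[2]
            }
        }
        # For each label on stdin, print its score or the 00 filler.
        { print $1, (($1 in score) ? score[$1] : "00") }
    '
}

# Demonstration with the sample files from the thread.
dir=$(mktemp -d)
printf 'a5 20\na8 16\n' > "$dir/filea"
printf 'a3 42\na7 14\n' > "$dir/fileb"
printf 'a5 23\na3 07\n' > "$dir/filec"

canon_c=$(canonicalize "$dir/filec" "$dir/filea" "$dir/fileb" "$dir/filec")
printf '%s\n' "$canon_c"
rm -rf "$dir"
```

For filec this prints the four lines shown above (a3 07, a5 23, a7 00, a8 00). Running it once per input file gives the set of files that can then be pasted side by side.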
  #7 (permalink)  
06-05-2008
radoulov (Forum Staff)
Use nawk or /usr/xpg4/bin/awk on Solaris.

Code:
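awk '{
  if (!_[$1]++) id[++n] = $1          # record each ID once, in order of first appearance
  fid[FILENAME,$1] = $2               # score indexed by (file name, ID)
  if (FNR == 1) fn[++c] = FILENAME    # record each file name, in command-line order
} END {
  # print "" ends a line; a bare print would emit $0, which POSIX awk and gawk
  # still set to the last input record inside END
  printf "id\t"
  for (i=1; i<=c; i++)
    printf "%s\t", fn[i]
  print ""
  for (j=1; j<=n; j++) {
    printf "%s\t", id[j]
    for (i=1; i<=c; i++)
      printf "%s\t", (fn[i] SUBSEP id[j]) in fid ? fid[fn[i] SUBSEP id[j]] : "00"
    print ""
  }
}' file*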
With your files:

Code:
$ head file*
==> filea <==
a5 20
a8 16

==> fileb <==
a3 42
a7 14

==> filec <==
a5 23
a3 07
$ nawk '{
> if (!_[$1]++) id[++n] = $1
> fid[FILENAME,$1] = $2
> if (FNR == 1) fn[++c] = FILENAME
> } END {
>   printf "id\t"
>   for (i=1; i<=c; i++)
>     printf "%s\t", fn[i]
>   print ""
>   for (j=1; j<=n; j++) {
>     printf "%s\t", id[j]
>     for (i=1; i<=c; i++)
>       printf "%s\t", (fn[i] SUBSEP id[j]) in fid ? fid[fn[i] SUBSEP id[j]] : "00"
>     print ""
>   }
> }' file*
id      filea   fileb   filec
a5      20      00      23
a8      16      00      00
a3      00      42      07
a7      00      14      00
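
The same approach should carry over to all seven files, since the script only depends on the list of file names it is given. The variant below is a sketch under a few assumptions that are not in the thread: a hypothetical function name `score_table`, a hypothetical `fill` variable so the placeholder can be switched between 00 and --, and a temporary directory (via `mktemp -d`) for the demonstration. It builds each row as a string, which also avoids the trailing tab.

```shell
#!/bin/sh
# score_table FILL FILE...
# Print a tab-separated table: one row per ID, one column per file,
# FILL where a file has no score for that ID.
score_table() {
    fill=$1
    shift
    awk -v fill="$fill" '
        FNR == 1 { fn[++c] = FILENAME }               # new file: remember its name
        NF < 2   { next }                             # skip blank or incomplete lines
        !($1 in seen) { seen[$1] = 1; id[++n] = $1 }  # new ID: keep first-seen order
        { score[FILENAME, $1] = $2 }                  # score keyed by (file, ID)
        END {
            line = "id"
            for (i = 1; i <= c; i++) line = line "\t" fn[i]
            print line
            for (j = 1; j <= n; j++) {
                line = id[j]
                for (i = 1; i <= c; i++)
                    line = line "\t" (((fn[i], id[j]) in score) ? score[fn[i], id[j]] : fill)
                print line
            }
        }
    ' "$@"
}

# Demonstration with the three sample files and -- as the placeholder.
dir=$(mktemp -d)
printf 'a5 20\na8 16\n' > "$dir/filea"
printf 'a3 42\na7 14\n' > "$dir/fileb"
printf 'a5 23\na3 07\n' > "$dir/filec"

table=$(cd "$dir" && score_table -- filea fileb filec)
printf '%s\n' "$table"
rm -rf "$dir"
```

Calling `score_table 00 file*` instead should give the same table as above with 00 as the filler. As in the original script, a file whose first line is never read (an empty file) gets no column.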