Convert a 3 column tab delimited file to a matrix


 
# 1  
Old 03-12-2013

Hi all,

I have a 3-column input file like this:
Code:
CPLX9PC-4943    CPLX9PC-4943    1
CPLX9PC-4943    CpxID123        0
CPLX9PC-4943    CpxID126        0
CPLX9PC-4943    CPLX9PC-5763    0.5
CPLX9PC-4943    CpxID13 0
CPLX9PC-4943    CPLX9PC-6163    0
CPLX9PC-4943    CPLX9PC-6164    0.04
CPLX9PC-4943    CPLX9PC-6165    0.027027
::::::::::::::::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::::::::::::::


I need it in the form of a matrix where column 1 and column 2 are row and column labels and column 3 contains the values to fill in the matrix.

Thanks and Regards
# 2  
Old 03-12-2013
What is the desired output you are looking for? Please provide an example.
# 3  
Old 03-12-2013
I am taking an example data set here:
Code:
a       a       1
a       b       2
a       c       4
b       a       2
b       b       1
b       c       7
c       a       4
c       b       7
c       c       1

Now the desired output would be:
Code:
        a       b       c
a       1       2       4
b       2       1       7
c       4       7       1

My data set is similar, but it has 25 variables (this example has only 3: a, b and c). Basically, the output would be a similarity matrix whose two halves are the same. Thanks for the reply.
# 4  
Old 03-13-2013
Try this:
Code:
awk '
{ a[$1,$2] = $3; b[$1] }      # store each value; remember every row label
END {
  for (i in b)                # print header
    printf("\t%s", i)
  printf("\n")
  for (i in b) {              # each row
    printf("%s", i)
    for (j in b)              # each column
      printf("\t%s", a[i,j])  # %s keeps decimals such as 0.5 (%d would truncate them)
    printf("\n")
  }
}' SUBSEP="\t" input.txt

Note that the rows/columns are not sorted.
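
If sorted labels matter, one option is to collect the labels in an indexed array and sort them inside awk before printing. A rough sketch, assuming tab-separated input and that row and column labels come from the same set; the function name pivot_sorted and the temporary file are only for illustration:

```shell
# Sample data from message #3 (tab-separated), written to a temporary file.
input=$(mktemp)
printf '%s\t%s\t%s\n' \
  a a 1  a b 2  a c 4 \
  b a 2  b b 1  b c 7 \
  c a 4  c b 7  c c 1 > "$input"

# pivot_sorted FILE: print the matrix with row/column labels in sorted order.
pivot_sorted() {
  awk -F'\t' '
  {
    val[$1, $2] = $3                                  # cell keyed by (row, column)
    if (!($1 in seen)) { seen[$1]; lab[++n] = $1 }    # remember each label once
    if (!($2 in seen)) { seen[$2]; lab[++n] = $2 }
  }
  END {
    # Insertion sort of the labels (string comparison); fine for ~25 labels.
    for (i = 2; i <= n; i++) {
      t = lab[i]
      for (j = i - 1; j >= 1 && (lab[j] "") > (t ""); j--)
        lab[j + 1] = lab[j]
      lab[j + 1] = t
    }
    for (j = 1; j <= n; j++)                          # header row
      printf("\t%s", lab[j])
    printf("\n")
    for (i = 1; i <= n; i++) {                        # one line per row label
      printf("%s", lab[i])
      for (j = 1; j <= n; j++)                        # empty cell if the pair is absent
        printf("\t%s", ((lab[i], lab[j]) in val) ? val[lab[i], lab[j]] : "")
      printf("\n")
    }
  }' "$1"
}

pivot_sorted "$input"
```

With gawk, setting PROCINFO["sorted_in"] would be a shorter way to control the traversal order, but that is gawk-only; the explicit sort above should work in any POSIX awk.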

But honestly, for this kind of data-mangling job, you are better off using some other data processing tool. E.g. with GNU R and the "reshape" package it's a piece of cake:
Code:
 d = read.table("mydata.txt")
 library(reshape)
 cast(d, V1 ~ V2 )


Last edited by mirni; 03-13-2013 at 12:17 AM.. Reason: R addendum
# 5  
Old 03-13-2013
If you don't have GNU R and want a table that adjusts to the input given, you could try something like:
Code:
awk '
NF == 3 {
        if(!($1 in rh)) {
                # Add a new row heading.
                row[++nr] = $1  
                rh[$1]
                if(length($1) > rw) rw = length($1)
        }
        if(!($2 in ch)) {
                # Add a new column heading.
                col[++nc] = $2
                ch[$2]
                cw[nc] = length($2)
        }
        # Add a datapoint.
        d[$1, $2] = $3
}
END {   printf("%*s", rw, "") 
        for(i = 1; i <= nc; i++) {
                if(cw[i] < 12) cw[i] = 12
                printf(" %*.*s", cw[i], cw[i], col[i])
        }
        printf("\n")
        for(i = 1; i <= nr; i++) {
                printf("%*.*s", -rw, rw, row[i])
                for(j = 1; j <= nc; j++)
                        if((row[i], col[j]) in d)
                                printf(" %*.6f", cw[j], d[row[i], col[j]])
                        else    printf(" %*s", cw[j], "")
                printf("\n")
        }
}' input

As always, if you're using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk.
With the input file given in the 1st message in this thread, it produces the output:
Code:
             CPLX9PC-4943     CpxID123     CpxID126 CPLX9PC-5763      CpxID13 CPLX9PC-6163 CPLX9PC-6164 CPLX9PC-6165
CPLX9PC-4943     1.000000     0.000000     0.000000     0.500000     0.000000     0.000000     0.040000     0.027027

And, with the input given in message #3 in this thread, it produces:
Code:
             a            b            c
a     1.000000     2.000000     4.000000
b     2.000000     1.000000     7.000000
c     4.000000     7.000000     1.000000
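
Regarding the Solaris/SunOS note above: a wrapper script can pick the interpreter once instead of editing every command. A small sketch; the variable name AWK is just a convention chosen here, and the last line is only a smoke test:

```shell
# Prefer /usr/xpg4/bin/awk when it exists (Solaris/SunOS), then nawk, else plain awk.
if [ -x /usr/xpg4/bin/awk ]; then
  AWK=/usr/xpg4/bin/awk
elif command -v nawk >/dev/null 2>&1; then
  AWK=nawk
else
  AWK=awk
fi

# Smoke test: the chosen awk must handle "(i, j) in array" with composite subscripts.
printf 'a\tb\t2\n' |
  "$AWK" -F'\t' '{ d[$1, $2] = $3 } END { if (("a", "b") in d) print d["a", "b"] }'
```

The scripts in this thread can then be run as "$AWK" '...' input.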


Last edited by Don Cragun; 03-13-2013 at 01:17 AM.. Reason: Add output produced with alternative input
# 6  
Old 03-13-2013
Or..

Code:
$ awk '{A[$1,$2]=$3;B[$1]++;C[$2]++}END{for(i in C){s=s?s"\t"i:"\t"i}
print s;
for(i in B){s=i;
for(j in C){s=s"\t"A[i,j]}
print s}}' file

        a       b       c
a       1       2       4
b       2       1       7
c       4       7       1
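
The matrix in message #3 is symmetric, so the cell for (row, column) equals the one for (column, row). Before relying on that for real data, it can be checked with something like the sketch below. It assumes tab-separated input; the function name check_symmetric and the sample file are made up for illustration:

```shell
# Deliberately asymmetric sample: (a,b)=2 but (b,a)=3, and (a,c) has no mirror.
sample=$(mktemp)
printf '%s\t%s\t%s\n' \
  a a 1  a b 2  b a 3  b b 1  a c 4 > "$sample"

# check_symmetric FILE: list every pair whose mirrored entry is missing or different.
check_symmetric() {
  awk -F'\t' '
  NF == 3 { val[$1, $2] = $3 }
  END {
    for (k in val) {
      split(k, p, SUBSEP)                 # p[1] = row label, p[2] = column label
      if (!((p[2], p[1]) in val))
        printf("%s\t%s\tmissing mirror\n", p[1], p[2])
      else if (val[k] != val[p[2], p[1]])
        printf("%s\t%s\t%s != %s\n", p[1], p[2], val[k], val[p[2], p[1]])
    }
  }' "$1" | LC_ALL=C sort                 # sort only to make the report order stable
}

check_symmetric "$sample"
```

No output means every pair has a matching mirror, in which case the orientation of the subscripts does not change the result.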

This User Gave Thanks to pamu For This Post:
# 7  
Old 03-13-2013
This post results (after adapting the output format) in
Code:
              CPLX9PC-4943     CpxID123     CpxID126 CPLX9PC-5763      CpxID13 CPLX9PC-6163 CPLX9PC-6164 CPLX9PC-6165
 CPLX9PC-4943            1            0            0          0.5            0            0         0.04     0.027027

This User Gave Thanks to RudiC For This Post: