Awk: conversion of matrix formats


 
# 8  
Old 03-15-2013
Dear Don Cragun,

I use the table as input for another program, so I don't need pretty-printing. I need tabs.

And there I found another little problem:
each line seems to end with a tab, but I don't want a tab after the last field in each line.

How can I easily replace the last tab with a newline?

Code:
fn=$1
fname=${fn%.*}
echo $fname

awk 'BEGIN {FS="\t"};
    NF >= 3 {HD[$1]++; HD[$3]++; PP[$1,$3] = PP[$3,$1] = 1 }
    END    {printf "\t"
        for (i in HD) { printf "%s\t" ,i } printf "\n" #<- here replace the tab with newline
        for (i in HD) {printf "%s\t", i ;
            for (j in HD) { printf "%s\t", PP[i,j]?PP[i,j]:"0" } printf "\n" #<- here replace the last tab with newline
            } }' $fn > $fname.adj

---------- Post updated at 07:55 AM ---------- Previous update was at 07:47 AM ----------

I should read more carefully.

Here is the solution:

Code:
awk 'BEGIN {FS="\t"};
    NF >= 3 {HD[$1]++; HD[$3]++; PP[$1,$3] = PP[$3,$1] = 1 }
    END    {
        for (i in HD) { printf "\t%s" ,i } printf "\n"          # tab before each column header
        for (i in HD) {printf "%s", i ;                         # row label without a trailing tab
            for (j in HD) { printf "\t%s", PP[i,j]?PP[i,j]:"0" } printf "\n"    # tab before each cell
            } }' "$fn" > "$fname.adj"
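
The idea in the solution above (write the tab before each field instead of after it) can be checked on a small sample. This is only a sketch: the input lines and protein names are made up, and the cell test uses `(i,j) in PP` rather than the ternary from the post. Note that `for (i in HD)` visits the headers in an unspecified order, so rows and columns may come out in any order, although the order is the same for the header line and the rows within one run.

```shell
# Hypothetical three-column, tab-separated input (names are made up).
input=$(printf 'P1\tbinds\tP2\nP2\tbinds\tP3\n')

out=$(printf '%s\n' "$input" | awk 'BEGIN { FS = "\t" }
    NF >= 3 { HD[$1]++; HD[$3]++; PP[$1,$3] = PP[$3,$1] = 1 }
    END {
        for (i in HD) printf "\t%s", i          # tab goes before each header
        printf "\n"
        for (i in HD) {
            printf "%s", i                      # row label, no tab yet
            for (j in HD) printf "\t%d", ((i,j) in PP)
            printf "\n"
        }
    }')

printf '%s\n' "$out"

# Count the output lines that still end in a tab.
trailing=$(printf '%s\n' "$out" | awk '/\t$/ { n++ } END { print n + 0 }')
echo "lines ending in a tab: $trailing"
```

With this input the check should report 0 lines ending in a tab, and every line should have exactly four tab-separated fields (the first field of the header line is empty).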

# 9  
Old 03-15-2013
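This proposal makes use of the fact that we're dealing with a symmetrical matrix, so each interaction needs to be stored only once (in effect, only one triangle of the matrix is retained) and only half of the "interaction" array elements are needed. Because the headers are kept in an indexed array in order of first appearance, a sorting step would also be much easier to implement: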
Code:
awk     '               {B1 = B3 = 0                                    # boolean variables for finding headers
                         for (i=1; i<=n; i++)                           # check all headers found up to now
                                {B1 = B1 || (HD[i] == $1)               # if $1 or $3 found in headers array,
                                 B3 = B3 || (HD[i] == $3)               # record in respective boolean var
                                }
                         if (!B1) HD[++n] = $1                          # if new header, record in new
                         if (!B3 && $3 != $1) HD[++n] = $3              # header array element (only once if $1 == $3)
                         PP[$1,$3] = 1                                  # record protein interaction, one direction only
                        }
         END            {                                               # a header sort step may slip in here!
                         for (i=1; i<=n; i++)   printf "\t%s", HD[i]    # print column headers, no trailing tab
                                                printf "\n"
                         for (i=1; i<=n; i++)
                                {printf "%s", HD[i]
                                 for (j=1; j<=n; j++)                   # a pair may be stored in either order
                                        printf "\t%d", ((i != j) && (((HD[i],HD[j]) in PP) || ((HD[j],HD[i]) in PP)))
                                 printf "\n"                            # (i != j): diagonal elements are always zero!
                                }
                        }
        ' file
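
The header sort step mentioned in the comment could look roughly like the following. It is a sketch, not part of the proposal itself: `sort_headers` is a hypothetical helper (a plain insertion sort, so it runs in any POSIX awk without gawk extensions), `SEEN` is a made-up lookup array, and the input is invented. Because sorting changes the header order, the lookup tests both key orders of each pair.

```shell
# Hypothetical tab-separated input; first-appearance order is P3, P1, P2.
input=$(printf 'P3\tbinds\tP1\nP2\tbinds\tP3\n')

out=$(printf '%s\n' "$input" | awk '
    # sort_headers (hypothetical helper): insertion sort of A[1..n] as strings
    function sort_headers(A, n,    i, j, t) {
        for (i = 2; i <= n; i++) {
            t = A[i]
            for (j = i - 1; j >= 1 && (A[j] "") > (t ""); j--)
                A[j + 1] = A[j]
            A[j + 1] = t
        }
    }
    BEGIN { FS = "\t" }
    NF >= 3 {
        if (!($1 in SEEN)) { SEEN[$1]; HD[++n] = $1 }   # record new headers in
        if (!($3 in SEEN)) { SEEN[$3]; HD[++n] = $3 }   # order of first appearance
        PP[$1,$3] = 1                                   # store each pair once
    }
    END {
        sort_headers(HD, n)
        for (i = 1; i <= n; i++) printf "\t%s", HD[i]
        printf "\n"
        for (i = 1; i <= n; i++) {
            printf "%s", HD[i]
            for (j = 1; j <= n; j++)
                printf "\t%d", (i != j && (((HD[i],HD[j]) in PP) || ((HD[j],HD[i]) in PP)))
            printf "\n"
        }
    }')

printf '%s\n' "$out"
```

For this input the headers come out as P1, P2, P3, and the rows read `P1 0 0 1`, `P2 0 0 1`, `P3 1 1 0` (tab-separated). An insertion sort is quadratic, which is fine for a few hundred headers; for much larger header sets, piping through `sort` or using gawk's `asort` would be alternatives.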

This User Gave Thanks to RudiC For This Post:
# 10  
Old 03-15-2013
Quote:
Originally Posted by dietmar13
Dear Don Cragun,

I use the table as input for another program, so I don't need pretty-printing. I need tabs.

And there I found another little problem:
each line seems to end with a tab, but I don't want a tab after the last field in each line.

How can I easily replace the last tab with a newline?

Code:
fn=$1
fname=${fn%.*}
echo $fname

awk 'BEGIN {FS="\t"};
    NF >= 3 {HD[$1]++; HD[$3]++; PP[$1,$3] = PP[$3,$1] = 1 }
    END    {printf "\t"
        for (i in HD) { printf "%s\t" ,i } printf "\n" #<- here replace the tab with newline
        for (i in HD) {printf "%s\t", i ;
            for (j in HD) { printf "%s\t", PP[i,j]?PP[i,j]:"0" } printf "\n" #<- here replace the last tab with newline
            } }' $fn > $fname.adj

Please look again at the script I suggested in my last post in this thread. It got rid of the trailing tabs. The spots marked in red above, the printf statements in the END clause that write a tab after each field, are the areas that need to change to get rid of trailing tabs.

The BEGIN clause marked in green above (BEGIN {FS="\t"}) does not match the sample input you provided, so I took it out. If you really have tab delimiters in your input AND have spaces in some of your field names, put that clause back in the script I provided.

The ++'s marked in orange (in HD[$1]++ and HD[$3]++) don't affect the output of your program, but will make it run a little slower. It won't be noticeable with your sample data, but if you have millions of lines of input, it will make a difference. You will note that I removed them in the script I provided.
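
Both points can be tried in isolation. The following is only a sketch with made-up input, not the script from the earlier post: the first command relies on the awk rule that merely referencing an array element creates it, so `HD[$1]` works without `++`; the other two show how a field name containing spaces is split with the default field separator but kept whole with `FS="\t"`.

```shell
# Hypothetical sample lines; the names are made up for illustration.
count=$(printf 'A x B\nB y C\n' | awk '
    { HD[$1]; HD[$3] }              # a bare reference creates the element, no ++ needed
    END { for (k in HD) n++; print n }')
echo "distinct headers: $count"

# One tab-separated line whose first field contains spaces.
line=$(printf 'heat shock protein\tbinds\tP2')
nf_default=$(printf '%s\n' "$line" | awk '{ print NF }')
nf_tab=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "\t" } { print NF }')
echo "fields with default FS: $nf_default, with FS set to tab: $nf_tab"
```

This should report 3 distinct headers, 5 fields with the default separator, and 3 fields with the tab separator.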
This User Gave Thanks to Don Cragun For This Post:
# 11  
Old 03-16-2013
@Don Cragun and @RudiC

The task is solved, my downstream application works, and I have learned a lot about programming in awk.

Thank you!

dietmar