Reformatting data in matrix form


 
# 1  
Old 03-20-2012

Hi,

Some assistance with respect to the following problem will be very helpful.
I want to reformat my dataset in the following manner for subsequent analysis.

The first column holds names, which repeat for each value of the second column. The second column specifies the position, the third column is the 1st value, and the fourth column is the 2nd value. I want the names as column headers and, for each position, the fourth-column value of the matching input record under each name. If a record is missing, that cell should take the third-column value of any record for that position.

For example, in the input dataset A through D are the names; C does not occur for pos2, and A, C and D do not occur for pos3. So the value of C for pos2 is taken from the third column of any record for pos2, which is 9 (the third column is constant for a particular pos). For pos3, only B will have the value 9, while A, C and D will have 7 (the third column for pos3).

For the record, I have 80 names and 15677899 records in my actual dataset.


Input


Code:
A pos1 1 2
B pos1 1 3
C pos1 1 4
D pos1 1 5
A pos2 9 6
B pos2 9 7
D pos2 9 8
B pos3 7 9


Expected output

Code:
       A B C D
pos1   2 3 4 5
pos2   6 7 9 8
pos3   7 9 7 7


Last edited by newbie83; 03-20-2012 at 01:33 PM..
# 2  
Old 03-20-2012
I have a way to do this, but it will need to be told what the columns are, since it can't go backwards and insert the first line after it's collated everything.

Code:
$ cat rotarr.awk

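BEGIN {
        # Column names come from -v COLS="|A|B|C|D".  The leading "|" leaves
        # C[1] empty, so the real names start at C[2].
        if(!COLS)       COLS="|A|B|C|D"

        NCOLS=split(COLS, C, "|");
        for(N=2; N<=NCOLS; N++)     printf("\t%s", C[N]);
        printf("\n");
}

# Print the collected row, filling missing names with the default (column 3).
function flush_row(     N, X) {
        printf("%s", LASTROW);
        for(N=2; N<=NCOLS; N++)
        {
                # "in" tests for presence, so a stored value of 0 is not
                # mistaken for a missing record.
                if(!(C[N] in DATA)) DATA[C[N]]=DEF;
                printf("\t%s", DATA[C[N]]);
        }

        printf("\n");
        for(X in DATA)  delete DATA[X];
}

# Remember the first position seen.
NR == 1 { LASTROW=$2 }

# Position changed: emit the previous row.  DEF still holds its third column.
# This relies on all records for one position being adjacent in the input.
$2 != LASTROW {
        flush_row();
        LASTROW=$2
}

{
        DATA[$1]=$4
        DEF=$3
}

END {
        if(NR)  flush_row();
}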

$ awk -v COLS="|A|B|C|D" -f rotarr.awk data

        A       B       C       D
pos1    2       3       4       5
pos2    6       7       9       8
pos3    7       9       7       7

$
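
If listing all 80 names by hand is impractical, one possible variation (not from the post above, just a sketch) is to give awk the file twice: the first pass collects the names in first-seen order, and the second pass prints the matrix. The names SEEN, NAMES, NCOLS and flush_row are made up for this example, and it assumes whitespace-separated input in which all records for one position are adjacent, as in the sample.

```shell
#!/bin/sh
# Sketch: two-pass awk that discovers the column names itself.
# Assumes whitespace-separated input grouped by position (column 2).

data=$(mktemp)
cat > "$data" <<'EOF'
A pos1 1 2
B pos1 1 3
C pos1 1 4
D pos1 1 5
A pos2 9 6
B pos2 9 7
D pos2 9 8
B pos3 7 9
EOF

result=$(awk '
# Pass 1 (NR == FNR): remember each name in first-seen order.
NR == FNR {
        if (!($1 in SEEN)) { SEEN[$1] = 1; NAMES[++NCOLS] = $1 }
        next
}

# First record of pass 2: print the header and note the first position.
FNR == 1 {
        for (N = 1; N <= NCOLS; N++) printf("\t%s", NAMES[N])
        printf("\n")
        LASTROW = $2
}

# Position changed: emit the previous row before storing the new record.
$2 != LASTROW { flush_row(); LASTROW = $2 }

# Store column 4 under the name; column 3 is the default for this position.
{ DATA[$1] = $4; DEF = $3 }

END { if (NR) flush_row() }

# Print one row, using DEF for any name with no record at this position.
function flush_row(    N, X) {
        printf("%s", LASTROW)
        for (N = 1; N <= NCOLS; N++)
                printf("\t%s", (NAMES[N] in DATA) ? DATA[NAMES[N]] : DEF)
        printf("\n")
        for (X in DATA) delete DATA[X]
}
' "$data" "$data")

printf '%s\n' "$result"
rm -f "$data"
```

The trade-off is that the file is read twice, which may matter with millions of records, and the columns come out in the order the names first appear rather than in an order you choose.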
