Form balanced matrix by filtering data


 
# 1  
Old 10-21-2014

I need to form a matrix out of an unbalanced set of records. First, eliminate the samples that do not have at least 3 variables (col2). So, in the example, samples 4 and 5 get eliminated.

Then form a matrix of values (col3) from the remaining samples, using only the variables that are present across all of them. So, in the example, var4 from sample 3 gets eliminated.



Code:
 
Input
sample1 var1 xx
sample2 var1 yy
sample3 var1 zz
sample4 var1 zz
sample1 var2 xx
sample2 var2 xx
sample3 var2 yy
sample5 var2 yy
sample1 var3 xx
sample2 var3 tt
sample3 var3 yy
sample3 var4 yy
sample5 var3 yy
 
Output
sample1 sample2 sample3
xx yy zz
xx xx yy
xx tt yy

# 2  
Old 10-21-2014
What have you tried so far?
# 3  
Old 10-21-2014
The following approach might work for a smaller data set, but for the millions of rows that I have I will need something more sophisticated.
I have broken it down into steps:

Code:
 
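# List the samples that occur on at least 3 rows
awk '{print $1}' mydata | sort | uniq -c | awk '$1 > 2 {print $2}' > tmp
 
# -F: fixed strings, -w: whole words, so sample1 does not also match sample10
grep -wFf tmp mydata > mydata_filtered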
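
As a sketch of an alternative, the counting and filtering steps above could be folded into one awk call that reads the file twice. That avoids the temporary file and compares the sample name exactly rather than as a pattern. The function name filter_samples and its optional MIN argument are made up for this illustration. Like the uniq -c version, it counts rows per sample, so it assumes each sample/variable pair occurs only once.

```shell
# filter_samples FILE [MIN]: hypothetical helper, not from the thread.
# Prints the rows of FILE whose sample (column 1) occurs on at least MIN rows.
filter_samples() {
    # FILE is named twice so awk reads it twice:
    # pass 1 (NR == FNR) counts rows per sample, pass 2 prints the keepers.
    awk -v min="${2:-3}" '
        NR == FNR { count[$1]++; next }
        count[$1] >= min
    ' "$1" "$1"
}
```

It could be called as filter_samples mydata > mydata_filtered. Reading the file twice costs extra I/O, but only one counter per sample is kept in memory.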

Then I take my data into R and use the reshape package:

Code:
 
library(reshape)
mydata=read.table('mydata_filtered')
# cast() expects column names in the formula and the name of the value column
y=cast(mydata, V1 ~ V2, value="V3")

# 4  
Old 10-22-2014
While this
Code:
awk     '       {LN[$2]++; HD[$1]++; MX[$2,$1]=$3}
         END    {for (i in HD) if (HD[i] < 3) delete HD[i]
                 for (i in LN) if (LN[i] < 3) delete LN[i]
                                printf "%10s", ""; for (i in HD) printf "%10s", i; print "";
                 for (j in LN) {printf "%10s",j;   for (i in HD) printf "%10s", MX[j,i]; print ""}
                }
        ' file
             sample1   sample2   sample3
      var1        xx        yy        zz
      var2        xx        xx        yy
      var3        xx        tt        yy

works for the small sample given, I'm afraid it will show its limitations as soon as the input file grows larger...

---------- Post updated at 13:33 ---------- Previous update was at 13:09 ----------

OK, this might do:
Code:
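awk     '       {LN[$2]++; HD[$1]++; MX[$2,$1]=$3}      # LN: samples per variable, HD: variables per sample
         END    {# Dropping a sample lowers variable counts and vice versa, so repeat until nothing changes
                 do     {CNT=0
                         for (i in HD) if (HD[i] < 3) {delete HD[i]; for (j in LN) if ((j,i) in MX) {delete MX[j,i]; LN[j]--; CNT++}}
                         for (j in LN) if (LN[j] < 3) {delete LN[j]; for (i in HD) if ((j,i) in MX) {delete MX[j,i]; HD[i]--; CNT++}}
                        }
                 while (CNT > 0)

                 # Header row of sample names, then one row per remaining variable
                                printf "%10s", ""; for (i in HD) printf "%10s", i; print "";
                 for (j in LN) {printf "%10s",j;   for (i in HD) printf "%10s", MX[j,i]; print ""}
                }
        ' file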

Please test on a meaningful data set.
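
For testing, it may help to compare against a version that follows the original request literally: keep the samples with at least three variables, then keep only the variables present in every remaining sample. The following is a sketch under that reading and has not been tried on large data. The name balanced_matrix and its optional MIN argument are invented here. The output drops the row labels to match the requested layout, and rows and columns come out in order of first appearance. It still holds every value in memory.

```shell
# balanced_matrix FILE [MIN]: hypothetical helper, not from the thread.
balanced_matrix() {
    awk -v min="${2:-3}" '
    {
        # Remember samples and variables in order of first appearance.
        if (!($1 in nvar))  { nvar[$1] = 0; sorder[++ns] = $1 }
        if (!($2 in seenv)) { seenv[$2] = 1; vorder[++nv] = $2 }
        # Count each (variable, sample) pair once, even if it repeats.
        if (!(($2, $1) in val)) nvar[$1]++
        val[$2, $1] = $3
    }
    END {
        # Step 1: keep samples that have at least min variables.
        for (s = 1; s <= ns; s++)
            if (nvar[sorder[s]] >= min) keep[++nk] = sorder[s]
        if (nk == 0) exit
        # Header row with the surviving sample names.
        line = keep[1]
        for (k = 2; k <= nk; k++) line = line " " keep[k]
        print line
        # Step 2: print only the variables present in every kept sample.
        for (v = 1; v <= nv; v++) {
            line = ""; ok = 1
            for (k = 1; k <= nk; k++) {
                if (!((vorder[v], keep[k]) in val)) { ok = 0; break }
                sep = (k > 1) ? " " : ""
                line = line sep val[vorder[v], keep[k]]
            }
            if (ok) print line
        }
    }' "$1"
}
```

On the sample input from the first post, balanced_matrix file prints the header sample1 sample2 sample3 followed by the three rows for var1 to var3, and var4 is dropped because sample1 and sample2 lack it.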