Transpose columns to Rows : Big data


 
# 1  
Old 08-09-2010

Hi,
I read a few posts on the subject and tried out a few solutions, but none of them solved my problem.
https://www.unix.com/302121568-post11.html
https://www.unix.com/shell-programmin...ows-etc-4.html

Please help. My problem is very similar to that of the poster in the second link, but with a slightly different input format. The field separator is a space. The actual data matrix is a file with 2000 rows and 600,000 columns.
Input style:
Code:
IID    PAT    MAT    SEX    PHENOTYPE    rs15286_1    rs319_1    rs80300_1    rs40777_1    rs8597_1    rs5136_1    rs60595_1    rs64968_1    rs4405_1    rs1554_1
TD-MIKV    0 0 2 1 1 0 0 1 0 1 0 1 1 0
TD-HA4Q 0 0 2 1 1 0 0 0 0 0 0 0 0 0
TD-H9ZG 0 0 2 2 0 0 0 1 0 0 0 0 0 0
TD-HAQX 0 0 2 1 0 0 0 2 0 0 0 0 0 0
TD-HA5E 0 0 2 2 0 1 1 1 0 0 0 1 1 0
TD-MGFV 0 0 2 2 1 0 0 0 0 NA 0 0 0 1
TD-HB4V 0 0 2 1 0 0 1 0 1 NA 0 1 1 0
TD-MIPE 0 0 2 2 0 0 0 1 0 0 0 0 0 0
TD-MINR 0 0 2 2 0 0 0 0 0 2 0 1 1 0

Output style
Code:
   IID TD-MIKV TD-HA4Q TD-H9ZG TD-HAQX TD-HA5E TD-MGFV TD-HB4V TD-MIPE TD-MINR
PAT 0 0 0 0 0 0 0 0 0
MAT 0 0 0 0 0 0 0 0 0
SEX 2 2 2 2 2 2 2 2 2
PHENOTYPE 1 1 2 1 2 2 1 2 2
rs15286_1 1 1 0 0 0 1 0 0 0
rs319_1 0 0 0 0 1 0 0 0 0
rs80300_1 0 0 0 0 1 0 1 0 0
rs40777_1 1 0 1 2 1 0 0 1 0
rs8597_1 0 0 0 0 0 0 1 0 0
rs5136_1 1 0 0 0 0 NA NA 0 2
rs60595_1 0 0 0 0 0 0 0 0 0
rs64968_1 1 0 0 0 1 0 1 0 1
rs4405_1 1 0 0 0 1 0 1 0 1
rs1554_1 0 0 0 0 0 1 0 0 0

awk or python preferable, since I understand them a teeny weeny bit.
Thanks in advance,
Regards
~GH
# 2  
Old 08-09-2010
A bit ole school... but here's a possible shell solution:

Code:
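#!/bin/sh
# Transpose a space-separated file one column at a time.
# Usage: script infile outfile

filein="$1"
fileout="$2"
# Number of columns, taken from the header row.
cols=$(head -n 1 "$filein" | wc -w)
: >"$fileout"   # start with an empty output file
count=1
# $cols is left unquoted so any padding in the wc output is dropped.
while [ "$count" -le $cols ]; do
        # Squeeze repeated spaces, extract column $count, and join its values into one line.
        tr -s ' ' <"$filein" | cut -d' ' -f"$count" | paste -s -d' ' - >>"$fileout"
        count=$((count + 1))
done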

In some cases, this may actually work better than reading everything into memory and processing... Anyhow... something to consider...
# 3  
Old 08-09-2010
#deleted#

Last edited by kurumi; 08-10-2010 at 11:39 PM..
# 4  
Old 08-10-2010
Code:
awk '{for (i=1; i<=NF; i++) a[i]=a[i](NR!=1?FS:"")$i} END {for (i=1; i in a; i++) print a[i]}' file
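
For readers more comfortable with Python, the same single-pass idea can be sketched roughly as below. Like the awk one-liner, it builds every output row in memory while reading the input once. The function name `transpose_lines` and the three-line sample are only illustrative, not part of the original post.

```python
def transpose_lines(lines):
    """Transpose whitespace-separated rows in a single pass over the input."""
    columns = []  # columns[i] collects field i of every input row
    for line in lines:
        for i, field in enumerate(line.split()):
            if i == len(columns):   # first time this column index is seen
                columns.append([])
            columns[i].append(field)
    # Each collected column becomes one output row.
    return [" ".join(col) for col in columns]


# Tiny sample in the style of the original input (hypothetical subset).
sample = [
    "IID PAT MAT",
    "TD-MIKV 0 0",
    "TD-HA4Q 0 0",
]
for row in transpose_lines(sample):
    print(row)
```

On a real file, the lines could come from `open(path)` instead of a list, memory permitting.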

# 5  
Old 08-10-2010
Quote:
Originally Posted by alister
Code:
awk '{for (i=1; i<=NF; i++) a[i]=a[i](NR!=1?FS:"")$i} END {for (i=1; i in a; i++) print a[i]}' file

The code will have problems if some lines don't have the same number of columns.

kurumi's code is still correct in this situation.
# 6  
Old 08-10-2010
Quote:
Originally Posted by rdcwayx
The code will have problems if some lines don't have the same number of columns.

kurumi's code is still correct in this situation.
Strictly speaking, neither solution correctly handles irregular lengths; properly handling a file with differing row lengths would involve not making any assumptions based on a single row (if row n has more columns than row 1, truncation would occur).
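
As a rough illustration of what handling irregular lengths could look like, the sketch below pads short rows instead of trusting the width of any single row. It assumes the whole file fits in memory, and the choice of `NA` as the padding value is an assumption borrowed from the sample data; `transpose_ragged` is a hypothetical name.

```python
from itertools import zip_longest


def transpose_ragged(lines, fill="NA"):
    """Transpose rows of differing widths, padding missing fields with `fill`."""
    rows = [line.split() for line in lines]
    # zip_longest runs to the widest row, so no row is truncated.
    return [" ".join(col) for col in zip_longest(*rows, fillvalue=fill)]


# The second row is wider than the first; nothing is lost.
for row in transpose_ragged(["a b", "c d e"]):
    print(row)
```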

In any case, nothing in the original post implies that rows vary in width. Quite the contrary, the word "matrix" is used, which indicates a rectangular array. Still, it's good that you pointed that out just in case it is a concern.

On a different note, if memory allows, when dealing with a very large dataset, my solution should be much much much faster. Only one instance of awk is needed and the file is only read once. kurumi's will fork-exec 600,000 awk processes and read the file 600,000 times.
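
To put rough numbers on that trade-off, here is a back-of-the-envelope estimate using the dimensions from the original post (2000 rows, 600,000 columns). The figure of 2 bytes per field (one character plus one separator) is an assumption based on the sample data, and real in-memory overhead would be higher than the raw file size.

```python
def estimate(rows, cols, bytes_per_field=2):
    """Return (field count, approx. file bytes, bytes read with one pass per column)."""
    fields = rows * cols
    file_bytes = fields * bytes_per_field   # assumed: 1 character + 1 separator
    return fields, file_bytes, cols * file_bytes


fields, file_bytes, multi_pass_bytes = estimate(2000, 600_000)
print(f"fields in the matrix:        {fields:,}")
print(f"approximate file size:       {file_bytes / 1e9:.1f} GB")
print(f"read with a pass per column: {multi_pass_bytes / 1e15:.2f} PB")
```

Under these assumptions the single-pass approach needs at least a couple of gigabytes of memory, while a one-pass-per-column approach rereads the same file 600,000 times.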
# 7  
Old 08-10-2010
Quote:
Originally Posted by alister
On a different note, if memory allows, when dealing with a very large dataset, my solution should be much much much faster. Only one instance of awk is needed and the file is only read once. kurumi's will fork-exec 600,000 awk processes and read the file 600,000 times.
I wrote the code based on what the OP provided, so please don't assume anything else. As for the 600,000 figure, I don't know where you got it from. I assume you mean 600,000 lines of records. If so, my code will call awk 15 * 600,000 times (for 15 columns of data), with the I/O that entails, while yours will fill up memory with the whole big file. Because the OP has a big file, do you think it's advisable to load everything into memory? Think about it.
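
One possible middle ground between the two positions, sketched here purely as an illustration and not proposed by anyone in the thread, is to transpose a block of columns per pass: memory stays bounded by the block size, and the file is read once per block rather than once per column. The name `transpose_in_blocks` and the default block size are arbitrary assumptions, and the sketch assumes a rectangular matrix.

```python
import os
import tempfile


def transpose_in_blocks(in_path, out_path, block=10_000):
    """Transpose a whitespace-separated matrix, holding `block` columns in memory per pass."""
    with open(in_path) as f:
        ncols = len(f.readline().split())  # assumes every row is as wide as the first
    with open(out_path, "w") as out:
        for start in range(0, ncols, block):
            stop = min(start + block, ncols)
            cols = [[] for _ in range(stop - start)]
            with open(in_path) as f:       # one full read of the input per block
                for line in f:
                    for j, field in enumerate(line.split()[start:stop]):
                        cols[j].append(field)
            for col in cols:               # each column becomes one output row
                out.write(" ".join(col) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.txt")
        dst = os.path.join(tmp, "out.txt")
        with open(src, "w") as f:
            f.write("IID PAT MAT\nTD-MIKV 0 0\nTD-HA4Q 0 0\n")
        transpose_in_blocks(src, dst, block=2)
        with open(dst) as f:
            print(f.read(), end="")
```

With 600,000 columns and a block of 10,000, this would read the input 60 times; the block size would need tuning to the memory actually available.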