Transpose columns to Rows : Big data


 
# 8  
Old 08-10-2010
I have not assumed anything beyond what was stated in the original post. And yes, if there is sufficient RAM to hold the data in memory, then that is most definitely the best way to go (unless you want to wait days instead of hours).

The 600,000 figure comes from the original post, which states that the data contains 2,000 rows of 600,000 columns each. That works out to 600,001 awk invocations (one to determine the number of columns in the first row, plus one per column), with each of those invocations reading the entire file.
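For illustration, the one-invocation-per-column pattern described above can be sketched roughly as follows. This is a minimal sketch, not necessarily the code posted earlier in the thread: the function name `transpose_by_column` is hypothetical, and it assumes whitespace-delimited input with the same number of fields on every row.

```shell
# Hypothetical helper: transpose a whitespace-delimited file by rereading it
# once for every column. Memory use stays small, but the file is read
# (number of columns + 1) times.
transpose_by_column() {
    file=$1
    # One awk invocation to count the columns in the first row.
    ncols=$(awk 'NR == 1 { print NF; exit }' "$file")
    col=1
    while [ "$col" -le "${ncols:-0}" ]; do
        # One awk invocation per column, each reading the entire file and
        # printing that column as a single output row.
        awk -v c="$col" '
            { printf "%s%s", (NR > 1 ? " " : ""), $c }
            END { print "" }
        ' "$file"
        col=$((col + 1))
    done
}
```

With 600,000 columns the loop body runs 600,000 times, which is where the I/O cost comes from.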

A test run on a small (compared to 2,000x600,000) 500x1000 matrix shows that your I/O-heavy approach takes 2m35s versus 5s for the RAM-hungry solution. So, yeah, I think it's a good idea to fill up memory when the gains are on the order of 30x faster (1 hour versus 1.25 days). And as the data set grows, so will the disparity in performance.
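Anyone wanting to repeat a comparison like this needs a test matrix. A minimal sketch for generating one follows; `make_matrix` is a hypothetical name, the cell values are arbitrary sequential integers, and timings will of course differ from machine to machine.

```shell
# Hypothetical helper: print a rows x cols matrix of space-separated
# integers, numbered 1..rows*cols in row-major order.
make_matrix() {
    awk -v rows="$1" -v cols="$2" 'BEGIN {
        for (r = 1; r <= rows; r++)
            for (c = 1; c <= cols; c++)
                printf "%d%s", (r - 1) * cols + c, (c < cols ? " " : "\n")
    }'
}
```

For a 500x1000 matrix, something like `make_matrix 500 1000 > matrix.txt` should do, after which each candidate command can be run against `matrix.txt` with `time` in front of it.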

Obviously, if the machine does not have sufficient RAM, then my approach is not feasible. If it does, it's the way to go.
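For comparison, a single-pass, in-memory transpose can be sketched as below. This is only an illustration of the RAM-hungry idea and not necessarily the code posted earlier in the thread: `transpose_in_memory` is a hypothetical name, and the sketch assumes whitespace-delimited input where every row has as many fields as the first.

```shell
# Hypothetical helper: read the file once and build every output row in
# memory, so the whole data set must fit in RAM.
transpose_in_memory() {
    awk '
    NR == 1 { ncols = NF }          # remember the column count from row 1
    {
        # Append field i of this input row to output row i.
        for (i = 1; i <= ncols; i++)
            out[i] = (NR == 1 ? $i : out[i] " " $i)
    }
    END {
        for (i = 1; i <= ncols; i++)
            print out[i]
    }' "$1"
}
```

The file is read exactly once, but awk holds all of the data (plus its own overhead) until the END block runs.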

Last edited by alister; 08-10-2010 at 03:55 AM..
# 9  
Old 08-10-2010
Quote:
Originally Posted by alister
A test run on a small (compared to 2,000x600,000) 500x1000 matrix shows that your I/O-heavy approach takes 2m35s versus 5s for the RAM-hungry solution.
OK, so the OP has 2,000 rows x 600,000 columns. That is over 2GB in file size. Do you slurp more than 2GB into memory to do such stuff? What about other processes?

Last edited by kurumi; 08-10-2010 at 07:05 AM..
# 10  
Old 08-10-2010
Quote:
Originally Posted by kurumi
OK, so the OP has 2,000 rows x 600,000 columns. That is over 2GB in file size. Do you slurp more than 2GB into memory to do such stuff? What about other processes?
Hi kurumi:

What about other processes? Perhaps there are none besides the basic OS services. Perhaps there are thousands. Perhaps there is a dedicated machine for this problem. Perhaps not. Perhaps the hardware available to solve this problem is an old Pentium with only a few megabytes of free RAM. Perhaps it's a 64-bit monster with many gigabytes of free RAM. I do not know. You do not know.

What we do know is that your solution is horribly slow and inefficient even on comparatively small datasets, but it may be the best and only possible approach if there is insufficient RAM. My solution is much faster, but horribly RAM-hungry. This type of trade-off (RAM vs I/O time) is common in algorithm design. Choice is good, and the original poster can choose whichever best suits his situation, if indeed either of these solutions is suitable.

If I had that 64-bit monster at my disposal, I know which approach I'd choose. :)

Regards,
Alister
# 11  
Old 08-10-2010
I'd be interested in timing the various solutions on your large data.
# 12  
Old 08-12-2010
Just to update.
I have been running the solution from cjcox for 3 days.
I am almost at the end of the run.
I will test the output and report back.
I will also test kurumi's solution if you can re-post it.
And alister's solution too.

All I can say is thanks for the solution, but it takes a long time!
Can you let me know how to time a run that has already started?
Thanks
# 13  
Old 08-12-2010
Hello, genehunter:

I don't think you can time it after having started it, except perhaps by using ps to determine the process's start/elapsed time. You can then poll once per minute or so with a script or cronjob until the process is no longer running, at which point you run date to record the current time.

For future runs, prepend 'time ' to whatever command you use to launch the job.
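The polling idea above might look something like this. It is a minimal sketch under a few assumptions: `watch_job` is a hypothetical name, the PID of the running job is already known, and the job belongs to the user running the script (it uses `kill -0` as the liveness check, which fails on other users' processes). `ps -o etime=` prints the elapsed time since the process started.

```shell
# Hypothetical helper: show how long process $1 has been running so far,
# then poll every $2 seconds (default 60) until it exits and print the
# time at which it was found to be gone.
watch_job() {
    pid=$1
    interval=${2:-60}
    echo "elapsed so far: $(ps -o etime= -p "$pid")"
    # kill -0 sends no signal; it only checks that the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        sleep "$interval"
    done
    echo "finished at: $(date)"
}
```

The finish time is only accurate to within one polling interval. For a run launched as, say, `time ./transpose.sh data.txt` (script name hypothetical), none of this is needed.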

On a different note, I'm curious. What's the size of the data file and what are the specs of the machine being used to process it?

Regards,
Alister

Last edited by alister; 08-12-2010 at 07:33 PM..
# 14  
Old 08-12-2010
The size of the data file is 1.9G.
I am running on a 12-core Xeon with openSUSE 10.2 (x86-64).
Memory: 66002864k total, 65684428k used.
Hope some of that makes sense! :)