Large file - columns into rows etc


 
# 8  
Old 06-09-2010
No, it will not matter; tabs are treated as whitespace just like spaces (a tab merely displays as up to 8 spaces).
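For example, a quick check at the prompt (purely illustrative, not part of the script) shows that a tab-separated field is counted just like a space-separated one:

Code:
bash> printf 'a\tb c\n' | perl -lane 'print scalar @F'    # prints 3: the tab and the space both separate fields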

# 9  
Old 06-09-2010
Array::Transpose uses two variables

If the first example I gave runs the box out of memory with only one array, using Array::Transpose should run the box out of memory twice as quickly, yes?

After a brief read of the source (RTS), A::T does use two named variables to do the heavy lifting.
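If you want to check the peak memory use yourself, one option (assuming GNU time is installed; transpose.pl here is just a placeholder name for your script) is:

Code:
bash> /usr/bin/time -v perl transpose.pl > /dev/null    # look at the "Maximum resident set size" line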
# 10  
Old 06-09-2010
Same code with a modification so that the maximum array (@in) size equals the number of rows; @in is an array of references (one per line):

Code:
#!/bin/perl -w

use Array::Transpose;

# Read the whole file into @in: one array reference per line, split on whitespace
open(FH, "<", "infile.txt") or die "something wrong $!";
while (<FH>) {
    chomp $_;
    push @in, [ split /\s+/, $_ ];
}
close(FH);

# Print the transposed data: one line per original column
foreach (transpose(\@in)) {
    print "@{$_}\n";
}

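To run it, something along these lines (the script name and out.txt are just example names):

Code:
bash> perl transpose.pl > out.txt
bash> wc -l out.txt    # the line count here should equal the column count of infile.txt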
# 11  
Old 06-11-2010
Thanks!

I think this has worked. I'm still getting the "out of memory" message and the same grep errors, but when I count the rows and columns, they seem to be right.

I don't know how else to check whether the file has been transposed correctly, though, if I can't view it.
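One way to spot-check without opening the whole file (assuming the transposed output was redirected to out.txt) is to pick a cell and confirm that row R, column C of the original matches row C, column R of the output:

Code:
bash> awk 'NR==5 {print $3; exit}' infile.txt
bash> awk 'NR==3 {print $5; exit}' out.txt    # should print the same value as the command above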
# 12  
Old 06-11-2010
The algorithm in Array::Transpose is better than the one I was putting together; I would suggest using that. The out-of-memory error is still there because a single array is still holding all the data at once.

One way around this problem is to break the run into smaller chunks of data:

Code:
bash> head -n 100000 infile.txt | tail -n 100000 | perl -e '....' > out.1
bash> head -n 200000 infile.txt | tail -n 100000 | perl -e '....' > out.2
bash> head -n 300000 infile.txt | tail -n 100000 | perl -e '....' > out.3
bash> head -n 400000 infile.txt | tail -n 100000 | perl -e '....' > out.4
bash> head -n 500000 infile.txt | tail -n 100000 | perl -e '....' > out.5

Then join the out files back together. Since each out.N already holds a transposed chunk, paste them side by side (a plain cat would stack the chunks on top of each other instead of extending each row):

Code:
bash> paste -d' ' out.1 out.2 out.3 out.4 out.5 > transposed.out

This breaks the data into five more manageable chunks. Let us know if you are still getting the "Out of memory!" error; you can always break the data file into 50,000-line segments instead, and so on...
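If typing the five commands by hand gets tedious, the same chunking can be driven by a small loop ('....' again stands for the transpose one-liner):

Code:
bash> for i in 1 2 3 4 5; do head -n $((i * 100000)) infile.txt | tail -n 100000 | perl -e '....' > out.$i; done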

HTH,

Scott
# 13  
Old 06-11-2010
I will try this. One question though, which could be slightly stupid:

For example: head -n 100000 infile.txt | tail -n 100000 | perl -e '....' > out.1

I know head -n 100000 gives the first 100000 lines of the file and
tail -n 100000 gives the last 100000 lines of its input...
but which lines end up selected when you combine both like you have done?
# 14  
Old 06-11-2010
Split Large file with 2

Quote:
Originally Posted by Myrona
which lines end up selected when you combine both like you have done?
In each command, head -n N takes the first N lines of the file, and tail -n 100000 then keeps only the last 100000 of those; so head -n 200000 | tail -n 100000 selects lines 100001-200000, head -n 300000 | tail -n 100000 selects lines 200001-300000, and so on. In the first command the tail is indeed redundant, i.e. you could simply do this:

Code:
bash> head -n 100000 infile.txt | perl -e '....' > out.1

I use this idiom frequently, and I like the pipelines starting out in a pattern that can stay constant. It's a preference thing.

You could also just cat the first one.
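If it helps to see the windowing concretely, here is a tiny demo (nums.txt is just a throwaway example file):

Code:
bash> seq 10 > nums.txt
bash> head -n 6 nums.txt | tail -n 3    # prints 4 5 6 -- i.e. lines 4 through 6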

Last edited by deindorfer; 06-11-2010 at 12:40 PM..