Perl : Large amount of data put into an array


 
# 1  
Old 06-05-2013

This basic code works.

I have a very long list, almost 10,000 lines, that I am building into the array. Each line has either 2 or 3 fields, as shown in the code snippet. The array elements are static; for a few reasons that are out of scope of this question, the list has to be "built in".

It runs very well in a *nix environment but slows down in the Windows environment.

The question is: what can I do to speed things up?


Code:
@MYARRAY = ("field1 field2 field3", "field1 field2", ... 10000 lines);
foreach $eachline (@MYARRAY) {
   if ($eachline =~ /\b$ARGV[0]\b/) {   # match the first command-line argument as a whole word
     print "$eachline\n";
   }
}
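
A side note on the pattern itself, independent of the speed question: if $ARGV[0] could ever contain regex metacharacters, escaping it with quotemeta and compiling the pattern once with qr// keeps the match literal. A minimal sketch of that variant:

Code:
# Sketch only: escape the search term and compile the pattern once before the loop.
my $word    = quotemeta($ARGV[0]);   # protect against regex metacharacters
my $pattern = qr/\b$word\b/;         # compiled once, reused for every line

foreach my $eachline (@MYARRAY) {
   print "$eachline\n" if $eachline =~ $pattern;
}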

# 2  
Old 06-05-2013
Maybe eliminate the double handling by doing something akin to this shell approach:
Code:
grep <pattern> <<EOF
field1 field2 field3-line1
field1 field2 field3-line2
...
EOF
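
A rough Perl analogue of that idea, which keeps the list "built in" as required yet avoids constructing a 10000-element array first, is to place the lines after __DATA__ and filter the DATA handle line by line. A sketch (the two sample lines are placeholders):

Code:
#!/usr/bin/perl
use strict;
use warnings;

my $word = shift @ARGV;
while (my $line = <DATA>) {                  # one line at a time, no big array
   print $line if $line =~ /\b\Q$word\E\b/;  # \Q..\E escapes metacharacters
}

__DATA__
field1 field2 field3
field1 field2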

# 3  
Old 06-05-2013
With so many lines, you would normally read them from a file!
Then instead of
Code:
open (FH, "<file");
@MYARRAY = <FH>;
foreach $eachline (@MYARRAY) {
   chomp $eachline;
   if ($eachline =~ /\b$ARGV[0]\b/) {
     print "$eachline\n";
   }
}

You save the memory with
Code:
open (FH, "<file");
foreach $eachline (<FH>) {
   chomp $eachline;
   if ($eachline =~ /\b$ARGV[0]\b/) {
     print "$eachline\n";
   }
}

and in this case even save the loop with
Code:
open (FH, "<file");
print grep (/\b$ARGV[0]\b/, <FH>);

# 4  
Old 06-06-2013
Quote:
Originally Posted by MadeInGermany
You save the memory with
Code:
open (FH, "<file");
foreach $eachline (<FH>) {
   chomp $eachline;
   if ($eachline =~ /\b$ARGV[0]\b/) {
     print "$eachline\n";
   }
}

Actually, the program would be significantly quicker if you read the file using a while loop. Here's a small test to prove it:

Code:
[user@host ~]$ seq 1 1000000 > file
[user@host ~]$ time perl -e 'open I, "< file"; for (<I>) {$i++}; close I; END { print "$i\n" }'
1000000

real    0m0.563s
user    0m0.546s
sys     0m0.046s
[user@host ~]$ time perl -e 'open I, "< file"; while (<I>) {$i++}; close I; END { print "$i\n" }'
1000000

real    0m0.156s
user    0m0.171s
sys     0m0.015s
[user@host ~]$

Quote:
Originally Posted by MadeInGermany
and in this case even save the loop with
Code:
open (FH, "<file");
print grep (/\b$ARGV[0]\b/, <FH>);

IMHO, though visually this is not a loop, technically it is: per perldoc, grep "evaluates the block or expression for each element of list".
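
For what it's worth, the same filter can be written as a one-liner that reads line by line (scalar context) rather than slurping; someword and file below are placeholder names:

Code:
# BEGIN pulls the word off @ARGV so -n's implicit while(<>) loop only reads the file.
perl -ne 'BEGIN { $w = shift } print if /\b$w\b/' someword file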

Last edited by balajesuri; 06-06-2013 at 12:46 AM..
# 5  
Old 06-06-2013
Quote:
Originally Posted by MadeInGermany
So many lines - you normally read them from a file!
Then instead of
Code:
open (FH, "<file");
@MYARRAY = <FH>;
foreach $eachline (@MYARRAY) {
   chomp $eachline;
   if ($eachline =~ /\b$ARGV[0]\b/) {
     print "$eachline\n";
   }
}

You save the memory with
Code:
open (FH, "<file");
foreach $eachline (<FH>) {
   chomp $eachline;
   if ($eachline =~ /\b$ARGV[0]\b/) {
     print "$eachline\n";
   }
}

and in this case even save the loop with
Code:
open (FH, "<file");
print grep (/\b$ARGV[0]\b/, <FH>);

All these snippets may be pretty memory-intensive. In all these cases, the file-handle is being read in list context. So, you'd end up consuming (and storing in memory) the file data all at once.
As balajesuri has already pointed out, reading the handle in scalar context would be much better.
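
A minimal sketch of reading the handle in scalar context for this task, with a lexical filehandle and three-argument open (the file name is a placeholder):

Code:
use strict;
use warnings;

my $word = shift @ARGV;
open my $fh, '<', 'file' or die "Cannot open file: $!";
while (my $line = <$fh>) {                  # scalar context: one line per iteration
   print $line if $line =~ /\b\Q$word\E\b/;
}
close $fh;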
# 6  
Old 06-10-2013
The only thing faster than the scalar loop is to memory-map the file in Perl so it becomes one big string in memory, and even then not by much, as many OSes already read flat files via mmap() (automatic buffering in RAM via virtual memory, using no swap).

It's a classic case of advanced tools doing expensive favors for you! But there is a base cost below which Perl-level code cannot go.

Compressing the file might speed the flow out of a pipe, as CPUs are so much faster than disks. The old compress is faster than gzip -1. However, if the file is referenced often on the same host, it may already be cached in RAM.
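
If memory mapping is worth trying, the CPAN module File::Map exposes mmap() from Perl; a sketch, assuming the module is installed and with file as a placeholder name:

Code:
use strict;
use warnings;
use File::Map qw(map_file);    # CPAN module, assumed installed

my $word = shift @ARGV;
map_file my $map, 'file';      # $map now aliases the file contents (read-only)

# Scan the mapped string and print whole lines containing the word.
while ($map =~ /^(.*\b\Q$word\E\b.*)$/mg) {
   print "$1\n";
}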