Perl : Large amount of data put into an array


 
# 1  
Old 06-05-2013

This basic code works.

I have a very long list, almost 10,000 lines, that I am building into the array. Each line has either 2 or 3 fields, as shown in the code snippet. The array elements are static; for a few reasons that are out of scope of this question, the list has to be "built in".

It runs very well in a *nix environment but slows down in the Windows environment.

The question is: what can I do to speed things up?


Code:
@MYARRAY = ("field1 field2 field3", "field1 field2", ... 10000 lines);
foreach $eachline (@MYARRAY) {
   if ($eachline =~ /\b$ARGV[0]\b/) {   # match the first command-line argument as a whole word
     print "$eachline\n";
   }
}
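
A side note on the pattern itself, independent of the speed question: if $ARGV[0] could ever contain regex metacharacters, escaping it with quotemeta and compiling the pattern once with qr// keeps the match literal. A minimal sketch of that variant:

Code:
# Sketch only: escape the search term and compile the pattern once before the loop.
my $word    = quotemeta($ARGV[0]);   # protect against regex metacharacters
my $pattern = qr/\b$word\b/;         # compiled once, reused for every line

foreach my $eachline (@MYARRAY) {
   print "$eachline\n" if $eachline =~ $pattern;
}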

# 2  
Old 06-05-2013
Maybe eliminate the double handling by doing something akin to this shell approach:
Code:
grep <pattern> <<EOF
field1 field2 field3-line1
field1 field2 field3-line2
...
EOF
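
A rough Perl analogue of that idea, which keeps the list "built in" as required yet avoids constructing a 10000-element array first, is to place the lines after __DATA__ and filter the DATA handle line by line. A sketch (the two sample lines are placeholders):

Code:
#!/usr/bin/perl
use strict;
use warnings;

my $word = shift @ARGV;
while (my $line = <DATA>) {                  # one line at a time, no big array
   print $line if $line =~ /\b\Q$word\E\b/;  # \Q..\E escapes metacharacters
}

__DATA__
field1 field2 field3
field1 field2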

# 3  
Old 06-05-2013
With so many lines, you would normally read them from a file!
Then instead of
Code:
open (FH, "<file");
@MYARRAY = <FH>;
foreach $eachline (@MYARRAY) {
   chomp $eachline;
   if ($eachline =~ /\b$ARGV[0]\b/) {
     print "$eachline\n";
   }
}

You save the memory with
Code:
open (FH, "<file");
foreach $eachline (<FH>) {
   chomp $eachline;
   if ($eachline =~ /\b$ARGV[0]\b/) {
     print "$eachline\n";
   }
}

and in this case even save the loop with
Code:
open (FH, "<file");
print grep (/\b$ARGV[0]\b/, <FH>);

# 4  
Old 06-06-2013
Quote:
Originally Posted by MadeInGermany
You save the memory with
Code:
open (FH, "<file");
foreach $eachline (<FH>) {
   chomp $eachline;
   if ($eachline =~ /\b$ARGV[0]\b/) {
     print "$eachline\n";
   }
}

Actually, the program would be significantly quicker if you read the file using a while loop. Here's a small test to prove it:

Code:
[user@host ~]$ seq 1 1000000 > file
[user@host ~]$ time perl -e 'open I, "< file"; for (<I>) {$i++}; close I; END { print "$i\n" }'
1000000

real    0m0.563s
user    0m0.546s
sys     0m0.046s
[user@host ~]$ time perl -e 'open I, "< file"; while (<I>) {$i++}; close I; END { print "$i\n" }'
1000000

real    0m0.156s
user    0m0.171s
sys     0m0.015s
[user@host ~]$

Quote:
Originally Posted by MadeInGermany
and in this case even save the loop with
Code:
open (FH, "<file");
print grep (/\b$ARGV[0]\b/, <FH>);

IMHO, though visually this is not a loop, technically it is: per perldoc, grep "evaluates the block or expression for each element of list".
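
For what it's worth, the same filter can be written as a one-liner that reads line by line (scalar context) rather than slurping; someword and file below are placeholder names:

Code:
# BEGIN pulls the word off @ARGV so -n's implicit while(<>) loop only reads the file.
perl -ne 'BEGIN { $w = shift } print if /\b$w\b/' someword file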

Last edited by balajesuri; 06-06-2013 at 12:46 AM..
# 5  
Old 06-06-2013
Quote:
Originally Posted by MadeInGermany
So many lines - you normally read them from a file!
Then instead of
Code:
open (FH, "<file");
@MYARRAY = <FH>;
foreach $eachline (@MYARRAY) {
   chomp $eachline;
   if ($eachline =~ /\b$ARGV[0]\b/) {
     print "$eachline\n";
   }
}

You save the memory with
Code:
open (FH, "<file");
foreach $eachline (<FH>) {
   chomp $eachline;
   if ($eachline =~ /\b$ARGV[0]\b/) {
     print "$eachline\n";
   }
}

and in this case even save the loop with
Code:
open (FH, "<file");
print grep (/\b$ARGV[0]\b/, <FH>);

All these snippets may be pretty memory-intensive. In all these cases, the file-handle is being read in list context. So, you'd end up consuming (and storing in memory) the file data all at once.
As balajesuri has already pointed out, reading the handle in scalar context would be much better.
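
A minimal sketch of reading the handle in scalar context for this task, with a lexical filehandle and three-argument open (the file name is a placeholder):

Code:
use strict;
use warnings;

my $word = shift @ARGV;
open my $fh, '<', 'file' or die "Cannot open file: $!";
while (my $line = <$fh>) {                  # scalar context: one line per iteration
   print $line if $line =~ /\b\Q$word\E\b/;
}
close $fh;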
# 6  
Old 06-10-2013
The only thing faster than the scalar loop is to memory-map the file in Perl so it becomes one big string in memory, and even then not by much, as many OSes already read flat files via mmap() (automatic buffering in RAM via virtual memory, using no swap).

It's a classic case of advanced tools doing expensive favors for you! But there is a base cost below which Perl-level code cannot go.

Compressing the file might speed the flow out of a pipe, as CPUs are so much faster than disks. The old compress is faster than gzip -1. However, if the file is referenced often on the same host, it may already be cached in RAM.
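
If memory mapping is worth trying, the CPAN module File::Map exposes mmap() from Perl; a sketch, assuming the module is installed and with file as a placeholder name:

Code:
use strict;
use warnings;
use File::Map qw(map_file);    # CPAN module, assumed installed

my $word = shift @ARGV;
map_file my $map, 'file';      # $map now aliases the file contents (read-only)

# Scan the mapped string and print whole lines containing the word.
while ($map =~ /^(.*\b\Q$word\E\b.*)$/mg) {
   print "$1\n";
}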