Aim:
To scan a file, discard every character whose ASCII value is 0 to 31 or 127 to 255, and keep only the characters with ASCII values between 32 and 126.

Script:
#!/usr/local/bin/perl
$filename = "$ARGV[0]";
if (-e $filename) {
    open(OUT, "${filename}") || die "can't open $filename\n";
    while (<OUT>){
        $found= "";
        $stat=0;
        chomp $_;
        my @charArray = split(//, $_);
        my $ref = \@charArray;
        foreach (@charArray) {
            $val = ord($$ref[$stat]);
            if(($val>31)&&($val<127)){
                $found = "$found$$ref[$stat]";
            }
            $stat++;
        }
        $found = "$found\n";
        print $found;
    }
    close(OUT);
}

Problem:
The code above takes 20-25 minutes to process a 500 MB file, which is very slow. Can someone tell me whether this can be done more efficiently, to reduce the processing time?
Try this:
Code:
#! /opt/third-party/bin/perl
open(FILE, "<", $ARGV[0]) || die ("unable to open <$!>\n");
while( read(FILE, $data, 1) == 1 ) {
    $ordVal = ord($data);
    print "$ordVal" if( $ordVal >= 32 && $ordVal <= 126 );
}
close(FILE);
exit(0);
Hi Madhan,
Corrected code:
#!/usr/local/bin/perl
open(FILE, "<", $ARGV[0]) || die ("unable to open <$!>\n");
while( read(FILE, $data, 1) == 1 ) {
    if((ord($data)>=32)&&(ord($data)<=126)){
        print "$data";
    }
    if(ord($data)==10){
        print "\n";
    }
}
close(FILE);

It's great, it takes just 10 minutes now. Is there anything else that can be done to reduce the run time further?
A minor change, but this will make a difference. Change the two if blocks to the following:
Code:
print "$data" if((ord($data)>=32)&&(ord($data)<=126));
print "\n" if(ord($data)==10);
Hi Madhan,
Thanks. It still takes 10 minutes. I have one question: if we read the entire file and move through it one character at a time, won't that consume valuable memory? For example, in C we can read a certain number of bytes from the file as a first batch, process it, and then continue with the next batch. Can something similar be done here? Please correct me if I am wrong.
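On the batching question: Perl's read() goes through a buffered I/O layer, so the one-character loop does not pull the whole file into memory. The slowness more likely comes from doing a read, an ord and a print for every single byte. Below is a minimal sketch of the batch approach, not a tested replacement for the scripts above. The helper name filter_stream and the 64 KB chunk size are assumptions made for illustration; neither comes from the thread. It keeps newline (10) and printable ASCII (32 to 126), the same set the corrected code prints.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): copy $in to $out in fixed-size
# chunks, keeping only newline (10) and printable ASCII (32..126).
sub filter_stream {
    my ($in, $out, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;    # assumed chunk size; tune as needed
    binmode $in;                  # treat the input as raw bytes
    my $buf;
    while (1) {
        my $n = read($in, $buf, $chunk_size);
        die "read failed: $!\n" unless defined $n;
        last if $n == 0;          # end of file
        # c = complement the list, d = delete: drop every byte NOT listed
        $buf =~ tr/\x0A\x20-\x7E//cd;
        print {$out} $buf;
    }
    return;
}

# Run as a script: filter the file named on the command line to STDOUT.
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "unable to open $ARGV[0]: $!\n";
    filter_stream($fh, \*STDOUT);
    close($fh);
}
```

Memory use stays at roughly one chunk regardless of file size, and the per-byte work happens inside tr rather than in a Perl-level loop.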
If you are really into optimization, you can establish a baseline by running just perl -ne 1 on the file, and then see how much time your additional processing adds. Add more steps piecemeal and watch for any really big jumps in the timings. If there are any, figure out whether you are disabling some internal optimization, and whether rephrasing the code can get it back.

Can you split the processing, like tr -d '\000-\037\200-\377' <file | perl ... and get away with it? (Or '\000-\011\013-\037\200-\377' if you want to preserve the newlines, as matrixmadhan observed.)

Last edited by era; 03-29-2008 at 06:53 PM. Reason: newline observation
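The baseline-then-measure idea can also be tried inside Perl with the core Benchmark module. The sketch below is only illustrative: the sub names filter_by_ord and filter_by_tr and the 1 MB sample size are assumptions, not anything from the thread. One deliberate difference from the tr -d ranges quoted above: the deletion list here starts the upper range at \177 rather than \200, so that DEL (127) is also dropped, which is what the stated aim asks for.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Per-character filter, in the style of the loops posted in this thread.
sub filter_by_ord {
    my ($text) = @_;
    my $kept = '';
    for my $ch (split //, $text) {
        my $val = ord $ch;
        $kept .= $ch if $val == 10 || ($val >= 32 && $val <= 126);
    }
    return $kept;
}

# Same filter with Perl's tr operator: delete 0-9, 11-31 and 127-255
# (octal ranges), which preserves the newline, \012.
sub filter_by_tr {
    my ($text) = @_;
    (my $kept = $text) =~ tr/\000-\011\013-\037\177-\377//d;
    return $kept;
}

# Run as a script: compare both filters on the first 1 MB of the given file.
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "unable to open $ARGV[0]: $!\n";
    binmode $fh;
    defined read($fh, my $sample, 1024 * 1024) or die "read failed: $!\n";
    close($fh);
    # A negative count means "run each entry for at least that many CPU seconds".
    cmpthese(-2, {
        baseline => sub { my $copy = $sample },      # cost of just copying the data
        ord_loop => sub { filter_by_ord($sample) },
        tr       => sub { filter_by_tr($sample) },
    });
}
```

The baseline entry plays the role of perl -ne 1: whatever the other two rows cost beyond it is the price of the filtering itself.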