perl: reg.expr: combine starting and ending removal in one expression

# 1  
Old 08-26-2009

Hello,
I am new to Perl and to regular expressions, so I am looking for help (or some experienced advice).

The goal is to trim spaces from a string, i.e., remove spaces from the beginning and from the end of the string.
One of the main requirements for the solution is performance: for the current task it is very important.
Therefore a simple character-by-character loop seems inefficient to me.
I guess the regexp engine should be efficient enough.
So I've come up with:
Code:
 # $str = "   some text    "
  $str=~s/^[ ]*//;
  $str=~s/[ ]*$//;

This works fine, but:
First, I could not combine both patterns into one expression.
Could you suggest how it could be done?
Second, what about time? Which is expected to take longer: one more complicated regexp, or two simple ones? Does anyone know how this task is processed inside Perl?

And a last question: is there a simpler way to do such a simple task in Perl?
# 2  
Old 08-26-2009
This is probably the most efficient way:

Code:
$str = "   some text    ";
$str =~ s/^\s+//;
$str =~ s/\s+$//;


You can do it with one regexp but it might be a little slower, only testing would tell for sure:

Code:
$str = "   some text    ";
$str =~ s/^\s+|\s+$//g;

As for a simpler way, I don't think there is one.
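If the trim is needed in many places, the two substitutions above can be wrapped in a small helper. This is only a sketch: the name `trim` is hypothetical and not something from this thread, and it returns a trimmed copy rather than modifying its argument.

```perl
use strict;
use warnings;

# Hypothetical helper (name chosen for illustration): returns a trimmed copy
# of its argument using the same two substitutions shown above.
sub trim {
    my ($text) = @_;
    $text =~ s/^\s+//;    # strip leading whitespace
    $text =~ s/\s+$//;    # strip trailing whitespace
    return $text;
}

my $str = "   some text    ";
print '>', trim($str), "<\n";    # prints ">some text<"
```

Recent Perl releases (5.36 onward) also provide `builtin::trim`, which was marked experimental when introduced, so on older installations a helper like this is still the usual route.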

---------- Post updated at 11:45 AM ---------- Previous update was at 11:30 AM ----------

Run this on the machine that will run the code to see which is best for that machine:

Code:
#!/usr/bin/perl

use warnings;
use strict;

use Benchmark qw(cmpthese timethese);

sub double_star {
  my $string = shift;
  $string =~ s/^\s*//;
  $string =~ s/\s*$//;
  return $string;
}

sub double_plus {
  my $string = shift;
  $string =~ s/^\s+//;
  $string =~ s/\s+$//;
  return $string;
}

sub single_or {
  my $string = shift;
  $string =~ s/^\s+|\s+$//g;
  return $string;
}

sub replace {
  my $string = shift;
  $string =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
  return $string;
}

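# Note: despite its name, for_star uses the \s+ patterns.
sub for_star {
  my $string = shift;
  for ($string) { s/^\s+//; s/\s+$//; }
  return $string;
}

# Note: despite its name, for_plus uses the \s* patterns.
sub for_plus {
  my $string = shift;
  for ($string) { s/^\s*//; s/\s*$//; }
  return $string;
}

# Note: '||' puts an empty alternative between the two anchored ones,
# and this variant strips plain spaces only, not all whitespace.
sub regex_or {
  my $string = shift;
  $string =~ s/(?:^ +)||(?: +$)//g;
  return $string;
}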

cmpthese(
  -1,
  {
    'single_or'   => q|single_or(  '    Mary had a little lamb.   ');|,
    'double_star' => q|double_star('    Mary had a little lamb.   ');|,
    'double_plus' => q|double_plus('    Mary had a little lamb.   ');|,
    'replace'     => q|replace(    '    Mary had a little lamb.   ');|,
    'for_star'    => q|for_star(   '    Mary had a little lamb.   ');|,
    'for_plus'    => q|for_plus(   '    Mary had a little lamb.   ');|,
    'regex_or'    => q|regex_or(   '    Mary had a little lamb.   ');|,
  }
);

Results may vary from machine to machine and from one Perl version to another.
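Speed only matters if the variants return the same string, so it may be worth checking that before comparing rates. The sketch below is an assumption-laden example, not part of the original benchmark: it re-declares three of the variants as anonymous subs, and the sample inputs and the `check_variants` name are made up for illustration.

```perl
use strict;
use warnings;

# Three of the benchmarked trims, re-declared here so the check is self-contained.
my %trim = (
    double_plus => sub { my $s = shift; $s =~ s/^\s+//; $s =~ s/\s+$//; return $s },
    single_or   => sub { my $s = shift; $s =~ s/^\s+|\s+$//g; return $s },
    replace     => sub { my $s = shift; $s =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/; return $s },
);

# Hypothetical sample inputs: padded, unpadded, empty, whitespace only, tab/newline.
my @samples = ('    Mary had a little lamb.   ', 'no padding', '', '   ', "\ttab and newline\n");

# Returns a list of "variant on [input]" strings for every disagreement
# with double_plus, which is used as the reference result.
sub check_variants {
    my @failures;
    for my $input (@samples) {
        my $expected = $trim{double_plus}->($input);
        for my $name (sort keys %trim) {
            my $got = $trim{$name}->($input);
            push @failures, "$name on [$input]" if $got ne $expected;
        }
    }
    return @failures;
}

my @failures = check_variants();
print @failures ? "MISMATCH: @failures\n" : "all variants agree\n";
```

Running a check like this first avoids benchmarking a variant that is fast only because it does less work.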

Last edited by KevinADC; 08-26-2009 at 01:37 PM.. Reason: added "g" to the last regexp
# 3  
Old 08-27-2009
KevinADC,
A very BIG thank you!!
I really appreciate such an informative and complete answer!!!
I certainly owe you at least a beer!
You have shown many different ways to do this with regexps, plus you introduced me to Benchmark!!!
Much appreciated!!

You are right: the double \s+ version is the most efficient!
Code:
                Rate regex_or replace single_or for_plus double_star for_star double_plus
regex_or     22791/s       --    -26%      -49%     -53%        -63%     -67%        -78%
replace      30919/s      36%      --      -31%     -36%        -50%     -56%        -70%
single_or    44823/s      97%     45%        --      -7%        -28%     -36%        -57%
for_plus     48188/s     111%     56%        8%       --        -22%     -31%        -53%
double_star  61837/s     171%    100%       38%      28%          --     -11%        -40%
for_star     69591/s     205%    125%       55%      44%         13%       --        -33%
double_plus 103385/s     354%    234%      131%     115%         67%      49%          --
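One plausible contributor to the gap between the \s+ and \s* variants in the table above: \s* can match the empty string, so the substitution always "succeeds" and does its replacement work even when there is nothing to strip, while \s+ simply fails. The sketch below only illustrates that difference through the return value of s///; it is not a measurement, and the helper names are hypothetical.

```perl
use strict;
use warnings;

# s/// returns the number of substitutions made, or a false value if none.
# Hypothetical helpers that report whether a substitution took place.
sub count_star { my $s = shift; my $n = ($s =~ s/^\s*//); return $n ? 1 : 0 }
sub count_plus { my $s = shift; my $n = ($s =~ s/^\s+//); return $n ? 1 : 0 }

for my $input ('no padding', '   padded') {
    printf "%-14s star=%d plus=%d\n", "[$input]", count_star($input), count_plus($input);
}
```

On the unpadded input the \s* version still reports a substitution (of an empty match), and the \s+ version reports none.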

It is strange that the '|' ('or') in a regexp does not seem to work in the debugger, but in a sub it works! In 'perl -d -e 0':
Code:
  DB<387> $str="    some val    "; $str=~s/^\s+||\s+$//;print ">$str<";
>some val    <
  DB<388> $str="    some val    "; $str=~s/^\s*||\s*$//;print ">$str<";
>some val    <
  DB<389> sub single_or {my $string = shift;$string =~ s/^\s+|\s+$//g;return $string;}

  DB<390> print ">".single_or ($str)."<";
>some val<
  DB<384> $str="    some val    "
  DB<385> $str=~s/^\s*(\S*(?:\s+\S+)*)\s*$/\1/; print ">$str<";
>some val<
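The difference in the session above is probably not caused by the debugger. The lines typed at the prompt have no /g flag (and use '||', which adds an empty alternative), so the substitution stops after the first match at the start of the string, while the sub uses a single '|' together with /g. A sketch that reproduces this outside the debugger, with hypothetical helper names:

```perl
use strict;
use warnings;

# Without /g, s/// performs only the first match: the leading whitespace.
sub trim_once       { my $s = shift; $s =~ s/^\s+|\s+$//;  return $s }
# With /g, matching continues and the trailing whitespace is removed too.
sub trim_global     { my $s = shift; $s =~ s/^\s+|\s+$//g; return $s }
# The pattern as typed at the debugger prompt: '||' and no /g.
sub trim_double_bar { my $s = shift; $s =~ s/^\s+||\s+$//; return $s }

my $str = "    some val    ";
print '>', trim_once($str),       "<\n";    # prints ">some val    <"
print '>', trim_global($str),     "<\n";    # prints ">some val<"
print '>', trim_double_bar($str), "<\n";    # prints ">some val    <"
```

So the single-regexp form needs /g to handle both ends, whether it runs in a sub or at the debugger prompt.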

Once again: THANK YOU!

Last edited by alex_5161; 08-27-2009 at 12:59 PM..