Using regular expressions to separate apples from oranges


 
# 1  
Old 03-09-2012

I have a problem that I think should be solvable with regular expressions. I've been using regular expressions for some time, so I have some experience with them, but I can't find a way to make this work correctly.

Say I have a long string of different fruits:

Quote:
apples bananas pears oranges grapes mangos melons
and I need the first part of it, up to the first occurrence of any of a few keywords:

Quote:
pears|oranges|mangos
Although grapes and melons are not keywords, they should still be omitted. I only want the text before pears included. The fruits can appear in any order, and the regex should work regardless.

I tried this, but it doesn't work as I expect:

(.+)(pears|oranges|mangos).*

Instead of stopping at the first keyword, it captures as much as it can, up to the last keyword it can match. I need it to do the opposite.

(By the way, the string could theoretically contain none of the keywords, where I would like to use the entire string instead)

I'm doing this in a Perl script. Any suggestions are welcome :)
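For reference, the behaviour described above comes from .+ being greedy: it first grabs the whole string and then backtracks only as far as it must, so the keyword group lands on the last keyword. Below is a minimal sketch of the difference, assuming a non-greedy .*? is acceptable here. The helper name text_before is hypothetical and not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the text before the first keyword,
# or the whole string when no keyword is present.
sub text_before {
    my ($str) = @_;
    # .*? is non-greedy: it grows one character at a time, so the
    # alternation matches at the earliest possible position.
    return $str =~ /^(.*?)(?:pears|oranges|mangos)/ ? $1 : $str;
}

my $str = "apples bananas pears oranges grapes mangos melons";

# Greedy version from the question: .+ backtracks from the end of the
# string, so the keyword group ends up on the LAST keyword.
my ($greedy) = $str =~ /(.+)(pears|oranges|mangos).*/;

print "greedy:     '$greedy'\n";
print "non-greedy: '", text_before($str), "'\n";
print "no keyword: '", text_before("apples bananas grapes"), "'\n";
```

With the sample string, the greedy capture is 'apples bananas pears oranges grapes ' while the non-greedy one is 'apples bananas ' (note the trailing space). A string without keywords comes back unchanged.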
# 2  
Old 03-09-2012
The .* matches absolutely anything, since * is a quantifier applied to whatever precedes it.

Got my logic backwards. Thinking.
# 3  
Old 03-09-2012
I'd forget about matching the stuff before, and just find the first instance of pears|mangos|oranges, and get the offset of where it started.

Code:
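#!/usr/bin/perl
use strict;
use warnings;

my $str="apples bananas pears oranges grapes mangos melons";

# A //g match in scalar context records where the match ended in pos($str)
if ($str =~ /(pears|oranges|mangos)/g) {
    print "found verboten fruit at ", pos($str), "\n";
    print "match was '", $1, "'\n";
    # pos() is the END of the match, so back up by the keyword's length
    print "things before: ", substr($str, 0, pos($str)-length($1)), "\n";
}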

Be careful with =~ //g matches in scalar context: if you run the match again on the same string, it resumes from pos() and returns the next match, not the same one.
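The //g caveat can be seen directly. The sketch below uses the same sample string and also shows one way around the issue: match without /g and read the start offset from Perl's built-in @- array. The helper name offset_of_first_keyword is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = "apples bananas pears oranges grapes mangos melons";

# In scalar context, each //g match resumes from pos($str),
# so repeated matching walks through the keywords one by one.
my @seen;
while ($str =~ /(pears|oranges|mangos)/g) {
    push @seen, $1;
}
print "successive //g matches: @seen\n";

# Hypothetical helper: without /g the search always starts at offset 0,
# and $-[0] holds the offset where the whole match began.
sub offset_of_first_keyword {
    my ($s) = @_;
    return $s =~ /pears|oranges|mangos/ ? $-[0] : undef;
}

my $off = offset_of_first_keyword($str);
print "things before: ", substr($str, 0, $off), "\n" if defined $off;
```

Since the offset is taken from the start of the match, there is no need to subtract the keyword's length afterwards.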
# 4  
Old 03-09-2012
Yes, that was my thought too, but implementing it the way I normally script would be awkward.

I'm new to Perl, but this looks really simple! Thanks a lot for the example :)
# 5  
Old 03-10-2012
Printing only the lines that contain one of the keywords, cut off just before the first keyword:
Code:
$ echo "apples bananas pears oranges grapes mangos melons" | perl -ne 'print if s/(pears|oranges|mangos).*//'
apples bananas

Printing every line, but cutting the line from the keyword to the end:
Code:
perl -pe 's/(pears|oranges|mangos).*//'
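Because s/// leaves the string untouched when nothing matches, the same substitution also covers the case where no keyword is present. Here is a hedged sketch of that idea as a function. The name strip_from_keyword, the \b word boundaries and the whitespace trim are additions that do not appear in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: cut the string at the first keyword.
sub strip_from_keyword {
    my ($line) = @_;
    # \s* also removes the whitespace just before the keyword;
    # \b keeps words such as "spears" from counting as "pears".
    $line =~ s/\s*\b(?:pears|oranges|mangos)\b.*//;
    return $line;
}

print strip_from_keyword("apples bananas pears oranges grapes"), "\n";
print strip_from_keyword("apples spears grapes"), "\n";
print strip_from_keyword("apples bananas grapes"), "\n";
```

Whether the word boundaries are wanted depends on the data. Without them, any word that merely contains a keyword would also end the prefix.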


Last edited by Scrutinizer; 03-10-2012 at 06:16 AM..
# 6  
Old 03-10-2012
Quote:
Originally Posted by Corona688
I'd forget about matching the stuff before, and just find the first instance of pears|mangos|oranges, and get the offset of where it started.

why use print 3 times?

Code:
perl -ne 'print"$1\n" if /^(.+?)(?:pears|oranges|mangos)/' infile

infile contains: apples bananas pears oranges grapes mangos melons

Last edited by tip78; 03-11-2012 at 03:20 AM..
# 7  
Old 03-15-2012
Quote:
Originally Posted by tip78
why use print 3 times?
Just as a demonstration of the various features I'm using. I don't play Perl golf; I'd never be able to remember what I did :)