regexp: match string that contains list of chars


 
# 1  

Hi,

I'm curious about how to do a very simple thing with regular expressions that I'm unable to figure out.

If I want to find out whether a string contains 'a' AND 'b' AND 'c', it can very easily be done with grep:
Code:
echo "$STRING" | grep a | grep b | grep c

But how would you do that in a single regexp?

A possible solution would be:

Code:
/a.*b.*c|a.*c.*b|b.*c.*a|b.*a.*c|c.*b.*a|c.*a.*b/

But it's so ugly it's nasty! (Imagine if instead of 3 chars we had 10!)
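To see how fast the alternation approach grows, here is a small Python sketch that builds the "every ordering" pattern for any set of distinct characters. The helper name `permutation_regex` is hypothetical, written only for this illustration. For n distinct characters the pattern needs n! alternatives: 6 for `abc`, but 3,628,800 for ten characters.

```python
import itertools
import re


def permutation_regex(chars):
    """Build the 'every ordering' alternation, e.g. a.*b.*c|a.*c.*b|... for 'abc'."""
    alternatives = (
        ".*".join(re.escape(c) for c in order)
        for order in itertools.permutations(chars)
    )
    return "|".join(alternatives)


pattern = permutation_regex("abc")
print(pattern.count("|") + 1)               # 6 alternatives
print(bool(re.search(pattern, "xcxbxa")))   # True: contains a, b and c
print(bool(re.search(pattern, "xcxbx")))    # False: no 'a'
```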

Any gurus out there know how to do this?

Cheers

Last edited by vbe; 10-14-2010 at 11:12 AM.. Reason: code tags...
# 2  
Code:
awk '/a/&&/b/&&/c/'

# 3  
Scrutinizer: that's very cool, it's better than chaining up greps :) But I would like to do it in a single regexp, i.e. without using shell tools like grep, awk, sed...
# 4  
Well, the awk version is not one regex but one command, which is nicely formatted. You can also use sed, which is generally faster, partly because once a line fails a test and is deleted (done = dead), the remaining regexes are never evaluated for it. The list of patterns can be of any length and stays easy to read:

Code:
sed '
  /a/!d
  /b/!d
  /c/!d
 '
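The same "stop at the first pattern that misses" idea carries over to a pattern list of any length outside sed. A minimal Python sketch, assuming the lines are already available as an iterable of strings; the function name `keep_matching_lines` is hypothetical. `all()` stops at the first pattern that fails, much as each `/x/!d` ends sed's work on the current line.

```python
import re


def keep_matching_lines(lines, patterns):
    """Yield the lines that match every pattern, like sed '/a/!d; /b/!d; /c/!d'."""
    compiled = [re.compile(p) for p in patterns]
    for line in lines:
        # all() short-circuits: once one pattern misses, the rest are not tried,
        # much as sed's d ends processing of the current line.
        if all(rx.search(line) for rx in compiled):
            yield line


sample = ["abc", "ab", "cxbxa", "xyz"]
print(list(keep_matching_lines(sample, ["a", "b", "c"])))  # ['abc', 'cxbxa']
```

To filter a file instead, pass an open file object (or `sys.stdin`) as `lines`.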



---------- Post updated at 10:45 AM ---------- Previous update was at 10:42 AM ----------

If you want just a regex, not commands, you are probably out of luck: the three searches are too unrelated for one regex. What context do you want to use it in, if not a command?
# 5  
If you want to use the exit status (as with grep -q):
Code:
awk '/a/&&/b/&&/c/{f=1;exit}END{if(!f){exit 1}}'

# 6  
Quote:
Originally Posted by DGPickett
If you want just regex not commands, you are probably out of luck. The searches are too unrelated for one regex. What context do you want to use it in, if not a command?
For example, in any programming language that supports PCRE-style regexes: C (via libpcre), Perl, Python, Ruby... Of course every programming language has other ways to check this; for example, in Python:
Code:
>>> s = "axbxc"
>>> 'a' in s and 'b' in s and 'c' in s
True

I just want to know whether it's possible to do that in a single regular expression, just out of curiosity and to get a better understanding of regexps.

I tried to do it like this:

Code:
/([abc]).*([^\1]).*[^\2]/

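But of course that doesn't work. Even if [^\1] meant "any character except the one captured by the first parenthesis set", it would still match *all* the other characters, not just the two remaining letters. (In fact, backreferences are not recognized inside brackets at all: in PCRE and Python's re, for instance, \1 inside a character class is an octal escape for the character with code 1.) I think this should be done with some kind of backtracking.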
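A quick check with Python's `re` module shows how permissive the attempted pattern really is. Because `\1` and `\2` inside brackets are read as the characters with codes 1 and 2, the pattern reduces to "one of a, b or c, followed somewhere by two more characters", so it accepts a string with no `b` or `c` at all.

```python
import re

# The attempted pattern from the post. Inside [...] Python reads \1 and \2 as the
# characters with codes 1 and 2, not as references to the captured groups.
attempt = re.compile(r"([abc]).*([^\1]).*[^\2]")

print(bool(attempt.search("axbxc")))  # True, as hoped
print(bool(attempt.search("aaa")))    # True, although there is no 'b' or 'c'
print(bool(attempt.search("ab")))     # False, only because two more chars must follow 'a'
```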

And BTW, I'm sure that it can be done with regexps! I mean, if you can test whether a number is prime with regular expressions, I refuse to believe this simple thing can't be done :)
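For what it's worth, engines with lookahead (Perl, PCRE, Python's `re`) can express "contains a AND b AND c" as a single pattern. Each `(?=.*x)` is a zero-width check, run from the start of the string, that succeeds if x appears anywhere after that point; the lookaheads consume nothing, so they can be stacked in any order. The pattern grows linearly with the number of required characters instead of factorially. A minimal Python sketch; the helper name `contains_all_regex` is hypothetical.

```python
import re


def contains_all_regex(chars):
    """Build ^(?=.*a)(?=.*b)(?=.*c) for 'abc': one lookahead per required char."""
    lookaheads = "".join(f"(?=.*{re.escape(c)})" for c in chars)
    # DOTALL lets .* cross newlines, so a multi-line string is searched as a whole.
    return re.compile("^" + lookaheads, re.DOTALL)


abc = contains_all_regex("abc")
print(abc.pattern)                  # ^(?=.*a)(?=.*b)(?=.*c)
print(bool(abc.search("xcxbxa")))   # True
print(bool(abc.search("axbx")))     # False: no 'c'
```

The pattern itself is not Python-specific: GNU grep's `-P` option, for example, accepts `'^(?=.*a)(?=.*b)(?=.*c)'`. Plain POSIX BRE/ERE, as used by standard grep, sed and awk, has no lookahead, which is why the tool-based answers chain several regexes instead.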
# 7  
Well, if you dislike it but must apply the three regexes in sequence, you might try this simple heuristic: put the regexes in a list and apply them in the current order; whenever one misses (removing a line from contention), move it to the top of the list if it isn't there already, sliding the others down. For instance, consider c, e and q as three regexes. The q will usually reject more lines than the c, and the e fewer, but by letting the best rejecters float to the top, you save a lot of second and third regex searches.
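The heuristic described above is essentially a move-to-front list. A minimal Python sketch: the class name `MoveToFrontFilter` and its `tests` counter are hypothetical, the counter being there only to make the number of regex searches visible. Each time a pattern rejects a line, it is moved to the front of the list, so patterns that reject often tend to be tried first.

```python
import re


class MoveToFrontFilter:
    """Keep lines matching all patterns; a pattern that rejects a line moves to the front."""

    def __init__(self, patterns):
        self.patterns = [re.compile(p) for p in patterns]
        self.tests = 0  # number of regex searches performed so far

    def accepts(self, line):
        for i, rx in enumerate(self.patterns):
            self.tests += 1
            if not rx.search(line):
                # Death at first failure: stop here and promote the rejecting
                # pattern to the top of the list, sliding the others down.
                if i > 0:
                    self.patterns.insert(0, self.patterns.pop(i))
                return False
        return True


words = ["cheque", "equal", "ice", "cell", "quick"]
f = MoveToFrontFilter(["c", "e", "q"])
print([w for w in words if f.accepts(w)])  # ['cheque']
print([rx.pattern for rx in f.patterns])   # ['e', 'q', 'c']
```

The list keeps adapting to the input: here "ice" promotes q, and then "quick" promotes e, so the order reflects recent rejections rather than a fixed ranking.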

I have a name for this, but it is not politically correct: something about how a dictator selects a military commander, death at first failure. The idea came to me one day when a text editor took a very long time to find instances of 'equal': it scanned for the first character and, at every 'e', stopped and did a string compare, tragically missing the filtering power of the 'q'. When I searched for 'qual' instead, it was quick (Borland Sprint on I386 DOS emulation under UNIX SVR3).
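The 'equal' versus 'qual' anecdote can be illustrated with a toy search that scans for one "anchor" character of the word and does a full string compare only where that character occurs. This is a simplified model, not a description of how Sprint actually worked; the function `find_word` and its `anchor` parameter are hypothetical. Anchoring on the rare q instead of the common first letter e leaves far fewer positions to verify.

```python
def find_word(text, word, anchor=0):
    """Return (match positions, number of full string compares).

    Scan text for word[anchor]; at each hit, compare the whole word in place.
    """
    positions, compares = [], 0
    start = text.find(word[anchor])
    while start != -1:
        begin = start - anchor  # where the word would have to start
        if begin >= 0:
            compares += 1
            if text.startswith(word, begin):
                positions.append(begin)
        start = text.find(word[anchor], start + 1)
    return positions, compares


text = "the sequel is equal; we need every queue entry to be equal"
print(find_word(text, "equal", anchor=0))  # ([14, 53], 14): anchored on 'e'
print(find_word(text, "equal", anchor=1))  # ([14, 53], 4): anchored on 'q'
```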
 
