Bash regex


 
# 1  
Old 04-15-2011
Bash regex

Hello everybody,
I'm clearly not an expert in bash scripting; I've written maybe fewer than 10 scripts in my life. I'm trying to strip an XML string by removing every tag in it. I'm using bash substitution to do so, but apparently I've misunderstood what counts as a regex in bash ...

As an example, my input is:
VAR='<value key="Qt4ProjectManager.Qt4BuildConfiguration.BuildDirectory" type="QString">/home/share/path/to/build/directory</value>'

I use the command:
echo ${VAR#<[^>]*>}

I thought it was supposed to remove the shortest match of a substring starting with < and ending with >. But the output is the exact input string ...

The regex I know are those I use with flex so maybe it is not the same.

PS: The desired output for the example is '/home/share/path/to/build/directory'
# 2  
Old 04-15-2011
Code:
 
echo "$VAR" | sed 's!<value.*>\(.*\)</value>.*!\1!'


Last edited by panyam; 04-15-2011 at 11:19 AM.. Reason: added echo $VAR
# 3  
Old 04-15-2011
Ruby(1.9+)
Code:
$ echo $VAR
<value key="Qt4ProjectManager.Qt4BuildConfiguration.BuildDirectory" type="QString">/home/share/path/to/build/directory</value>
$ echo $VAR|ruby -e 'puts gets[/<value.*?>(.*)<\/value>/,1]'
/home/share/path/to/build/directory

# 4  
Old 04-15-2011
Quote:
Originally Posted by kerloi
I'm using bash substitution to do so, but apparently I missed something about what is a regex for bash ...

As an example, my input is:
VAR='<value key="Qt4ProjectManager.Qt4BuildConfiguration.BuildDirectory" type="QString">/home/share/path/to/build/directory</value>'

I use the command:
echo ${VAR#<[^>]*>}

I thought it was supposed to remove the shortest match of a substring starting with < and ending with >. But the output is the exact input string ...

The regex I know are those I use with flex so maybe it is not the same.

PS: The desired output for the example is '/home/share/path/to/build/directory'
Bash parameter substitution and pathname expansion (file globbing) do not use regular expressions.

The portable subset of pattern matching features used in parameter expansion and pathname expansion isn't very powerful. It's documented in the POSIX Shell Command Language specification.

Bash (and ksh) support more useful functionality, so either consult the relevant section of your man page or see Pattern Matching in the Bash Reference Manual.
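As one illustration of that extra functionality, here is a minimal sketch using bash's extglob option, which the thread itself does not use. It assumes bash (not plain sh); the variable names stripped and greedy are made up for the example.

```shell
#!/usr/bin/env bash
shopt -s extglob   # enable extended pattern operators such as *(...)

VAR='<value key="Qt4ProjectManager.Qt4BuildConfiguration.BuildDirectory" type="QString">/home/share/path/to/build/directory</value>'

# *([!>]) means "zero or more characters that are not >", so
# <*([!>])> matches exactly one tag. The // form removes every match.
stripped=${VAR//<*([!>])>/}

# For contrast: without the extended operator, // takes the longest match,
# and <*> then runs from the first < to the last >, i.e. the whole string.
greedy=${VAR//<*>/}

printf 'stripped=[%s]\n' "$stripped"
printf 'greedy=[%s]\n' "$greedy"
```

This only handles simple, well-formed input like the sample. A '>' inside an attribute value would break it, so it is no substitute for a real XML parser.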

Quick tips:
The preferred way to negate a bracketed list of characters is with a "!", though the "^" usually works (older syntax).

In regex grammar, a * means that the preceding character or subexpression can match any number of times, including none. In the shell's pattern matching grammar, * is not a quantifier/repeater; it is a wildcard that by itself represents any number of any characters (none included).

. is not special. It stands for a dot.

? is a wildcard that matches any single character (it does not mean that the previous character is optional).
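The tips above can be checked with a few case statements, which use the same pattern grammar as parameter expansion. This is only a sketch; the test strings and variable names are invented for the demonstration.

```shell
#!/bin/sh
# "." is literal: the pattern a.b matches "a.b" but not "axb".
case 'a.b' in a.b) dot_literal=yes ;; *) dot_literal=no ;; esac
case 'axb' in a.b) dot_wild=yes ;; *) dot_wild=no ;; esac

# "?" matches exactly one character, so it cannot match nothing.
case 'axb' in a?b) qmark_one=yes ;; *) qmark_one=no ;; esac
case 'ab' in a?b) qmark_none=yes ;; *) qmark_none=no ;; esac

# "*" on its own matches any string, including the empty one.
case 'ab' in a*b) star_empty=yes ;; *) star_empty=no ;; esac

# "[!x]" matches one character that is not x.
case 'ayb' in a[!x]b) negated=yes ;; *) negated=no ;; esac

echo "$dot_literal $dot_wild $qmark_one $qmark_none $star_empty $negated"
# prints: yes no yes no yes yes
```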

So, what does your original pattern actually accomplish?
${VAR#<[^>]*>} tries to match, from the beginning of VAR's value, a '<' followed by exactly one character that is not a '>', followed by as few characters as possible (since # is not greedy) up to the first occurrence of a '>'. This pattern requires at least one character between '<' and the first '>'. Looking at it from a regular expressionist's point of view, the intent seems to be to allow the space between '<' and '>' to be empty. If so, the proper pattern is ${VAR#<*>}.
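A short sketch of the difference between the two patterns. It uses the "!" spelling of the negated bracket; the EMPTY variable with its empty tag is a made-up input, chosen only to show where the patterns disagree.

```shell
#!/bin/sh
VAR='<value key="Qt4ProjectManager.Qt4BuildConfiguration.BuildDirectory" type="QString">/home/share/path/to/build/directory</value>'
EMPTY='<>text'   # hypothetical input with nothing between < and >

# On the sample data both patterns remove the opening tag.
with_bracket=${VAR#<[!>]*>}
with_star=${VAR#<*>}

# On an empty tag, [!>] needs one non-> character and finds none,
# so the first expansion leaves the value untouched.
empty_bracket=${EMPTY#<[!>]*>}   # stays <>text
empty_star=${EMPTY#<*>}          # becomes text

printf '%s\n' "$with_bracket" "$with_star" "$empty_bracket" "$empty_star"
```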

All that said, I don't know why your result is the unchanged value of $VAR.
Given your sample data, both your pattern and my suggested pattern return /home/share/path/to/build/directory</value>.

Perhaps you can printf %s "$VAR" | od -c -tx1 to take a look at VAR's exact contents (it should print the character over its hexadecimal byte value). Perhaps there's an "invisible" character at the beginning?
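To show why an invisible leading character would matter, here is a sketch under the unconfirmed assumption that the value starts with whitespace. The sample value and the variable names are hypothetical. Because # only matches at the very start of the value, a single leading space is enough to make the expansion a no-op.

```shell
#!/bin/sh
VAR=' <value>/some/path</value>'   # hypothetical value with a leading space

# Inspect the raw bytes; a leading space or other stray byte shows up here.
printf %s "$VAR" | od -c

# The pattern is anchored at the start, so nothing is removed.
unchanged=${VAR#<*>}

# Strip leading whitespace first: the inner expansion isolates the
# leading run of whitespace, and the outer one removes it as a prefix.
trimmed=${VAR#"${VAR%%[![:space:]]*}"}
fixed=${trimmed#<*>}

printf '[%s]\n' "$unchanged" "$fixed"
```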

Regards,
Alister

---------- Post updated at 11:07 AM ---------- Previous update was at 10:59 AM ----------

Quote:
Originally Posted by kerloi
PS: The desired output for the example is '/home/share/path/to/build/directory'
Missed that crucial bit. The following should work on most sane, posix-like shells:
Code:
temp=${VAR#<*>}
echo "${temp%<*}"

Regards,
Alister
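The two-step expansion above could be wrapped in a small function for reuse. This is a sketch only: tag_text is a hypothetical name, and it assumes the input is a single element with no nested tags, like the sample.

```shell
#!/bin/sh
# Hypothetical helper: print the text between the first tag and the last "<".
tag_text() {
    temp=${1#<*>}                # drop the shortest leading <...>
    printf '%s\n' "${temp%<*}"   # drop the shortest trailing <...
}

VAR='<value key="Qt4ProjectManager.Qt4BuildConfiguration.BuildDirectory" type="QString">/home/share/path/to/build/directory</value>'
tag_text "$VAR"
# prints: /home/share/path/to/build/directory
```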
# 5  
Old 04-18-2011
Thanks all for your replies, and for your explanations, alister. But it is still not working with the bash-only syntax:
Code:
${VAR#<*>}

My bash version is GNU bash, version 3.2.25(1)-release (x86_64-redhat-linux-gnu). I know that bash is currently on version 4.x, so ...

But panyam's and kurumi's solutions work great.
Thanks a lot to all of you.