Non-greedy pattern matching in shell script


 
# 1  
Old 07-28-2014

Hi all,

Is Perl included by default in Ubuntu? I'm trying to write a program using as few languages as possible, and since I'm using a few Perl one-liners to do non-greedy matching, it's considered another language, and this is a bad thing.

Basically, I'm using a Perl one-liner to grab XML between tags, where $2 is the name of the tag and $3 is the nth tag with that name:
Code:
perl -pe "s/(.*?<$2>){$3}(.*?)<\/$2>.*/\2/"

To escape forward slashes in XML:
Code:
content=$(echo "$4" | perl -pe "s/<\//<\\\\\//")

And to grab an XML tag based on both its tag and content, where $2 is the name of the tag, $3 is the nth tag with that name, and $content is an XML string escaped as above:
Code:
perl -pe "s/(.*?<$2>){$3}(.*$content.*?)<\/$2>.*/\2/"

I can't use sed because it doesn't have non-greedy matching, I can't use grep because it doesn't have non-greedy matching without Perl-like extensions, and to my knowledge Bash cannot do something this complicated on its own.
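To illustrate the limitation, here is a minimal sketch using a made-up one-line sample (the `xml` string and the variable names are assumptions for the demo, not part of my script). With plain sed, `.*` runs through to the last closing tag; a negated character class stops earlier, but only as long as the content itself contains no `<`:

```shell
# Hypothetical sample: two <b> elements on a single line.
xml='<a><b>one</b><b>two</b></a>'

# Greedy: .* extends to the LAST </b>, swallowing both elements.
greedy=$(printf '%s\n' "$xml" | sed 's,<b>\(.*\)</b>,[\1],')

# Bounded: [^<]* cannot cross a '<', so it stops at the FIRST </b>.
bounded=$(printf '%s\n' "$xml" | sed 's,<b>\([^<]*\)</b>,[\1],')

printf '%s\n' "$greedy"    # prints <a>[one</b><b>two]</a>
printf '%s\n' "$bounded"   # prints <a>[one]<b>two</b></a>
```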

Does anyone know of another way to do this, so that we don't have "another language" to maintain?

Thanks,
Zel2008
# 2  
Old 07-28-2014
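Try replacing your <tag> and </tag> with two single characters like ~ and @.

You can then use a negated character class such as [^~]* or [^@]*, which stops at the first delimiter and so behaves like a non-greedy match. In case these two special characters already appear in the input, first replace them with two unique strings, then restore them when done: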

Code:
sed -r -e "s,~,UNIQUE_STR1,g" \
    -e "s,@,UNIQUE_STR2,g" \
    -e "s,<${2}>,~,g" \
    -e "s,</${2}>,@,g" \
    -e "s/([^~]*~){$3}([^@]*)@.*/\2/" \
    -e "s,UNIQUE_STR1,~,g" \
    -e "s,UNIQUE_STR2,@,g" "${1}"

This assumes the whole document is on one line, which is likely to cause problems with sed once your XML gets large, so it's not ideal, but it is a good example of the concept.
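As a worked example, the same commands can be wrapped in a function that reads from standard input. This is only a sketch: the function name `nth_tag_sed` and the sample line are made up for the demo, and it assumes GNU sed (for -r) and single-line input:

```shell
# nth_tag_sed TAG N: print the content of the Nth <TAG>...</TAG> on stdin.
# Hypothetical helper; assumes GNU sed (-r) and that the XML is on one line.
nth_tag_sed() {
    tag=$1 n=$2
    sed -r -e "s,~,UNIQUE_STR1,g" \
        -e "s,@,UNIQUE_STR2,g" \
        -e "s,<${tag}>,~,g" \
        -e "s,</${tag}>,@,g" \
        -e "s/([^~]*~){${n}}([^@]*)@.*/\2/" \
        -e "s,UNIQUE_STR1,~,g" \
        -e "s,UNIQUE_STR2,@,g"
}

# The sample deliberately contains a literal ~ and @ to exercise the
# UNIQUE_STR round trip.
printf '%s\n' '<a><b>p~q</b><b>r@s</b><b>t</b></a>' | nth_tag_sed b 2   # prints r@s
```

Note that UNIQUE_STR1 and UNIQUE_STR2 are placeholders; pick strings that really cannot occur in your data.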

Another approach is to replace both the start and end tags with the same single character and use that character as the awk Record Separator (RS):

Code:
sed -e "s,~,UNIQUE_STR,g" \
    -e "s,<${2}>,~,g" \
    -e "s,</${2}>,~,g" "${1}" | \
awk "NR==${3}*2" RS=\~ | \
sed -e "s,UNIQUE_STR,~,g"

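Now awk can simply select record number N*2 to get the required data: the text before the first tag is record 1, the content of the first tag is record 2, the text between the first and second tags is record 3, and so on.

Again, we replace UNIQUE_STR with ~ for the final result.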
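The pipeline can be tried out as a function reading standard input. Again a sketch under assumptions: `nth_tag_awk` and the sample input are invented for the demo, and RS and the tag index are passed with -v rather than as a trailing assignment, which should behave the same way:

```shell
# nth_tag_awk TAG N: print the content of the Nth <TAG>...</TAG> on stdin.
# Hypothetical helper; unlike the sed-only version, the input may span lines.
nth_tag_awk() {
    tag=$1 n=$2
    sed -e "s,~,UNIQUE_STR,g" \
        -e "s,<${tag}>,~,g" \
        -e "s,</${tag}>,~,g" |
    awk -v RS='~' -v n="$n" 'NR == n * 2' |
    sed -e "s,UNIQUE_STR,~,g"
}

# Records seen by awk: 1 = "<a>\n", 2 = "one", 3 = "\n", 4 = "tUNIQUE_STRwo", ...
printf '%s\n' '<a>' '<b>one</b>' '<b>t~wo</b>' '</a>' | nth_tag_awk b 2   # prints t~wo
```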

Last edited by Chubler_XL; 07-28-2014 at 06:09 PM.. Reason: Rewording to make concept clearer
# 3  
Old 07-29-2014
Thanks Chubler,

I'll try this out and see how it works. Is sed included by default in Ubuntu? We have a major requirement that things not be too difficult to maintain, and we don't want to risk needing to reinstall sed to make things work.

Thanks,
Zel2008
# 4  
Old 07-29-2014
Yes, sed is available by default on Ubuntu; it's POSIX, so it should be pretty widely available. However, the -r option, though available on Ubuntu, isn't POSIX, so it is not as portable.

Solution 2 is pretty portable and should work on most systems, though Solaris may need nawk instead of awk.
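If the -r dependency is a concern, Solution 1 can probably be written with POSIX basic regular expressions instead, where groups and repeat counts are spelled \( \) and \{ \}. A sketch only (the function name `nth_tag_bre` and the sample are made up, and it still assumes single-line input):

```shell
# nth_tag_bre TAG N: same idea as Solution 1, but using basic regular
# expressions so that no -r option is needed. Hypothetical helper.
nth_tag_bre() {
    tag=$1 n=$2
    sed -e "s,~,UNIQUE_STR1,g" \
        -e "s,@,UNIQUE_STR2,g" \
        -e "s,<${tag}>,~,g" \
        -e "s,</${tag}>,@,g" \
        -e "s/\([^~]*~\)\{${n}\}\([^@]*\)@.*/\2/" \
        -e "s,UNIQUE_STR1,~,g" \
        -e "s,UNIQUE_STR2,@,g"
}

printf '%s\n' '<a><b>p~q</b><b>r@s</b><b>t</b></a>' | nth_tag_bre b 2   # prints r@s
```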