Find all matching words in text according to pattern


 
# 1  
Old 06-25-2013

Hello dear Unix shell professionals,
I am desperately trying to get a seemingly simple piece of logic to work. I need to extract words from a line of text and save them in an array. The text can look something like this:

Code:
aaaaaaa${important}xxxxxxxx${important2}ooooooo${importantstring3}...

I am restricted in several ways, though:
  • Can't use perl
  • Stuck on an ancient GNU bash, version 3.00.16(1)-release (powerpc-ibm-aix5.1)
  • grep -o is not installed
My attempt was this:
Code:
line='aaaaaaa${important}xxxxxxxx${important2}ooooooo${importantstring3}';
if [[ $line =~ '(\${[^{]*})' ]]; 
    then
        echo "matching[1]: ${BASH_REMATCH[1]}";
        echo "matching[2]: ${BASH_REMATCH[2]}";
        echo "matching[3]: ${BASH_REMATCH[3]}";
    fi;

Output:
Code:
matching[1]: ${important}
matching[2]:
matching[3]:

So it prints the first match correctly, but it ignores all the remaining matches. Can anyone please help me with this? I have been stuck here for 2 days now. If it works with "awk", that would be fine too, but I can't figure out the syntax. Beware that I use an old shell.
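
A note on the empty output above: as far as bash documents it, BASH_REMATCH[1], BASH_REMATCH[2], ... hold the parenthesized groups of one single match, not successive matches, and index 0 holds the whole match. The sketch below illustrates this with a three-group regex that is not from the post; the variable names are made up for illustration. The regex is kept in an unquoted variable because bash 3.2 and later treat a quoted regex as a literal string.

```bash
#!/usr/bin/env bash
line='aaaaaaa${important}xxxxxxxx${important2}ooooooo${importantstring3}'

# Three groups: the opening "${", the name, and the closing "}".
re='([$][{])([^}]*)([}])'

if [[ $line =~ $re ]]; then
  whole=${BASH_REMATCH[0]}   # entire text of the first (leftmost) match
  open=${BASH_REMATCH[1]}    # group 1
  name=${BASH_REMATCH[2]}    # group 2
  close=${BASH_REMATCH[3]}   # group 3
  echo "whole=$whole open=$open name=$name close=$close"
fi
```

So indexes 2 and 3 stay empty in the original attempt simply because its regex has only one group; the second and third placeholders are never looked at.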
# 2  
Old 06-25-2013
Try this:
Code:
line='aaaaaaa${important}xxxxxxxx${important2}ooooooo${importantstring3}'
IFS=\$ read -a _a <<< "$line" 
_regex='(\{[^}]+})'
for _e in "${_a[@]}"; do
  [[ $_e =~ $_regex ]] &&
    _n+=( "\$${BASH_REMATCH[0]}" )
done
# your matches are in the _n array

For example:

Code:
$ line='aaaaaaa${important}xxxxxxxx${important2}ooooooo${importantstring3}'
$ IFS=\$ read -a _a <<< "$line"
$ _regex='(\{[^}]+})'
$ for _e in "${_a[@]}"; do
>   [[ $_e =~ $_regex ]] &&
>     _n+=( "\$${BASH_REMATCH[0]}" )
> done
$ # your matches are in the _n array:
$ declare -p _n
declare -a _n='([0]="\${important}" [1]="\${important2}" [2]="\${importantstring3}")'
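
A note for very old shells: the += array append was, to my knowledge, only added in bash 3.1, so on a 3.0 release it is a likely source of a syntax error. A minimal sketch of the same loop that appends by index instead (the -r option on read is an addition here, so that backslashes in the input are kept as they are):

```bash
#!/usr/bin/env bash
line='aaaaaaa${important}xxxxxxxx${important2}ooooooo${importantstring3}'

# Split the line on "$"; each placeholder then starts one array element.
IFS=\$ read -r -a _a <<< "$line"
_regex='(\{[^}]+})'

_n=()
for _e in "${_a[@]}"; do
  if [[ $_e =~ $_regex ]]; then
    # Append by assigning to the next free index instead of using +=.
    _n[${#_n[@]}]="\$${BASH_REMATCH[0]}"
  fi
done

# One match per line.
printf '%s\n' "${_n[@]}"
```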

This User Gave Thanks to radoulov For This Post:
# 3  
Old 06-25-2013
Wow! Awesome solution! Many thanks!!!!!

I had to convert parts of it to make it compatible with my old shell, because I got a syntax error, but all in all it works perfectly! I even tried to trick it with stray "$" characters and stray braces "{", but it still outputs only the correct matches!

Code:
line='aaaa$}aaa${important}xxxxxxxx${important2}oo{o$}oo$oo${importantstring3}'
IFS=\$ read -a words <<< "$line" 
regex='(\{[^}]+})'
for e in "${words[@]}"; do
    if [[ $e =~ $regex ]]; then    
        echo "\$${BASH_REMATCH[0]}";
    fi;
done

Thanks again, you made a very happy user.

---------- Post updated at 08:25 AM ---------- Previous update was at 07:05 AM ----------

Though I am satisfied with the solution, as I assume it will not produce errors in practice, I have found an input that tricks it. If I use this line:
Code:
line='aaaa$aa{yyy}aaaaaa${important}xxxx'

It will print ${yyy} as matching. That is because it only uses the "$" as separator and indirectly allows random characters to follow afterwards. I still wonder if there isn't any regex which will cover this (sorry, I am not the best at expressions and think in pseudo code, but somehow it bugs me):

First one would need to determine that these 2 characters must always come first:
[\$][\{]

Then comes a term where everything is allowed, except these:
[everything allowed except \$,\{]

The previous term is read until the closing bracket comes:
[\}].

This is my naive thinking, but it seems the thought process is easier than the actual implementation.
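
The three steps of that pseudo code map fairly directly onto bracket expressions. A rough sketch, assuming a bash with [[ =~ ]] support; the function name extract_placeholders and the array name placeholders are invented for this example, and the sample line is a mix of the tricky inputs from this thread. Since one =~ test only reports the leftmost match, the loop cuts off everything up to and including that match and tries again on the rest.

```bash
#!/usr/bin/env bash
# "$" and "{" first, then anything except "$", "{" and "}", then "}".
re='[$][{][^${}]+[}]'

extract_placeholders() {
  local rest=$1 m
  placeholders=()
  while [[ $rest =~ $re ]]; do
    m=${BASH_REMATCH[0]}                  # leftmost match in what is left
    placeholders[${#placeholders[@]}]=$m  # append without += (old bash)
    rest=${rest#*"$m"}                    # drop text up to and including it
  done
}

extract_placeholders 'aaaa$}aaa${important}xxxx$aa{yyy}oo{o$}oo${important2}'
printf '%s\n' "${placeholders[@]}"
```

With this input only ${important} and ${important2} are collected; "$}", "$aa{yyy}" and "{o$}" are skipped because "$" and "{" are never adjacent there.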
# 4  
Old 06-25-2013
Something like this:

Code:
IFS=\$ read -a words <<< "$line" 
regex='^(\{[^}]+})'
for e in "${words[@]}"; do
    if [[ $e =~ $regex ]]; then    
        echo "\$${BASH_REMATCH[0]}";
    fi;
done
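
To see what the "^" changes, here is a small side-by-side sketch on the tricky line. The helper name scan is made up for this comparison, and it joins the matches with spaces only to keep the output on one line.

```bash
#!/usr/bin/env bash
line='aaaa$aa{yyy}aaaaaa${important}xxxx'

# scan REGEX: split $line on "$" and test every piece against REGEX.
scan() {
  local re=$1 e words out=
  IFS=\$ read -r -a words <<< "$line"
  for e in "${words[@]}"; do
    if [[ $e =~ $re ]]; then
      out="$out \$${BASH_REMATCH[0]}"
    fi
  done
  echo "${out# }"   # strip the leading space
}

unanchored=$(scan '(\{[^}]+})')
anchored=$(scan '^(\{[^}]+})')

echo "unanchored: $unanchored"
echo "anchored:   $anchored"
```

The unanchored regex reports both ${yyy} and ${important}, the anchored one only ${important}. One corner case seems to remain: the text before the first "$" also ends up as an array element, so a line that itself begins with something like "{x}" would still be reported.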

You said that you can't use Perl, but for comparison:
Code:
% perl -le'print join $/, shift =~ /\${.*?}/g' 'aaaa$}aaa${important}xxxxxxxx${important2}oo{o$}oo$oo${importantstring3}'
${important}
${important2}
${importantstring3}
% perl -le'print join $/, shift =~ /\${.*?}/g' 'aaaa$aa{yyy}aaaaaa${important}xxxx'
${important}
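
Since awk was mentioned as acceptable, a rough awk counterpart could look like the sketch below. It relies only on match(), substr(), RSTART and RLENGTH, which POSIX awk specifies; whether a very old awk on AIX behaves the same is an assumption I cannot check. Bracket expressions are used instead of backslash escapes to stay clear of differences in how awk implementations treat braces. The sample line and the variable name found are invented for the example.

```bash
#!/usr/bin/env bash
line='aaaa$aa{yyy}aaaaaa${important}xxxx${important2}'

# Print every ${...} placeholder on its own line.
found=$(printf '%s\n' "$line" | awk '{
  s = $0
  while (match(s, /[$][{][^${}]+[}]/)) {
    print substr(s, RSTART, RLENGTH)   # the matched placeholder
    s = substr(s, RSTART + RLENGTH)    # continue after the match
  }
}')

printf '%s\n' "$found"
```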


Last edited by radoulov; 06-25-2013 at 10:36 AM..
# 5  
Old 06-25-2013
Damn, thanks again!
This works perfectly, although initially I wasn't sure why. But now I realize: you use "^" as an anchor character to require that the expression in '(...)' appears at the very beginning of the line. I was confused at first because the grymoire docs describe the anchor as matching "at the beginning of a line", and I wasn't sure what the "line" was in this case: the original "$line" or the split parts of it? Obviously every split part is its own "line" here. That's why it works. Eventually I understood.

Regarding Perl: yeah, there was the choice between perl and bash scripts, and then the thought came "use something which is always available and more down-to-earth" - and the decision fell on plain shell scripts.

While it is an interesting learning experience, I have previously used some perl and it was way more comfortable. I am not sure the pure shell scripting decision was right after all, especially seeing that perl is installed on most unix machines anyway... sigh, but what can you do.
# 6  
Old 06-25-2013
Quote:
Originally Posted by Grünspanix
But now I realize: you use "^" as an anchor character to require that the expression in '(...)' appears at the very beginning of the line. I was confused at first because the grymoire docs describe the anchor as matching "at the beginning of a line", and I wasn't sure what the "line" was in this case: the original "$line" or the split parts of it? Obviously every split part is its own "line" here. That's why it works. Eventually I understood.
Correct, perhaps "the beginning of the string" would be more appropriate.

Quote:
Regarding Perl: yeah, there was the choice between perl and bash scripts, and then the thought came "use something which is always available and more down-to-earth" - and the decision fell on plain shell scripts.

While it is an interesting learning experience, I have previously used some perl and it was way more comfortable. I am not sure the pure shell scripting decision was right after all, especially seeing that perl is installed on most unix machines anyway... sigh, but what can you do.
That's OK, actually. I almost always use only pure shell scripting too, but Perl makes the string manipulation really, really easy.
Moreover, Perl is often available even where bash is not (an old HP-UX springs to mind).