Help with pattern matching


 
# 1  
Old 03-14-2010
I have a group of strings like:

Code:
[[something and something [[like this]] and this [[but]] here is [[what]].]] [[another big string [[with other]] substrings.]]

I need a regular expression/way to extract those strings like so:

Code:
string1 => something and something [[like this]] and this [[but]] here is [[what]].
string2 => another big string [[with other]] substrings.

I tried (PHP):
preg_match("/\[\[(.+?)\]\]/"...
but the lazy match stops at the first "]]", so it captures only: something and something [[like this

I was looking for something simple and elegant, but of course I can solve the problem by going character by character and counting the open/close brackets to identify when a string is really closed...

You can answer in any language...
# 2  
Old 03-14-2010
For the SPECIFIC example you posted, this works.

Code:
$ echo "[[something and something [[like this]] and this [[but]] here is [[what]].]] [[another big string [[with other]] substrings.]]" \
| perl -nle 'print "$1\n$2" if (/\[\[(.*\]\]\.)\]\]\s+\[\[(.*\.)\]\]/);'
something and something [[like this]] and this [[but]] here is [[what]].
another big string [[with other]] substrings.
$

Are the "[[" the delimiters of something? Are the ".]]" the break to a new string? If so, there may be an easier regex combined with global matches or a loop.

I cannot recommend an interactive regex tool highly enough. What seems hard becomes easy when you see it working in real time.

I use RegExr, an online regular expression testing tool, but there are others, including ones that replicate the specific regex flavor you are using.

Last edited by drewk; 03-14-2010 at 04:21 PM.. Reason: typo...
# 3  
Old 03-15-2010
Quote:
Originally Posted by drewk
For the SPECIFIC example you posted, this works.

Yes, that works, but the problem is that the strings vary in length and in number of substrings, and the closing ".]]" may also differ!

Another example:
Code:
[[something is something]] [[and then I was [[wondering]] why is  [[this like so]]]] [[anyway I think [[that]] I [[can]] go to]]]] [[blabla]] [[bla bla [[foo]] [[bar]]]] [[foobar]] [[not]]

The [[ ]] are the delimiters of each string; however, a string can contain substrings that are ALSO delimited by [[ ]].

Thanks for your reply...
# 4  
Old 03-15-2010
Code:
[[something is something]] [[and then I was [[wondering]] why is  [[this like so]]]]

If you were ever unfortunate enough to have to code Lisp, this was a common problem.
The usual answer was a kind of descent parse: increment a counter on every opening bracket and decrement it on every closing bracket. When the counter gets back to zero, you are at the end of the line/command.

C:
Code:
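#include <stdio.h>

#define MAX_LINES 80
#define MAX_LEN   120

/* Split src into top-level bracketed groups by tracking bracket depth.
   Prints each group and returns how many were found. */
int foo(const char *src)
{
   char dest[MAX_LINES][MAX_LEN]={{0}};
   const char *p=NULL;
   char *q=NULL;
   int line=0;
   int count=0;   /* current bracket depth */
   int i;

   for(p=src, q=dest[line]; *p && line<MAX_LINES; p++)
   {
        if(*p=='[') count++;
        else if(*p==']')
        {
              if(!count) continue;           /* stray ']' outside any group */
              if(--count==0)
              {
                    /* depth is back to zero: this group is complete */
                    if(++line<MAX_LINES) q=dest[line];
                    continue;
              }
        }
        if(!count) continue;                 /* skip text between groups */
        if(q-dest[line]<MAX_LEN-1)           /* leave room for the '\0' */
              *q++=*p;
   }
   for(i=0; i<line; i++)
        printf("%s\n", dest[i]);
   return line;
}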

You will have to knock off the leading and trailing square brackets that remain on each string. Text between the strings, such as spaces, is discarded rather than processed.
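
For what it is worth, the two loose ends just mentioned (stripping the outer brackets and ignoring whatever sits between strings) can be folded into the same counter idea. The following is only a sketch, assuming that "[[" and "]]" always act as two-character tokens and that a stray "]]" outside any string should be ignored. The names extract_groups and GROUP_LEN are made up for illustration and do not come from the code above.

```c
#include <string.h>

#define GROUP_LEN 256   /* hypothetical per-string buffer size */

/* Copy every top-level [[...]] group of src into out[], without the outer
   delimiters. Nested [[...]] stay untouched inside the copied text.
   Returns the number of groups stored (at most max_groups). */
int extract_groups(const char *src, char out[][GROUP_LEN], int max_groups)
{
    int depth = 0;               /* how many "[[" are currently open */
    int found = 0;
    const char *start = NULL;    /* first character after the outer "[[" */
    const char *p = src;

    while (*p && found < max_groups) {
        if (p[0] == '[' && p[1] == '[') {
            if (depth == 0)
                start = p + 2;
            depth++;
            p += 2;
        } else if (p[0] == ']' && p[1] == ']') {
            /* a "]]" seen at depth 0 is a stray closer and is ignored */
            if (depth > 0 && --depth == 0) {
                size_t len = (size_t)(p - start);
                if (len >= GROUP_LEN)
                    len = GROUP_LEN - 1;     /* truncate overlong groups */
                memcpy(out[found], start, len);
                out[found][len] = '\0';
                found++;
            }
            p += 2;
        } else {
            p++;                 /* ordinary character, inside or outside */
        }
    }
    return found;
}
```

Called with the first example from this thread and an array such as char out[8][GROUP_LEN], it should return 2 and leave the two requested strings in out[0] and out[1], nested [[...]] included. Matching on the two-character tokens rather than on single brackets means a lone "[" or "]" inside the text does not disturb the depth count.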

Last edited by jim mcnamara; 03-15-2010 at 10:54 AM..
# 5  
Old 03-15-2010
Quote:
Originally Posted by redoubtable
Yes, that works, but the problem is that the strings change in size and in number of substrings, and the .]] also may change!

So what specifically is the delimiter, and what action do you want?

In your first example, the "[[sub strings]]" were kept inside the broken-out string1 and string2. Do all the enclosing [[ and ]] ultimately delimit separate strings that will be recursively broken down? Your example showed only one break, on the ".]]".
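
Part of why the delimiter question matters: in the second example, the string ending in "go to]]]]" appears to close one more "[[" than it opens, so it may be worth checking the input for balance before settling on an approach. Below is a minimal sketch in C, assuming "[[" and "]]" are always two-character tokens. The function name check_balance is made up for illustration.

```c
/* Returns -1 if every "[[" has a matching "]]".
   Otherwise returns the byte offset of the first "]]" that closes nothing,
   or the length of src if some "[[" is never closed. */
long check_balance(const char *src)
{
    long depth = 0;              /* number of "[[" currently open */
    const char *p = src;

    while (*p) {
        if (p[0] == '[' && p[1] == '[') {
            depth++;
            p += 2;
        } else if (p[0] == ']' && p[1] == ']') {
            if (depth == 0)
                return (long)(p - src);   /* closer with nothing open */
            depth--;
            p += 2;
        } else {
            p++;
        }
    }
    return depth > 0 ? (long)(p - src) : -1;
}
```

On the second example this should report the extra "]]" right after "go to]]", while the first example should come back as balanced (-1). Whether such an extra closer is a typo or meaningful input is exactly the kind of detail that decides how the strings have to be split.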