  #1  
Old 03-14-2010
Registered User
 

Join Date: Aug 2008
Location: Portugal
Posts: 244
Thanks: 0
Thanked 0 Times in 0 Posts
Help with pattern matching

I have a group of strings like:


Code:
[[something and something [[like this]] and this [[but]] here is [[what]].]] [[another big string [[with other]] substrings.]]

I need a regular expression/way to extract those strings like so:


Code:
string1 => something and something [[like this]] and this [[but]] here is [[what]].
string2 => another big string [[with other]] substrings.

I tried this in PHP:
preg_match("/\[\[(.+?)\]\]/"...
but the non-greedy match stops at the first ]], so it captures only: something and something [[like this

I was hoping for something simple and more elegant. Of course, I could solve the problem by walking the string character by character and counting opening and closing brackets to find where each string really ends...

You can answer in any language...
  #2  
Old 03-14-2010
Registered User
 

Join Date: Mar 2010
Location: la jolla, ca
Posts: 101
Thanks: 4
Thanked 3 Times in 3 Posts
For the SPECIFIC example you posted, this works.


Code:
$ echo "[[something and something [[like this]] and this [[but]] here is [[what]].]] [[another big string [[with other]] substrings.]]" \
| perl -nle 'print "$1\n$2" if (/\[\[(.*\]\]\.)\]\]\s+\[\[(.*\.)\]\]/);'
something and something [[like this]] and this [[but]] here is [[what]].
another big string [[with other]] substrings.
$

Are the "[[" delimiters of something? Does ".]]" mark the break to a new string? If so, there may be an easier regex combined with global matches or a loop.

I cannot recommend an interactive regex tool highly enough. What seems hard becomes easy when you see it working in real time.

I use RegExr: Online Regular Expression Testing Tool, but there are others, including ones that emulate the specific regex flavor you are using.

Last edited by drewk; 03-14-2010 at 02:21 PM.. Reason: typo...
  #3  
Old 03-15-2010
Registered User
 

Join Date: Aug 2008
Location: Portugal
Posts: 244
Thanks: 0
Thanked 0 Times in 0 Posts
Yes, that works, but the problem is that the strings vary in length and in the number of substrings, and the .]] ending may change too!

Another example:

Code:
[[something is something]] [[and then I was [[wondering]] why is  [[this like so]]]] [[anyway I think [[that]] I [[can]] go to]]]] [[blabla]] [[bla bla [[foo]] [[bar]]]] [[foobar]] [[not]]

The [[ ]] are the delimiters of each string; however, a string can contain substrings that are ALSO delimited by [[ ]].

Thanks for your reply...
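
Before extracting anything from input like this, it may be worth checking that the delimiters pair up at all. As posted, the third group in the example above ("[[anyway I think ... go to]]]]") seems to close one ]] more than it opens. Below is a minimal C sketch of such a check; check_pairs and its return codes are hypothetical names chosen for illustration, and it assumes [[ and ]] are always two-character tokens.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the thread.
 * Returns 0 if every "[[" has a matching "]]",
 *         1 if a "]]" appears while nothing is open,
 *         2 if the input ends with something still open. */
int check_pairs(const char *s)
{
    int depth = 0;   /* number of "[[" currently open */

    assert(s != NULL);
    while (*s) {
        if (s[0] == '[' && s[1] == '[') {
            depth++;
            s += 2;
        } else if (s[0] == ']' && s[1] == ']') {
            if (depth == 0)
                return 1;        /* stray closing delimiter */
            depth--;
            s += 2;
        } else {
            s++;                 /* ordinary character */
        }
    }
    return depth == 0 ? 0 : 2;
}
```

A caller could run this first and fall back to some error handling when the result is non-zero, rather than trusting whatever the extraction step produces.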
  #4  
Old 03-15-2010
...@...
 

Join Date: Feb 2004
Location: NM
Posts: 6,729
Thanks: 0
Thanked 53 Times in 50 Posts

Code:
[[something is something]] [[and then I was [[wondering]] why is  [[this like so]]]]

If you were ever unfortunate enough to have to code Lisp, this was a common problem.
The usual approach was a kind of descent parse: increment a counter on every opening bracket and decrement it on every closing one. When the counter gets back to zero, you are at the end of the line/command.

C:

Code:
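#define MAX_STRINGS 80
#define MAX_LEN 120

/* Copy each top-level bracketed string of src into dest, one per row.
   Returns the number of strings found. */
int foo(const char *src, char dest[MAX_STRINGS][MAX_LEN])
{
   const char *p;
   int line = 0;    /* row of dest currently being filled */
   int len = 0;     /* characters stored in dest[line] so far */
   int count = 0;   /* current bracket nesting depth */

   for (p = src; *p && line < MAX_STRINGS; p++)
   {
        if (*p == '[') count++;
        if (count > 0 && len < MAX_LEN - 1)
              dest[line][len++] = *p;   /* copy only while inside brackets */
        if (*p == ']' && count > 0 && --count == 0)
        {
              dest[line][len] = '\0';   /* depth back to zero: string done */
              line++;
              len = 0;
        }
   }
   return line;   /* unmatched ']' at depth zero are ignored */
}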

You will have to knock off the leading and trailing square brackets. This also does not handle inter-string characters like spaces.
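
For what it is worth, here is a sketch of one way to knock off the outer brackets during the scan instead of afterwards. It assumes that [[ and ]] always come as two-character tokens, so it counts pairs rather than single brackets. The name split_top_level, the 128-byte item size, and the truncation of overlong items are all assumptions made for illustration, not anything from the posts above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy each top-level [[...]] item of src into
 * out[i] without its outer delimiters. Nested [[...]] are kept as-is.
 * Returns the number of items stored. */
size_t split_top_level(const char *src, char out[][128], size_t max_items)
{
    size_t n = 0;               /* items stored so far */
    int depth = 0;              /* number of "[[" currently open */
    const char *start = NULL;   /* first character after the outer "[[" */

    assert(src != NULL && out != NULL);
    while (*src && n < max_items) {
        if (src[0] == '[' && src[1] == '[') {
            src += 2;
            if (depth++ == 0)
                start = src;            /* a new top-level item begins */
        } else if (src[0] == ']' && src[1] == ']') {
            if (depth > 0 && --depth == 0) {
                size_t len = (size_t)(src - start);
                if (len > 127)
                    len = 127;          /* truncate overlong items */
                memcpy(out[n], start, len);
                out[n][len] = '\0';
                n++;
            }
            src += 2;                   /* a stray "]]" is skipped */
        } else {
            src++;                      /* text between items is ignored */
        }
    }
    return n;
}
```

Called on the first example from post #1 with a `char out[4][128]` buffer, this should return 2 and leave the two strings the original poster asked for in out[0] and out[1].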

Last edited by jim mcnamara; 03-15-2010 at 08:54 AM..
  #5  
Old 03-15-2010
Registered User
 

Join Date: Mar 2010
Location: la jolla, ca
Posts: 101
Thanks: 4
Thanked 3 Times in 3 Posts
So what specifically is the delimiter and the action desired?

In your first example, the "[[sub strings]]" were kept inside the extracted string 1 and string 2. Do all the enclosing [[ and ]] ultimately enclose separate strings that should be recursively broken down? Your example only showed one break, on the ".]]"
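
If the answer is that every level should be broken out, a sketch along these lines might do it. It records each [[...]] item together with its nesting depth, using a small explicit stack of start positions rather than actual recursion. struct item, list_all_levels, and the fixed limits are hypothetical and only for illustration; items are emitted in the order they close, so inner substrings come before the string that contains them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 32    /* deeper nesting is tracked but not recorded */
#define ITEM_LEN  128   /* longer items are truncated */

struct item {
    int  depth;             /* 1 = top-level string, 2 = substring, ... */
    char text[ITEM_LEN];    /* contents without the enclosing [[ ]] */
};

/* Hypothetical helper: store every [[...]] item found in src, at any
 * depth, into out. Returns the number of items stored. */
size_t list_all_levels(const char *src, struct item *out, size_t max_items)
{
    const char *open[MAX_DEPTH];   /* start of each currently open item */
    int depth = 0;
    size_t n = 0;

    assert(src != NULL && out != NULL);
    while (*src && n < max_items) {
        if (src[0] == '[' && src[1] == '[') {
            src += 2;
            if (depth < MAX_DEPTH)
                open[depth] = src;
            depth++;
        } else if (src[0] == ']' && src[1] == ']') {
            if (depth > 0) {
                depth--;
                if (depth < MAX_DEPTH) {
                    size_t len = (size_t)(src - open[depth]);
                    if (len > ITEM_LEN - 1)
                        len = ITEM_LEN - 1;
                    memcpy(out[n].text, open[depth], len);
                    out[n].text[len] = '\0';
                    out[n].depth = depth + 1;
                    n++;
                }
            }
            src += 2;               /* a stray "]]" is skipped */
        } else {
            src++;
        }
    }
    return n;
}
```

Keeping only the entries with depth 1 gives the top-level strings; the deeper entries are the substrings that would be "knocked down" in a second pass.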