Regex - Capturing groups


 
# 1  
Old 11-13-2013

I am having trouble with regex capturing groups. For example, I have a file containing:

Code:
ABC  CDLF SFSDFK PRIMARY INDEX(XYZ,DEF,GHI);
XYZ   FLJ SDFKLD; PRIMARY INDEX(ABC);
BHI    SDKFLFLSFD  PRIMARY INDEX (QWE , RTY , LHJ);

My output should be :

Code:
ABC XYZ,DEF,GHI
XYZ ABC
BHI QWE,RTY,LHJ

I am able to do the regex match, but I am not sure how to capture only a portion of the match.

Code:
gawk -v RS=";" '{
match($0,"PRIMARY[ ]+INDEX[ ]*[A-Za-z_ ]*[(]+",arr)
print arr[0]
}' tmp2.txt

Appreciate your help !
# 2  
Old 11-13-2013
Use the RSTART and RLENGTH variables set by the match function:
Code:
gawk '
        {
                match ( $0, /\(.*\)/ )
                print $1, substr ( $0, RSTART + 1, RLENGTH - 2 )
        }
' file

Or use the gsub function:
Code:
awk '{ gsub(/[ ].*\(|\).*/," ")}1' file
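
A variation on the RSTART/RLENGTH approach, offered as a sketch rather than a definitive answer: it anchors the match on PRIMARY INDEX instead of on any parenthesis, strips the blanks inside the column list, and avoids gawk-only features. The function name extract_portable is made up for illustration, and the sketch assumes each line holds a single PRIMARY INDEX (...) clause.

```shell
# Hypothetical helper: print the first field and the PRIMARY INDEX column list.
extract_portable() {
    awk '
    # match() sets RSTART and RLENGTH when the pattern is found
    match($0, /PRIMARY[ ]+INDEX[ ]*\([^)]*\)/) {
        m = substr($0, RSTART, RLENGTH)   # e.g. "PRIMARY INDEX (QWE , RTY , LHJ)"
        sub(/^[^(]*\(/, "", m)            # drop everything up to and including "("
        sub(/\)$/, "", m)                 # drop the closing ")"
        gsub(/[ \t]+/, "", m)             # remove blanks around the commas
        print $1, m
    }' "$@"
}
```

Call it as extract_portable file, or pipe the data in on standard input.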

# 3  
Old 11-13-2013
Basically, you need to define each part that you want as a sub-expression using parentheses. Assuming the expression matched, you'll get the matching parts in arr[1], arr[2], etc. (and the whole match in arr[0], plus start and length entries for each group, similar to RSTART and RLENGTH).

Also note that having match split the result into an array like this is specific to GNU awk.

Something to play with:
Code:
$ awk '{match ($0, /INDEX[ ]*\(([^\,]*)\,*([^\,]*)*/, arr); for (i in arr) {printf "%s:%s - [%s]\n", NR, i, arr[i]}}' x.txt
1:0start - [26]
1:0length - [13]
1:1start - [32]
1:2start - [36]
1:0 - [INDEX(XYZ,DEF]
1:1 - [XYZ]
1:2 - [DEF]
1:2length - [3]
1:1length - [3]
2:0start - [27]
2:0length - [11]
2:1start - [33]
2:2start - [38]
2:0 - [INDEX(ABC);]
2:1 - [ABC);]
2:2 - []
2:2length - [0]
2:1length - [5]
3:0start - [28]
3:0length - [17]
3:1start - [35]
3:2start - [40]
3:0 - [INDEX (QWE , RTY ]
3:1 - [QWE ]
3:2 - [ RTY ]
3:2length - [5]
3:1length - [4]

(The match values for line 2 show that the regex needs work.)
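
For the original question, a tighter pattern might look like the sketch below. It assumes gawk (for the array argument to match) and one PRIMARY INDEX (...) clause per line; the function name extract_index is only for illustration. A single group, [^)]*, captures everything between the parentheses, so a trailing ");" can no longer leak into the capture.

```shell
# Hypothetical helper; requires gawk for the three-argument form of match().
extract_index() {
    gawk '
    # arr[1] receives the text matched by the parenthesised sub-expression
    match($0, /PRIMARY[ ]+INDEX[ ]*\(([^)]*)\)/, arr) {
        cols = arr[1]
        gsub(/[ \t]+/, "", cols)   # "QWE , RTY , LHJ" becomes "QWE,RTY,LHJ"
        print $1, cols
    }' "$@"
}
```

Call it as extract_index x.txt. Lines without a PRIMARY INDEX clause are skipped, because match returns 0 and the action does not run.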
# 4  
Old 11-13-2013
In perl:

Code:
perl -ne 'print "$1 $2\n" if /(\w+)\s(?:.*)\((.*)\);$/' inputfile

# 5  
Old 11-13-2013
A barbarian method:

Code:
cat file | while read line
do var=$(echo $line | awk '{print $1}')
var1=$(echo $line | awk -F[\(\)] '{print $2}')
echo $var $var1
done

# 6  
Old 11-13-2013
Quote:
Originally Posted by protocomm
A barbarian method:

Code:
cat file | while read line
do var=$(echo $line | awk '{print $1}')
var1=$(echo $line | awk -F[\(\)] '{print $2}')
echo $var $var1
done

It is indeed a barbarian method!

You are using awk and a useless use of cat (UUOC), but all of this can be done with shell builtins:
Code:
#!/bin/ksh

while read v1 rest
do
        rest="${rest##*\(}"
        print "$v1 ${rest%\)*}"
done < file

# 7  
Old 11-13-2013
Another barbarian method:

Code:
gawk -v RS=";" '{
if($0~/PRIMARY[ ]+INDEX/)
{
match($0,"PRIMARY[ ]+INDEX[ ]*[A-Za-z_ ]*[(]+[ A-Za-z_,]+",arr)
a=RSTART 
b=RLENGTH 
match($0,"PRIMARY[ ]+INDEX[ ]*[A-Za-z_ ]*[(]+",xyz)
c=RSTART 
d=RLENGTH 
e=b-d
f=a+d
l=index($1,".")
m=substr($1,l+1)
print m "  " substr($0,f,e) >> "pi_tmp.txt"
}
}' tmp2.txt

Code:
perl -ne 'print "$1 $2\n" if /(\w+)\s(?:.*)\((.*)\);$/' inputfile

Wow! Impressive. Can you explain how it works? Thank you!
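
One way to read that one-liner is to spell the same regex out with Perl's /x flag, which allows whitespace and comments inside the pattern. This is a sketch of how it appears to work, not the poster's own explanation; the function name explain_perl is made up for illustration.

```shell
# Hypothetical wrapper around the same regex, annotated via the /x flag.
explain_perl() {
    perl -ne '
        print "$1 $2\n" if /
            (\w+)     # group 1: the first run of word characters on the line
            \s        # one whitespace character
            (?:.*)    # non-capturing and greedy: skip ahead to the last "("
            \(        # a literal opening parenthesis
            (.*)      # group 2: everything inside the parentheses
            \);$      # a literal ")" and ";" at the end of the line
        /x' "$@"
}
```

Note that group 2 keeps any blanks inside the parentheses, so the third sample line comes out as "BHI QWE , RTY , LHJ" unless the blanks are stripped separately.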