Regex - Capturing groups

11-13-2013

I am having trouble with regex capturing groups. For example, I have a file containing:

Code:

ABC  CDLF SFSDFK PRIMARY INDEX(XYZ,DEF,GHI);
XYZ   FLJ SDFKLD; PRIMARY INDEX(ABC);
BHI    SDKFLFLSFD  PRIMARY INDEX (QWE , RTY , LHJ);

My output should be:

Code:

ABC XYZ,DEF,GHI
XYZ ABC
BHI QWE,RTY,LHJ

I am able to do the regex match, but not sure how to capture only a portion of the match.

Code:

gawk -v RS=";" '{
match($0,"PRIMARY[ ]+INDEX[ ]*[A-Za-z_ ]*[(]+",arr)
print arr[0]
}' tmp2.txt

I appreciate your help!

ysvsr1

11-13-2013

Use the RSTART and RLENGTH variables set by the match function:

Code:

gawk '
        {
                match ( $0, /\(.*\)/ )
                print $1, substr ( $0, RSTART + 1, RLENGTH - 2 )
        }
' file

Or use the gsub function:

Code:

awk '{ gsub(/[ ].*\(|\).*/," ")}1' file

Yoda

11-13-2013

Basically you need to define each part that you want as a sub-expression using parentheses. Assuming the expression matched, you'll get the matching parts in arr[1], arr[2], etc. (the whole match goes in arr[0], alongside start and length entries for each part, as the output below shows).

Also note that having match fill an array like this is specific to GNU awk (gawk).

Something to play with:

Code:

$ awk '{match ($0, /INDEX[ ]*\(([^\,]*)\,*([^\,]*)*/, arr); for (i in arr) {printf "%s:%s - [%s]\n", NR, i, arr[i]}}' x.txt
1:0start - [26]
1:0length - [13]
1:1start - [32]
1:2start - [36]
1:0 - [INDEX(XYZ,DEF]
1:1 - [XYZ]
1:2 - [DEF]
1:2length - [3]
1:1length - [3]
2:0start - [27]
2:0length - [11]
2:1start - [33]
2:2start - [38]
2:0 - [INDEX(ABC);]
2:1 - [ABC);]
2:2 - []
2:2length - [0]
2:1length - [5]
3:0start - [28]
3:0length - [17]
3:1start - [35]
3:2start - [40]
3:0 - [INDEX (QWE , RTY ]
3:1 - [QWE ]
3:2 - [ RTY ]
3:2length - [5]
3:1length - [4]

(The line 2 match values indicate that the regex needs work: [^\,]* also matches ")" and ";", so arr[1] picks up "ABC);" instead of "ABC".)
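One possible way to tighten it, as a sketch rather than a definitive answer: put a single group around everything up to the first closing parenthesis, then strip the blanks from that group. The helper name extract_index is hypothetical, and the three-argument match still requires gawk.

```shell
# extract_index is a hypothetical helper name, not something from the thread.
# Requires gawk: the three-argument match() is a GNU extension.
extract_index() {
    gawk '
    # Group 1 is everything between "(" and the first ")".
    match($0, /PRIMARY[ ]+INDEX[ ]*\(([^)]*)\)/, arr) {
        cols = arr[1]
        gsub(/[ \t]+/, "", cols)   # drop blanks around the commas
        print $1, cols
    }' "$@"
}

# Usage: extract_index tmp2.txt
```

Because [^)]* cannot run past the closing parenthesis, the ");" that leaked into arr[1] on line 2 is no longer captured. Lines without PRIMARY INDEX are skipped silently.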

CarloM

11-13-2013

In perl:

Code:

perl -ne 'print "$1 $2\n" if /(\w+)\s(?:.*)\((.*)\);$/' inputfile
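To show how the pieces of that regex fit together, here is the same one-liner written with the /x modifier so that each part can carry a comment. This is only a sketch; the wrapper name explain_index is hypothetical.

```shell
# explain_index is a hypothetical wrapper name used only for this illustration.
explain_index() {
    perl -ne '
        print "$1 $2\n" if /
            (\w+)      # group 1: a run of word characters (the first word on these lines)
            \s         # one whitespace character after it
            (?:.*)     # non-capturing group: greedy filler before the opening parenthesis
            \(         # a literal opening parenthesis
            (.*)       # group 2: everything inside the parentheses
            \);$       # a literal closing parenthesis and semicolon at the end of the line
        /x' "$@"
}

# Usage: explain_index inputfile
```

Only the two parenthesised groups without ?: are captured, so $1 holds the first word and $2 the column list. Blanks inside the parentheses are kept as they are, so the third sample line comes out as "BHI QWE , RTY , LHJ".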

greet_sed

11-13-2013

A barbarian method:

Code:

cat file | while read line
do var=$(echo $line | awk '{print $1}')
var1=$(echo $line | awk -F[\(\)] '{print $2}')
echo $var $var1
done

protocomm

11-13-2013

Quote:

Originally Posted by protocomm

A barbarian method:

Code:

cat file | while read line
do var=$(echo $line | awk '{print $1}')
var1=$(echo $line | awk -F[\(\)] '{print $2}')
echo $var $var1
done

It is indeed a barbarian method!

You are using awk and a useless use of cat (UUOC), but all of this can be done with shell builtins:

Code:

#!/bin/ksh

while read v1 rest
do
        rest="${rest##*\(}"
        print "$v1 ${rest%\)*}"
done < file
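To see what the two parameter expansions do, here is a sketch that applies them to a single line. It uses printf rather than ksh's print so that it should also run in other POSIX shells; the helper name strip_index is hypothetical.

```shell
# strip_index is a hypothetical helper name used only for this illustration.
strip_index() {
    line=$1
    v1=${line%% *}        # keep the text before the first space
    rest=${line##*\(}     # delete the longest prefix ending in "("
    rest=${rest%\)*}      # delete the shortest suffix starting at ")"
    printf '%s %s\n' "$v1" "$rest"
}

strip_index 'ABC  CDLF SFSDFK PRIMARY INDEX(XYZ,DEF,GHI);'   # prints: ABC XYZ,DEF,GHI
```

Neither expansion touches the blanks inside the parentheses, so the third sample line would print as "BHI QWE , RTY , LHJ".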

Yoda

11-13-2013

Another barbarian method:

Code:

gawk -v RS=";" '{
if($0~/PRIMARY[ ]+INDEX/)
{
match($0,"PRIMARY[ ]+INDEX[ ]*[A-Za-z_ ]*[(]+[ A-Za-z_,]+",arr)
a=RSTART 
b=RLENGTH 
match($0,"PRIMARY[ ]+INDEX[ ]*[A-Za-z_ ]*[(]+",xyz)
c=RSTART 
d=RLENGTH 
e=b-d
f=a+d
l=index($1,".")
m=substr($1,l+1)
print m "  " substr($0,f,e) >> "pi_tmp.txt"
}
}' tmp2.txt

Code:

perl -ne 'print "$1 $2\n" if /(\w+)\s(?:.*)\((.*)\);$/' inputfile

Wow, impressive! Can you explain how it works? Thank you!

ysvsr1
