Pattern Match & Extract from a string


 
# 1  
01-24-2013

Hi,

I have a long string in the 2nd field, as shown below:
Code:
 
REF1 | CLESCLJSCSHSCSMSCSNSCSRSCUDSCUFSCU7SCV1SCWPSCXGPDBACAPA0DHDPDMESED6 
REF2 | SBR4PCBFPCDRSCSCG3SCHEBSCKNSCKPSCLLSCMCZXTNPCVFPCV6P4KL0DMDSDSASEWG

I have a group of fixed patterns that can occur in these long strings, and only one pattern occurs in each record. I will keep all possible patterns in a file called Patterns.txt:

Code:
APA 
APC 
DFH 
CZX

E.g., APA occurs in the first record and CZX in the second, and they occur at different positions in each string.

Expected output:
Code:
 
REF1 | APA
REF2 | CZX

Thanks
# 2  
01-24-2013
Here is one way of doing it:
Code:
while read p
do
    awk -F\| -v P="$p" '{if(match($2,P)>0) print $1,substr($2,RSTART,RLENGTH); }' OFS=\| filename
done < Patterns.txt

This User Gave Thanks to Yoda For This Post:
# 3  
01-24-2013
Code:
awk 'NR==FNR{P[$1]; next}{for(i in P) if($3~i) {print $1,$2,i; next}}' file2 file1
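For anyone who finds the one-liner above dense, here is a sketch of the same NR==FNR technique written out over several lines with comments. It recreates the sample data in a temporary directory so it runs on its own; the file name data.txt and the use of mktemp -d are assumptions, not part of the original post. Patterns.txt must come first, because NR==FNR only holds while awk is reading the first file.

```shell
#!/bin/sh
# Sketch: Scrutinizer's one-liner written out with comments.
# The sample files are recreated in a temporary directory (names assumed).
tmp=$(mktemp -d)
printf '%s\n' APA APC DFH CZX > "$tmp/Patterns.txt"
cat > "$tmp/data.txt" <<'EOF'
REF1 | CLESCLJSCSHSCSMSCSNSCSRSCUDSCUFSCU7SCV1SCWPSCXGPDBACAPA0DHDPDMESED6
REF2 | SBR4PCBFPCDRSCSCG3SCHEBSCKNSCKPSCLLSCMCZXTNPCVFPCV6P4KL0DMDSDSASEWG
EOF

result=$(awk '
    # NR == FNR only while the first file (the patterns) is being read:
    # store each pattern as a key of array P, then go to the next line.
    NR == FNR { P[$1]; next }
    # Data lines are split on blanks: $1 is "REF1", $2 is "|", $3 the string.
    {
        for (i in P)
            if ($3 ~ i) {        # each stored pattern is used as a regex
                print $1, $2, i  # default OFS is a space: "REF1 | APA"
                next             # one pattern per record, so stop searching
            }
    }
' "$tmp/Patterns.txt" "$tmp/data.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

One caveat with this idiom: if the pattern file is empty, NR==FNR stays true while the data file is read as well, so every data line is stored as a pattern and nothing is printed.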

This User Gave Thanks to Scrutinizer For This Post:
# 4  
01-24-2013

Quote:
Originally Posted by bipinajith
Here is one way of doing it:
Code:
while read p
do
    awk -F\| -v P="$p" '{if(match($2,P)>0) print $1,substr($2,RSTART,RLENGTH); }' OFS=\| filename
done < Patterns.txt

Thanks Bipin, I got the desired output. If possible, can you please explain how this code works?
# 5  
01-24-2013
The while loop simply reads each line of Patterns.txt into the variable p, one at a time.
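A small sketch of what read hands to the loop body, assuming the default IFS. The Patterns.txt listing in the first post appears to have trailing spaces after some codes; read strips leading and trailing blanks, so they do not end up in $p. The -r option is an addition here (the original loop uses plain read); it stops read from treating backslashes as escapes.

```shell
#!/bin/sh
# Sketch: each line becomes $p with surrounding blanks removed by read.
# The input mimics Patterns.txt, with stray blanks added on purpose.
patterns=$(printf 'APA \nAPC\n  DFH\nCZX\n' | while read -r p
do
    printf '[%s]\n' "$p"   # brackets make any leftover blanks visible
done)
printf '%s\n' "$patterns"
```

This prints [APA], [APC], [DFH] and [CZX] on separate lines, with no blanks inside the brackets.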

Code:
awk
# Use | as the input field separator
        -F\|
# Set the awk variable P to the value of the shell variable $p
        -v P="$p"
# For each line, check whether the second field matches the regular
# expression in P. If it does, print the first field and the part of
# the second field that matched.
# RSTART and RLENGTH are built-in variables that match() sets to the
# start position and length of the match.
        '{if(match($2,P)>0) print $1,substr($2,RSTART,RLENGTH); }'
# Use | as the output field separator
        OFS=\|
# Read from the data file (called filename here)
        filename
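To make the RSTART/RLENGTH part concrete, here is a sketch that calls match() on a short excerpt of the first record's string (ACAPA0DH) instead of the full line, once with a pattern that is present and once with one that is not.

```shell
#!/bin/sh
# Sketch: what match() returns and how it sets RSTART and RLENGTH.
demo=$(awk 'BEGIN {
    s = "ACAPA0DH"   # short excerpt of the first record in the first post
    # On success match() returns the 1-based start position (also stored in
    # RSTART) and sets RLENGTH to the length of the matched text.
    if (match(s, "APA") > 0)
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
    # On failure it returns 0, and RSTART becomes 0 and RLENGTH -1.
    r = match(s, "CZX")
    print r, RSTART, RLENGTH
}')
printf '%s\n' "$demo"
```

The first line printed is 3 3 APA and the second is 0 0 -1, which is why the original code tests match(...)>0 before calling substr.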

These 2 Users Gave Thanks to Corona688 For This Post:
# 6  
01-24-2013
Even though my approach works, I recommend Scrutinizer's approach: it runs awk once and reads each file a single time, which is much faster than a while loop that starts a new awk over the whole data file for every pattern.
# 7  
01-24-2013

To avoid depending on Patterns.txt, I want to derive the required output directly from the data:

Sample data:

Code:
REF 1 | BADSBCESBCSSBNUSBR4PCBFPCDRSCF3SCGDSCG3SCHEPCKBSCKN DMDSDSASEWG SGTKSGXWSGX4SHABSHGASJACPJATSJAV NSPCC QCCSRA4SRCA RDHSRDLSR
REF 2 | APASBABSBCSSBC2SBNESBNGPBNPPBNSPBNTPBRFSCAKSCDCSNHMSPXR QXRSRA2SRCGSRCDFH DHDPDMESED6 GAMSGFASG

Desired output:

Code:
REF 1|PCC|QCC|EWG
REF 2|PXR|QXR|ED6

The three rules to extract the data are:
(i) The second output field is the 3 characters starting at a "P" that sits exactly 3 characters to the left of a space.
(ii) The third output field is the 3 characters starting at a "Q" that immediately follows a space.
(iii) The fourth output field is the 3 characters starting at an "E" that sits exactly 3 characters to the left of a space.
So the second, third and fourth fields of the output are always exactly 3 characters long.
Any ideas on how to implement this?
Thanks in advance.

Last edited by karumudi7; 01-24-2013 at 04:29 PM.
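One possible sketch for the three rules, assuming the first | separates the reference from the string, that blanks around both can be ignored, and that "characters" means non-space characters. Each rule becomes a regular expression passed to match(), and substr() then takes 3 characters from the rule's starting letter. The helper name grab() and the empty field produced when a rule finds nothing are illustrative choices, not something the thread specifies.

```shell
#!/bin/sh
# Sketch: extract the three 3-character codes from each line using only the data.
# Assumptions: the first "|" separates reference and string, surrounding blanks
# are ignored, and a rule that finds nothing produces an empty field.
data='REF 1 | BADSBCESBCSSBNUSBR4PCBFPCDRSCF3SCGDSCG3SCHEPCKBSCKN DMDSDSASEWG SGTKSGXWSGX4SHABSHGASJACPJATSJAV NSPCC QCCSRA4SRCA RDHSRDLSR
REF 2 | APASBABSBCSSBC2SBNESBNGPBNPPBNSPBNTPBRFSCAKSCDCSNHMSPXR QXRSRA2SRCGSRCDFH DHDPDMESED6 GAMSGFASG'

result=$(printf '%s\n' "$data" | awk -F'|' -v OFS='|' '
    # Hypothetical helper: the 3 characters starting "off" positions after
    # the start of the first match of regex re in s, or "" when re does not match.
    function grab(s, re, off) {
        return match(s, re) ? substr(s, RSTART + off, 3) : ""
    }
    {
        ref = $1; sub(/ +$/, "", ref)                       # "REF 1 " -> "REF 1"
        str = $2; sub(/^ +/, "", str); sub(/ +$/, "", str)  # trim the string
        p = grab(str, "P[^ ][^ ] ", 0)  # (i)   P + 2 chars, then a space
        q = grab(str, " Q[^ ][^ ]", 1)  # (ii)  a space, then Q + 2 chars
        e = grab(str, "E[^ ][^ ] ", 0)  # (iii) E + 2 chars, then a space
        print ref, p, q, e
    }
')
printf '%s\n' "$result"
```

On the sample data this prints REF 1|PCC|QCC|EWG and REF 2|PXR|QXR|ED6. Since match() finds the leftmost match, a line where a rule could apply more than once yields only its first occurrence.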