Pattern Match & Extract from a string


 
# 8  
Old 01-24-2013
Hi Karumudi7,
Note that although the scripts provided by bipinajith and Scrutinizer both do what you want, neither of them does what you literally asked for. The contents you gave us for Patterns.txt in the 1st message in this thread have a trailing <space> character at the end of each of the first three lines. And the first line of your input file does not contain "APA " in the last field.

The script bipinajith provided strips the trailing spaces by using the default value of IFS while reading lines from Patterns.txt. The awk script Scrutinizer provided strips the trailing space by using the default field separator in awk.

I was working on an awk script similar to Scrutinizer's script, but I was using FS = " [|] " to simplify the output line. It took me a while to realize that my script was failing due to the trailing spaces in your list of patterns.
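
As a rough illustration of the difference, here is a small sketch. The pattern value and the shell variable names are made up for this demonstration; they are not taken from your files.

```shell
# Hypothetical pattern line with a trailing space, like the pasted Patterns.txt.
line='APA '

# read with the default IFS strips leading and trailing blanks.
read_result=$(printf '%s\n' "$line" | { read pat; printf '[%s]' "$pat"; })

# awk default field splitting ignores the trailing blank, so $1 has no space.
awk_default=$(printf '%s\n' "$line" | awk '{ printf "[%s]", $1 }')

# With FS = " [|] " blanks alone do not split fields, so $1 keeps the space.
awk_custom=$(printf '%s\n' "$line" | awk 'BEGIN { FS = " [|] " } { printf "[%s]", $1 }')

printf '%s %s %s\n' "$read_result" "$awk_default" "$awk_custom"
```

This prints [APA] [APA] [APA ]: only the last form keeps the trailing space, which is why a comparison against the pattern behaves differently there.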
# 9  
Old 01-24-2013
The trailing spaces probably crept in when I copied and pasted the contents here. Sorry for that.
Because of these constraints, I want to remove the dependency on Patterns.txt, and I updated my request accordingly before your post.

https://www.unix.com/302760771-post7.html

Thanks.
# 10  
Old 01-24-2013
For your new problem, try:
Code:
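awk 'BEGIN {FS = OFS = "|" }    # split input and join output on "|"
{       match($2, /P.. /)       # P, any two characters, then a space
        f2 = " " substr($2, RSTART, RLENGTH)
        match($2, / Q../)       # space, Q, then any two characters
        f3 = substr($2, RSTART, RLENGTH) " "
        match($2, /E.. /)       # E, any two characters, then a space
        f4 = " " substr($2, RSTART, RLENGTH - 1)        # drop the trailing space
        print $1, f2, f3, f4
}' Sample2.txt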

As always, if you're using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk.
# 11  
Old 01-26-2013
Thanks, it worked. Can you please explain how it works? This is the first time I am using match(), RSTART, and RLENGTH.
# 12  
Old 01-26-2013
The awk script:
Code:
awk 'BEGIN {FS = OFS = "|" }
{       match($2, /P.. /)
        f2 = " " substr($2, RSTART, RLENGTH)
        match($2, / Q../)
        f3 = substr($2, RSTART, RLENGTH) " "
        match($2, /E.. /)
        f4 = " " substr($2, RSTART, RLENGTH - 1)
        print $1, f2, f3, f4
}' Sample2.txt

starts by setting the input and output field separators (FS and OFS) to the <vertical-line> (or pipe) character. So, with your sample data, the 2nd field in each input line always begins with a <space> character.

Each match() call searches the string specified by its first argument (in all three cases this is the 2nd field in an input line) for the extended regular expression specified by its 2nd argument, returns the index in that string where the 1st match occurs (or 0 if there is no match), and sets RSTART to the same value. If RSTART is not zero, RLENGTH is set to the length of the substring that matches the ERE. The EREs given to these three calls to match() search for a P followed by any two characters followed by a space, for a space followed by a Q followed by any two characters, and for an E followed by any two characters followed by a space character, respectively.

Note that with these EREs, three letter strings found at the end of the line will not be matched since there is no trailing space in those cases; but your requirements explicitly stated that the match was to be to a following space. Note also that if some of your input lines do not have matches for all three of your specified conditions, the results are unspecified and there will be no warning that something didn't match. If this is a concern, you should check the return value from match() and print a diagnostic message if it returns 0.
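
A check on the value returned by match() could look something like the following sketch. The two input lines and the wording of the message are invented for illustration; only the /P.. / ERE comes from the script. The message is written to standard output here to keep the example simple; a real script would more likely send it to standard error.

```shell
# Two made-up input lines: the first contains "PAB ", the second has no P.
input='1| PAB QCD EFG x
2| ZZZ QCD EFG x'

result=$(printf '%s\n' "$input" | awk 'BEGIN { FS = OFS = "|" }
{
        # match() returns the 1-based start of the first match, or 0 if none.
        if (match($2, /P.. /) == 0) {
                print "line " NR ": no match for /P.. /"
                next
        }
        # RSTART and RLENGTH describe the matched substring.
        print $1, RSTART, RLENGTH, substr($2, RSTART, RLENGTH)
}')
printf '%s\n' "$result"
```

For the first line this reports RSTART as 2 and RLENGTH as 4 (the P is the 2nd character of the field because of the leading space); for the second line it prints the diagnostic instead of an output record.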

The call to substr() that follows each match() uses the values of RSTART and RLENGTH set by that match() to extract the matched text. With a leading or trailing space added, these become f2, f3, and f4: the desired 2nd, 3rd, and 4th output fields, respectively.

Note that RLENGTH - 1 is used in the last substr() to eliminate the unwanted trailing space that would appear at the end of the line if RLENGTH had been used instead. With all of the EREs used in these match() calls, RLENGTH will always be 4, but I kept RLENGTH and RLENGTH - 1 rather than 4 and 3 in case you later decide to change the EREs to match different strings.

With OFS set to "|", the print call adds the specified field separators when printing the output lines.
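
To see the pieces fit together, here is the same script fed a single line on standard input instead of Sample2.txt. The input line is invented for this sketch (it is not from your sample file) and the shell variable names are arbitrary.

```shell
# Made-up input line; the real Sample2.txt is not reproduced here.
sample='100| PAB QCD EFG end'

output=$(printf '%s\n' "$sample" | awk 'BEGIN {FS = OFS = "|" }
{       match($2, /P.. /)
        f2 = " " substr($2, RSTART, RLENGTH)
        match($2, / Q../)
        f3 = substr($2, RSTART, RLENGTH) " "
        match($2, /E.. /)
        f4 = " " substr($2, RSTART, RLENGTH - 1)
        print $1, f2, f3, f4
}')
printf '%s\n' "$output"
```

With that input the output is 100| PAB | QCD | EFG: each extracted string is padded with a space on both sides except the last one, which has no trailing space because of RLENGTH - 1.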