Perl:Regex for Search and Replace that has a flexible match


 
# 1  
Old 01-02-2013
Perl:Regex for Search and Replace that has a flexible match

Hi,

I'm trying to match the front and back of a sequence. It works when there is an exact match (obviously), but I need the regex to be more flexible. When we get strings of nucleotides, their prefixes and suffixes aren't always exact matches: sometimes there is an extra letter, sometimes a letter is missing, and sometimes both.

For example, if I were trying to match the string "Imhungry" at the front of a string and replace it with nothing, I would use the following code.

Code:
$sequence =~ s/^.*?Imhungry//s;

This works great, but I need help writing some flexibility into the regex so that it also captures instances where:
[1] a single letter is missing, e.g. "Imungry" or "mungry".
[2] a single letter (any letter) is added, e.g. "Immhungry" or "Imhungryy".
[3] both, e.g. "Imhungyy" or "Immungryy" (notice this last example has two single-letter duplications and one deletion).

Thanks!

If this is too absurd let me know.

I think I can do this by letting each character appear zero to two times:
Code:
$sequence =~ s/^.*?I{0,2}m{0,2}h{0,2}u{0,2}n{0,2}g{0,2}r{0,2}y{0,2}//s;


Last edited by jdilts; 01-02-2013 at 12:43 PM.. Reason: maybe this is absurd and wildcard char comment
# 2  
Old 01-02-2013
There are transforms like soundex that nullify spelling differences.

A regex that tolerates a missing or extra copy of every byte of the key gets too loose, fast. You might construct an extended regex where, for an n-byte key, each alternative puts a * on only one of bytes 1 through n, so it still matches at least n-1 bytes of the key, e.g., for 'abcd': 'a*bcd|ab*cd|abc*d|abcd*'.
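A rough sketch of how that alternation could be generated in Perl rather than typed by hand. The names one_star_pattern and $prefix_re and the sample sequences are made up for illustration, and it assumes the key is plain text (it is passed through quotemeta) that should be stripped from the front of the string, as in the original substitution.

```perl
use strict;
use warnings;

# Build one alternative per key position. In each alternative exactly one
# character carries a `*`, so that character may be missing or repeated
# while every other character must match exactly.
sub one_star_pattern {
    my ($key) = @_;
    my @chars = split //, $key;
    my @alts;
    for my $i (0 .. $#chars) {
        my @parts = map { quotemeta $chars[$_] } 0 .. $#chars;
        $parts[$i] .= '*';
        push @alts, join '', @parts;
    }
    return join '|', @alts;
}

my $alternation = one_star_pattern('Imhungry');

# Same shape as the original substitution: lazily skip leading text, then
# require one of the generated alternatives.
my $prefix_re = qr/^.*?(?:$alternation)/s;

# Hypothetical sample sequences: exact key, one letter missing, one extra.
for my $seq (qw(xxImhungryACGT xxImungryACGT xxImmhungryACGT)) {
    (my $trimmed = $seq) =~ s/$prefix_re//;
    print "$seq -> $trimmed\n";
}
```

One thing to watch: Perl tries the alternatives left to right and keeps the first one that succeeds, so for input such as "Imhungryy" an earlier alternative can match "Imhungry" and leave the extra trailing letter behind. A key that needs two edits at different positions (for example "mungry") is not matched at all, which is the point of keeping the pattern tight.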

I suppose you could write a scoring system that counts how many characters are extra or missing relative to the key, sort the candidates by score, and cut off at an 80% score or something.
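A minimal sketch of what such a scoring system might look like, using Levenshtein edit distance as the measure of "extra or missing". That choice is an assumption, not something settled in this thread, and it also counts substitutions. The function names, the candidate list, and the exact 0.8 cutoff are illustrative only.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Levenshtein edit distance: the minimum number of single-character
# insertions, deletions, and substitutions needed to turn $s into $t.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,           # delete a character from $s
                $cur[$j - 1] + 1,        # insert a character into $s
                $prev[$j - 1] + $cost,   # keep or substitute a character
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Score between 0 and 1, where 1 means the candidate equals the key.
sub match_score {
    my ($key, $candidate) = @_;
    my $score = 1 - edit_distance($key, $candidate) / length($key);
    return $score < 0 ? 0 : $score;
}

my $key        = 'Imhungry';
my $cutoff     = 0.8;    # assumed threshold, taken from the "80%" suggestion
my @candidates = qw(Imhungry Imungry Immhungry Imhungyy Immungryy);

# Sort best score first, ties broken alphabetically, and flag the cutoff.
my %score = map { $_ => match_score($key, $_) } @candidates;
for my $c (sort { $score{$b} <=> $score{$a} or $a cmp $b } @candidates) {
    printf "%-10s %.3f %s\n", $c, $score{$c},
        $score{$c} >= $cutoff ? 'keep' : 'reject';
}
```

With an 8-letter key, each edit costs 12.5% of the score, so a candidate that is two edits away falls below an 80% cutoff. The threshold would need tuning against real sequences.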
This User Gave Thanks to DGPickett For This Post:
# 3  
Old 01-03-2013
Extended Regex or Scoring System

I think you are right. The regex I wrote is too loose. I'm going to give the extended regex a try and then decide whether I should use a scoring system. Thanks for responding.