Perl: Regex for Search and Replace with a flexible match
Hi,
I'm trying to match the front and back of a sequence. It works when there is an exact match (obviously), but I need the regex to be more flexible. When we get strings of nucleotides, their prefixes and suffixes sometimes aren't exact matches: there may be an extra letter, a missing letter, or both.
For example, if I were trying to match the string "Imhungry" at the front of a string and replace it with nothing, I would use the following code.
This works great, but I need help writing some flexibility into the regex so that it also captures instances where
[1] a single letter is missing, e.g. "Imungry" or "mungry".
[2] a single letter (any letter) is added, e.g. "Immhungry" or "Imhungryy".
[3] both, e.g. "Imhungyy" or "Immungryy" (notice this last example has two single-letter duplications and one deletion).
Thanks!
If this is too absurd let me know.
With a wildcard character I think I can do this.
Last edited by jdilts; 01-02-2013 at 12:43 PM..
Reason: maybe this is absurd and wildcard char comment
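One way to get the flexibility described above is to generate the alternatives mechanically instead of writing them by hand. The following is only a sketch under stated assumptions, not a definitive answer: the function names `fuzzy_prefix_pattern` and `strip_fuzzy_prefix` are made up for illustration, the word is assumed to contain letters only (so nothing needs escaping), and only ONE missing or ONE extra letter is tolerated. An extra letter after the last character is deliberately left out, because it cannot be told apart from the first letter of the sequence that follows.

```shell
#!/bin/sh
# Build an anchored pattern that accepts the word itself, the word with
# any one letter deleted, or the word with one extra letter inserted
# before any of its positions. Assumes the word contains letters only.
fuzzy_prefix_pattern() {
    awk -v w="$1" 'BEGIN {
        n = length(w)
        pat = w                                    # exact match first
        for (i = 1; i <= n; i++)                   # letter i is missing
            pat = pat "|" substr(w, 1, i - 1) substr(w, i + 1)
        for (i = 0; i < n; i++)                    # extra letter after position i
            pat = pat "|" substr(w, 1, i) "[A-Za-z]" substr(w, i + 1)
        print "^(" pat ")"
    }'
}

# Remove the (possibly inexact) prefix $1 from the sequence $2.
strip_fuzzy_prefix() {
    printf '%s\n' "$2" | sed -E "s/$(fuzzy_prefix_pattern "$1")//"
}

strip_fuzzy_prefix Imhungry ImhungryACGT     # exact prefix
strip_fuzzy_prefix Imhungry ImungryACGT      # "h" missing
strip_fuzzy_prefix Imhungry ImmhungryACGT    # extra "m"
```

The generated pattern uses only `^`, `( )`, `|` and a bracket expression, so it should also be usable inside a Perl `s///`; note that Perl tries alternatives left to right rather than choosing the longest one. Cases with several edits at once, such as "Immungryy", are not covered by this sketch. For a suffix, the same idea applies with `$` at the end of the pattern instead of `^` at the start.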
LEARN ABOUT ULTRIX
egrep
grep(1) General Commands Manual grep(1)
Name
grep, egrep, fgrep - search file for regular expression
Syntax
grep [option...] expression [file...]
egrep [option...] [expression] [file...]
fgrep [option...] [strings] [file]
Description
Commands of the grep family search the input files (standard input default) for lines matching a pattern. Normally, each line found is copied
to the standard output.
The grep command patterns are limited regular expressions in the style of ex(1); grep uses a compact nondeterministic algorithm. The egrep command patterns
are full regular expressions. The egrep command uses a fast deterministic algorithm that sometimes needs exponential space. The fgrep command patterns
are fixed strings. The fgrep command is fast and compact.
In all cases the file name is shown if there is more than one input file. Take care when using the characters $ * [ ^ | ( ) and \ in the
expression because they are also meaningful to the Shell. It is safest to enclose the entire expression argument in single quotes ' '.
The fgrep command searches for lines that contain one of the (new line-separated) strings.
The egrep command accepts extended regular expressions. In the following description `character' excludes new line:
A \ followed by a single character other than new line matches that character.
The character ^ matches the beginning of a line.
The character $ matches the end of a line.
A . (dot) matches any character.
A single character not otherwise endowed with special meaning matches that character.
A string enclosed in brackets [] matches any single character from the string. Ranges of ASCII character codes may be abbreviated
as in `a-z0-9'. A ] may occur only as the first character of the string. A literal - must be placed where it can't be mistaken as
a range indicator.
A regular expression followed by an * (asterisk) matches a sequence of 0 or more matches of the regular expression. A regular
expression followed by a + (plus) matches a sequence of 1 or more matches of the regular expression. A regular expression followed
by a ? (question mark) matches a sequence of 0 or 1 matches of the regular expression.
Two regular expressions concatenated match a match of the first followed by a match of the second.
Two regular expressions separated by | or new line match either a match for the first or a match for the second.
A regular expression enclosed in parentheses matches a match for the regular expression.
The order of precedence of operators at the same parenthesis level is the following: [], then *+?, then concatenation, then | and new
line.
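The rules above can be tried out with a short sketch. It assumes a POSIX grep where `grep -E` provides the extended regular expressions that this page attributes to egrep; the helper names `sample` and `match` and the sample strings are invented for illustration.

```shell
#!/bin/sh
# Candidate strings, one per line.
sample() {
    printf '%s\n' color colour colouur a.c abc cat dog x1 xy
}

# Print the sample lines that match an extended regular expression.
match() {
    sample | grep -E "$1"
}

match '^colou?r$'     # ? : zero or one "u"      -> color, colour
match '^colou+r$'     # + : one or more "u"      -> colour, colouur
match '^colou*r$'     # * : zero or more "u"     -> color, colour, colouur
match '^(cat|dog)$'   # | : either alternative   -> cat, dog
match '^a\.c$'        # \ : literal dot          -> a.c
match '^a.c$'         # . : any single character -> a.c, abc
match '^x[0-9]$'      # [] with a range          -> x1
```

Every expression is enclosed in single quotes, as recommended earlier, so that the Shell does not interpret characters such as `$`, `*` and `|`.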
Options
-b Precedes each output line with its block number. This is sometimes useful in locating disk block numbers by context.
-c Produces count of matching lines only.
-e expression
Uses the next argument as the expression; useful when the expression begins with a minus (-).
-f file Takes regular expression (egrep) or string list (fgrep) from file.
-i Considers upper and lowercase letters identical in making comparisons (grep and fgrep only).
-l Lists files with matching lines only once, separated by a new line.
-n Precedes each matching line with its line number.
-s Silent mode; nothing is printed (except error messages). This is useful for checking the error status (see Diagnostics).
-v Displays all lines that do not match the specified expression.
-w Searches for an expression as for a word (as if surrounded by `\<' and `\>'). For further information, see ex(1) (grep only).
-x Prints only lines matched in their entirety (fgrep only).
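A few of these options can be illustrated with a small sketch. It assumes a POSIX grep, where `grep -F` stands in for fgrep; the helper `log` and its sample lines are invented for the example.

```shell
#!/bin/sh
# Sample log lines used by every call below.
log() {
    printf '%s\n' 'Error: disk full' 'ok' 'error: retry -x' 'ok then'
}

log | grep -c 'ok'         # count matching lines           -> 2
log | grep -n 'retry'      # prefix the line number         -> 3:error: retry -x
log | grep -i -c 'error'   # ignore case while counting     -> 2
log | grep -v 'ok'         # lines that do not match        -> the two error lines
log | grep -e '-x'         # expression starting with minus -> error: retry -x
log | grep -F -x 'ok'      # whole-line fixed-string match  -> ok
```

Without `-e`, the expression `-x` in the fifth call would be parsed as an option rather than as a pattern.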
Restrictions
Lines are limited to 256 characters; longer lines are truncated.
Diagnostics
Exit status is 0 if any matches are found, 1 if none, 2 for syntax errors or inaccessible files.
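The exit status can be inspected from a script, as in the sketch below. The helper name `status_of` and the path `/no/such/file` are hypothetical; output is sent to /dev/null so that only the status is visible.

```shell
#!/bin/sh
# Run a command quietly and print its exit status.
status_of() {
    "$@" >/dev/null 2>&1
    echo $?
}

printf 'hello\n' | status_of grep hello     # match found        -> 0
printf 'hello\n' | status_of grep absent    # no match           -> 1
status_of grep hello /no/such/file          # inaccessible file  -> 2
```

Because a status of 0 means success to the Shell, grep can be used directly as the condition of an `if` statement.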
See Also
ex(1), sed(1), sh(1)