non alpha characters in sed + making it fast?


 
# 1  
Old 10-01-2010

Hello, I'm trying to write the fastest possible sed command (large files will be processed) to replace RICH with NICK in a file that looks like the sample below. If the occurrence of RICH is uppercase, replace it with uppercase NICK; if it's lowercase, replace it with lowercase nick.

SOMTHING_RICH_SOMTHING <- replace here
hellorichwhatareyoudoing <- do not replace here, forms part of a word
SOMTHING.RICH.SOMTHING <- replace here
somthing_rich_somthing <- replace here
HELLO-RICH-HELLO <- replace here

I've put an address (/RICH/) in front of the substitution so that sed only attempts it on lines containing RICH, which speeds up execution. How do I alter the sed command to meet the requirements above?

Code:
sed '/RICH/ s/RICH/NICK/g' filename   # executes more quickly
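
A minimal sketch for checking the speed claim yourself rather than taking it on trust. The function names and the file name bigfile are assumptions made up for this illustration, and whether the address prefix or the C locale actually helps depends on the sed implementation and the data, so measure on your own files.

```shell
# Hypothetical helper names; both read from standard input.
plain()   { sed 's/RICH/NICK/g'; }
guarded() { sed '/RICH/ s/RICH/NICK/g'; }

sample='SOMTHING_RICH_SOMTHING
no match on this line'

# The address prefix should change only the speed, never the result.
[ "$(printf '%s\n' "$sample" | plain)" = "$(printf '%s\n' "$sample" | guarded)" ] \
    && echo "same output"

# To compare timings on real data (bigfile is a placeholder name):
#   time sed 's/RICH/NICK/g' bigfile > /dev/null
#   time sed '/RICH/ s/RICH/NICK/g' bigfile > /dev/null
#   time LC_ALL=C sed 's/RICH/NICK/g' bigfile > /dev/null   # C locale is often faster
```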

# 2  
Old 10-01-2010
hi,
try this:
Code:
s/\(.*[_.-]\)\([Rr][iI][Cc][Hh]\)\([_.-].*\)/\1NICK\3/g

i got the following output for your input:
Code:
SOMTHING_NICK_SOMTHING 
hellorichwhatareyoudoing 
SOMTHING.NICK.SOMTHING 
somthing_NICK_somthing 
HELLO-NICK-HELLO

use this simple one:
Code:
s/\(.*[_.-]\)\(rich\)\([_.-].*\)/\1NICK\3/ig

The i flag makes the match case-insensitive (I forgot this).

With the first approach, mixed-case words such as these will also be matched:
Code:
RIch
RicH etc
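
One thing worth knowing about both patterns in this post: the leading and trailing .* are greedy, so a single match covers the whole line and the g flag has nothing left to work on. The sketch below (function names are hypothetical) compares the posted pattern with a variant that drops the .* parts. Both still need a delimiter on each side, so neither handles RICH at the very start or end of a line.

```shell
# Hypothetical helper names; both read from standard input.
# Pattern as posted: the greedy .* swallows the whole line in one match.
greedy() { sed 's/\(.*[_.-]\)\([Rr][iI][Cc][Hh]\)\([_.-].*\)/\1NICK\3/g'; }
# Same delimiters without the surrounding .*, so g can find every occurrence.
local_match() { sed 's/\([_.-]\)[Rr][iI][Cc][Hh]\([_.-]\)/\1NICK\2/g'; }

echo 'a_RICH_b_RICH_c' | greedy        # a_RICH_b_NICK_c (only the last one)
echo 'a_RICH_b_RICH_c' | local_match   # a_NICK_b_NICK_c
```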


Last edited by dragon.1431; 10-01-2010 at 01:27 PM.. Reason: added more accurate solution
# 3  
Old 10-01-2010
Quote:
Originally Posted by dragon.1431
hi,
try this:
Code:
s/\(.*[_.-]\)\([Rr][iI][Cc][Hh]\)\([_.-].*\)/\1NICK\3/g

i got the following output for your input:
Code:
SOMTHING_NICK_SOMTHING 
hellorichwhatareyoudoing 
SOMTHING.NICK.SOMTHING 
somthing_NICK_somthing 
HELLO-NICK-HELLO

Not quite what I was expecting. My fault, probably, for not being clear enough about the requirement:

Code:
richard@opensolaris:~/share/cleaner$ echo "RICH_SOMTHING" | sed 's/\(.*[_.-]\)\([Rr][iI][Cc][Hh]\)\([_.-].*\)/\1NICK\3/g'
RICH_SOMTHING

richard@opensolaris:~/share/cleaner$ echo "SOMTHING@RICH" | sed 's/\(.*[_.-]\)\([Rr][iI][Cc][Hh]\)\([_.-].*\)/\1NICK\3/g'
SOMTHING@RICH



---------- Post updated at 05:48 PM ---------- Previous update was at 05:29 PM ----------

Basically, it's a case of "if there are any letters (with no spaces) directly to the left or right of the string I'm replacing, don't replace it", so the following must be left alone:

HELLORICHHELLO
HELLORICH
RICHHELLO

Anything other than the above is fair game, with the added condition that if the string being replaced is lowercase it is replaced with lowercase, and if it is uppercase it is replaced with uppercase.

---------- Post updated at 06:11 PM ---------- Previous update was at 05:48 PM ----------

Nearly got it...

Code:
richard@opensolaris:~/share/cleaner$ echo "@rich" | sed 's/\(.*[_.-.@]\)\(rich\)\([_.-.@].*\)/\1NICK\3/ig'
@rich
richard@opensolaris:~/share/cleaner$ echo "@RICH@" | sed 's/\(.*[_.-.@]\)\(rich\)\([_.-.@].*\)/\1NICK\3/ig'
@NICK@
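
Two likely reasons the first attempt above leaves @rich alone, sketched under the assumption of GNU sed (the I flag and \| in basic regular expressions are GNU extensions). First, the pattern demands a delimiter after the word, and @rich has none. Second, inside the brackets .-. is read as a range from "." to ".", so a literal hyphen is no longer part of the set; putting the hyphen last keeps it literal. The helper names are hypothetical.

```shell
# Hypothetical helper names; both read from standard input.
# Bracket expression as typed in the thread: '.-.' is a range, so '-' is not in the set.
with_range()  { sed 's/\([_.-.@]\)rich\([_.-.@]\)/\1NICK\2/Ig'; }
# Hyphen placed last so it is literal; line start and end also count as boundaries.
hyphen_last() { sed 's/\(^\|[_.@-]\)rich\([_.@-]\|$\)/\1NICK\2/Ig'; }

echo '-RICH-' | with_range    # -RICH- (unchanged)
echo '@rich'  | hyphen_last   # @NICK
echo '-RICH-' | hyphen_last   # -NICK-
```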


Last edited by rich@ardz; 10-01-2010 at 01:54 PM..
# 4  
Old 10-01-2010
hi,
Please give your exact input and expected output.
Note that I posted the code according to your first post.
# 5  
Old 10-03-2010
input:
Code:
hellorichhello
HELLORICHHELLO
SOMTHING-RICH-SOMTHING
SOMTHING@RICH
@RICH
RICH@
-RICH
RICH-
SOMTHING_RICH_SOMTHING
_RICH
RICH_
SOMTHING RICH SOMTHING
somthing rich somthing

output:
Code:
hellorichhello
HELLORICHHELLO
SOMTHING-NICK-SOMTHING
SOMTHING@NICK
@NICK
NICK@
-NICK
NICK-
SOMTHING_NICK_SOMTHING
_NICK
NICK_
SOMTHING NICK SOMTHING
somthing NICK somthing

Basically, if the string to be replaced has an alphabetic character to its left or right (or both), don't replace it. If neither neighbour is alphabetic (spaces, punctuation, and the start or end of the line all count), replace it.

Last edited by Scott; 10-03-2010 at 07:23 AM.. Reason: Code tags
# 6  
Old 10-03-2010
Code:
sed 's/\(^\|[^[:alnum:]]\)\(rich\|RICH\|Rich\)\([^[:alnum:]]\|$\)/\1NICK\3/g' infile
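
For readers who want to see how the one-liner above is put together, here is the same expression split into three named pieces. The variable and function names are made up for this sketch, and \| is a GNU sed extension in basic regular expressions, so this assumes GNU sed. Note that [:alnum:] treats digits as part of a word too, which is slightly stricter than the "alphabetic" wording in post #5.

```shell
left='\(^\|[^[:alnum:]]\)'     # group 1: start of line, or one non-alphanumeric character
word='\(rich\|RICH\|Rich\)'    # group 2: the three spellings that get replaced
right='\([^[:alnum:]]\|$\)'    # group 3: one non-alphanumeric character, or end of line

# Hypothetical wrapper; reads standard input.
# \1 and \3 put the surrounding characters back; only group 2 becomes NICK.
replace_word() { sed "s/${left}${word}${right}/\1NICK\3/g"; }

printf '%s\n' 'hellorichhello' 'SOMTHING@RICH' '_RICH' | replace_word
# hellorichhello
# SOMTHING@NICK
# _NICK
```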

# 7  
Old 10-05-2010
Quote:
Originally Posted by Scrutinizer
Code:
sed 's/\(^\|[^[:alnum:]]\)\(rich\|RICH\|Rich\)\([^[:alnum:]]\|$\)/\1NICK\3/g' infile

I won't even pretend I understand how this sed works, but it does! Is it possible to replace uppercase with uppercase and lowercase with lowercase, or would I have to run two different seds? Cheers.
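
One possible answer to the question above, as a sketch rather than a reply from the thread: a single sed invocation can carry two substitutions, one per case, so the file is still read only once. This assumes GNU sed (for \|), uses [:alpha:] to follow the "alphabetic" wording of post #5 instead of [:alnum:], and leaves mixed-case spellings such as Rich untouched. The function name is hypothetical.

```shell
# Hypothetical helper; reads standard input.
replace_keep_case() {
    sed -e 's/\(^\|[^[:alpha:]]\)RICH\([^[:alpha:]]\|$\)/\1NICK\2/g' \
        -e 's/\(^\|[^[:alpha:]]\)rich\([^[:alpha:]]\|$\)/\1nick\2/g'
}

printf '%s\n' 'SOMTHING-RICH-SOMTHING' 'somthing rich somthing' 'HELLORICHHELLO' \
    | replace_keep_case
# SOMTHING-NICK-SOMTHING
# somthing nick somthing
# HELLORICHHELLO
```

Because each match also consumes the delimiter on either side, two occurrences separated by a single character (for example RICH RICH) only get the first one replaced in one pass. Repeating each -e expression a second time is a simple way around that.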