How does this sed expression to remove non-alpha characters work?
Post 302836649 by Don Cragun on Wednesday 24th of July 2013 12:48:03 PM
Old 07-24-2013
Quote:
Originally Posted by bgnersoon2be#1
Hello!

I know that this expression gets rid of non-alphanumeric characters:
Code:
sed 's/[^a-zA-Z0-9]//g'

and I understand that it is replacing them with nothing (hence the '//'), but I don't understand how it's doing it.
It seems it's finding strings that begin with an alphanumeric character and replacing them with nothing! Obviously that would not give the required output, so I'm a little confused and clearly missing something...

Thanks for your help!

The circumflex I marked in red in your code (and please use CODE tags) is not an anchor. When a circumflex is the first character inside a bracket expression (i.e., [expression specifying a set of characters to match]), it turns the bracket expression into a non-matching list: it matches any single character except those specified after the circumflex. So, in this case, every character that is not a lowercase letter, not an uppercase letter, and not a digit is removed (or, as you said, replaced by nothing).
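
To see the two meanings of the circumflex side by side, here is a minimal sketch, assuming a POSIX sed and a made-up sample line (the variable names input, stripped, and anchored are only for illustration):

```shell
# Hypothetical sample line containing letters, digits, spaces, and punctuation.
input='Hello, World! 123_abc'

# Inside brackets, a leading ^ negates the list:
# every character that is NOT a letter or a digit is deleted.
stripped=$(printf '%s\n' "$input" | sed 's/[^a-zA-Z0-9]//g')
printf '%s\n' "$stripped"

# Outside brackets, ^ is an anchor:
# only a run of letters/digits at the very start of the line is deleted.
anchored=$(printf '%s\n' "$input" | sed 's/^[a-zA-Z0-9]*//')
printf '%s\n' "$anchored"
```

The first command prints HelloWorld123abc and the second prints ", World! 123_abc" (without the quotes). Be careful with the range endpoints: in an ASCII-ordered locale such as C, a range written as A-z also covers the characters between Z and a ([, \, ], ^, _ and the backquote), so those would not be removed.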
 

regexpr(3GEN)					     String Pattern-Matching Library Functions					     regexpr(3GEN)

NAME
regexpr, compile, step, advance - regular expression compile and match routines

SYNOPSIS
    cc [flag...] [file...] -lgen [library...]

    #include <regexpr.h>

    char *compile(char *instring, char *expbuf, const char *endbuf);
    int step(const char *string, const char *expbuf);
    int advance(const char *string, const char *expbuf);

    extern char *loc1, *loc2, *locs;
    extern int nbra, regerrno, reglength;
    extern char *braslist[], *braelist[];

DESCRIPTION
These routines are used to compile regular expressions and match the compiled expressions against lines. The regular expressions compiled are in the form used by ed(1).

The parameter instring is a null-terminated string representing the regular expression.

The parameter expbuf points to the place where the compiled regular expression is to be placed. If expbuf is NULL, compile() uses malloc(3C) to allocate the space for the compiled regular expression. If an error occurs, this space is freed. It is the user's responsibility to free unneeded space after the compiled regular expression is no longer needed.

The parameter endbuf is one more than the highest address where the compiled regular expression may be placed. This argument is ignored if expbuf is NULL. If the compiled expression cannot fit in (endbuf-expbuf) bytes, compile() returns NULL and regerrno (see below) is set to 50.

The parameter string is a pointer to a string of characters to be checked for a match. This string should be null-terminated.

The parameter expbuf is the compiled regular expression obtained by a call of the function compile().

The function step() returns non-zero if the given string matches the regular expression, and zero if the expressions do not match. If there is a match, two external character pointers are set as a side effect of the call to step(). The variables set in step() are loc1 and loc2. loc1 is a pointer to the first character that matched the regular expression. The variable loc2 points to the character after the last character that matches the regular expression. Thus if the regular expression matches the entire line, loc1 points to the first character of string and loc2 points to the null at the end of string.

The purpose of step() is to step through the string argument until a match is found or until the end of string is reached. If the regular expression begins with ^, step() tries to match the regular expression at the beginning of the string only.

The advance() function is similar to step(), but it only sets the variable loc2 and always restricts matches to the beginning of the string.

If one is looking for successive matches in the same string of characters, locs should be set equal to loc2, and step() should be called with string equal to loc2. locs is used by commands like ed and sed so that global substitutions like s/y*//g do not loop forever, and is NULL by default.

The external variable nbra is used to determine the number of subexpressions in the compiled regular expression. braslist and braelist are arrays of character pointers that point to the start and end of the nbra subexpressions in the matched string. For example, after calling step() or advance() with string sabcdefg and regular expression \(abcdef\), braslist[0] will point at a and braelist[0] will point at g. These arrays are used by commands like ed and sed for substitute replacement patterns that contain the \n notation for subexpressions.

Note that it is not necessary to use the external variables regerrno, nbra, loc1, loc2, locs, braelist, and braslist if one is only checking whether or not a string matches a regular expression.

EXAMPLES
Example 1: The following is similar to the regular expression code from grep:

    #include <regexpr.h>
    . . .
    if (compile(*argv, (char *)0, (char *)0) == (char *)0)
        regerr(regerrno);
    . . .
    if (step(linebuf, expbuf))
        succeed();

RETURN VALUES
If compile() succeeds, it returns a non-NULL pointer whose value depends on expbuf. If expbuf is non-NULL, compile() returns a pointer to the byte after the last byte in the compiled regular expression. The length of the compiled regular expression is stored in reglength. Otherwise, compile() returns a pointer to the space allocated by malloc(3C).

The functions step() and advance() return non-zero if the given string matches the regular expression, and zero if the expressions do not match.

ERRORS
If an error is detected when compiling the regular expression, a NULL pointer is returned from compile() and regerrno is set to one of the non-zero error numbers indicated below:

    ERROR   MEANING
    11      Range endpoint too large.
    16      Bad Number.
    25      "\digit" out of range.
    36      Illegal or missing delimiter.
    41      No remembered string search.
    42      \( \) imbalance.
    43      Too many \(.
    44      More than 2 numbers given in \{ \}.
    45      } expected after \.
    46      First number exceeds second in \{ \}.
    49      [] imbalance.
    50      Regular expression overflow.

ATTRIBUTES
See attributes(5) for descriptions of the following attributes:

    +-----------------------------+-----------------------------+
    |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
    +-----------------------------+-----------------------------+
    |MT-Level                     |MT-Safe                      |
    +-----------------------------+-----------------------------+

SEE ALSO
ed(1), grep(1), sed(1), malloc(3C), attributes(5), regexp(5)

NOTES
When compiling multi-threaded applications, the _REENTRANT flag must be defined on the compile line. This flag should only be used in multi-threaded applications.

SunOS 5.10					     29 Dec 1996					     regexpr(3GEN)
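
Two behaviors described in the regexpr(3GEN) text above can be observed from the shell with sed itself: a global substitution whose pattern can match the empty string still terminates, and a subexpression can be referred to in the replacement. This is only a sketch, assuming a POSIX sed; it does not call the library routines, and the variable names no_y and marked are made up for illustration.

```shell
# y* can match the empty string at every position, yet the global
# substitution still terminates; only the run of y's is removed.
no_y=$(printf '%s\n' 'xyyz' | sed 's/y*//g')
printf '%s\n' "$no_y"

# \(abcdef\) marks a subexpression; \1 in the replacement refers back to it.
# The string sabcdefg is the one used in the man page text.
marked=$(printf '%s\n' 'sabcdefg' | sed 's/\(abcdef\)/[\1]/')
printf '%s\n' "$marked"
```

The first command prints xz and the second prints s[abcdef]g, with the brackets marking the span between the start and end of the first subexpression.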