SED With Regex to extract Email Address


 
# 1  
Old 04-04-2012

Hi Folks,
In my program, I have a variable that holds multiple lines, and I need to use each line as input. My intention is to extract the user's email address from each line and use it for further processing.

The email address could be anywhere in the line, but there will be only one, and I need to extract it. The most complex possible format of the email ID is:
first-name.last-name@xyz.com

In other words, first and last names are separated by a period (.) and the first and last names may have a hyphen (-). The domain name (@xyz.com) is fixed and only has letters, no numbers.

We have a Solaris OS and I am using the Korn shell. I read some examples of the sed command and have made a few attempts to use it. But every time, no matter what regex I use, I get the entire input as the output. I must be doing something very basic wrong. Could you please suggest a fix?

Code:
$ echo '92' | sed '/[0-9]+/p'
92

$ echo 'email92' | sed '/[0-9]+/p'
email92

$ echo "abc.xyz@comp.com" | sed '/\([A-Za-z0-9]+\)\(-*\)\([A-Za-z0-9]*\)\(\.\)\([A-Za-z0-9]+\)\(-*\)\([A-Za-z0-9]*\)@comp.com/p'
abc.xyz@comp.com

$ echo "Email address is abc.xyz@comp.com" | sed '/\([A-Za-z0-9]+\)\(-*\)\([A-Za-z0-9]*\)\(\.\)\([A-Za-z0-9]+\)\(-*\)\([A-Za-z0-9]*\)@comp.com/p'
Email address is abc.xyz@comp.com

Note that below is the regex that I arrived at to extract the email address.
Code:
/\([A-Za-z0-9]+\)\(-*\)\([A-Za-z0-9]*\)\(\.\)\([A-Za-z0-9]+\)\(-*\)\([A-Za-z0-9]*\)@comp.com

Any help is greatly appreciated.
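
A short sketch of why the attempts above echo the whole line, going only on standard sed behaviour: sed prints every input line by default, the p command adds a second copy for lines its address matches, and + is not a repetition operator in a basic regular expression. The variable names are illustrative and not part of the original attempts.

```shell
# 1. In a basic regular expression '+' is an ordinary character, so the
#    address /[0-9]+/ never matches here; sed just auto-prints the line.
plus_literal=$(printf 'email92\n' | sed '/[0-9]+/p')

# 2. With a repetition that a BRE understands, the address matches and p
#    prints the line a second time on top of the automatic print.
printed_twice=$(printf 'email92\n' | sed '/[0-9][0-9]*/p')

# 3. -n turns off automatic printing; s///p then prints only lines where
#    the substitution happened, reduced to the captured group.
digits_only=$(printf 'email92\n' | sed -n 's/[^0-9]*\([0-9][0-9]*\).*/\1/p')

printf '%s\n' "$plus_literal" "$printed_twice" "$digits_only"
```

The first command prints email92 once, the second prints it twice, and the third prints only 92. Extracting part of a line therefore needs a substitution with a capture group rather than a bare /regex/p.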

Moderator's Comments:
Mod Comment Please use code tags. Video tutorial on how to use them

Last edited by Scrutinizer; 04-04-2012 at 05:05 PM..
# 2  
Old 04-04-2012
Feel free to adjust this with an additional [A-Z] range or whatever you need, but here is an idea:

(The trick is to add a space at the beginning of each line, so that if the line starts with the mail address, the next regular expression, which looks for a mail address preceded by a space, still matches.)

Code:
echo "Email address is abc.xyz@comp.com" | sed 's/.*/ &/;s/.* \([^ @]*@[^ @]*\.com\).*/\1/'
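
To see what the leading-space step buys, here is a small comparison on a line that starts with the address. The sample line and variable names are made up for illustration.

```shell
line='abc.xyz@comp.com is my address'

# Without the added space nothing precedes the address, so the pattern
# (which requires a space before it) fails and the line passes through unchanged.
without_space=$(printf '%s\n' "$line" |
    sed 's/.* \([^ @]*@[^ @]*\.com\).*/\1/')

# With the added space the same pattern matches; \. matches a literal dot.
with_space=$(printf '%s\n' "$line" |
    sed 's/.*/ &/;s/.* \([^ @]*@[^ @]*\.com\).*/\1/')

printf '%s\n' "$without_space" "$with_space"
```

The first result is the untouched input line; the second is just abc.xyz@comp.com.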

(Just remember that if you want the hyphen to be taken as a literal in a bracket list, it should be placed at the end of the list. Quick example: [:#@-])
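
A quick check of the hyphen rule, as a sketch with made-up input strings and variable names:

```shell
# Hyphen placed last in the list: it is a literal, so '-' is replaced
# along with ':', '#' and '@'.
hyphen_last=$(printf 'a-b:c#d@e\n' | sed 's/[:#@-]/_/g')

# Hyphen between two characters: it denotes a range (here the digits 0..2),
# so the '-' in the input is left alone.
digit_range=$(printf '1-2\n' | sed 's/[0-2]/_/g')

printf '%s\n' "$hyphen_last" "$digit_range"
```

This prints a_b_c_d_e and then _-_.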


also consider :
Code:
... | sed 's/.*/ &/;s/.* \([A-Za-z0-9.-]*@[A-Za-z0-9.-]*\.com\).*/\1/'

but this would also match an address with an empty user and/or domain, like: @.com

So if you want each part to be at least one character (and you don't want that first character to be a hyphen or a dot), you can for example use:
Code:
... | sed 's/.*/ &/;s/.* \([A-Za-z0-9][A-Za-z0-9.-]*@[A-Za-z0-9][A-Za-z0-9.-]*\.com\).*/\1/'
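
A sketch comparing the two variants on a line whose only candidate is @.com. Here sed -n with the p flag is used so that lines without a match produce no output at all; that combination, the sample lines, and the names loose, strict and sample are additions for illustration.

```shell
# Loose: user and domain may both be empty.
loose='s/.*/ &/;s/.* \([A-Za-z0-9.-]*@[A-Za-z0-9.-]*\.com\).*/\1/p'
# Strict: user and domain must each start with a letter or digit.
strict='s/.*/ &/;s/.* \([A-Za-z0-9][A-Za-z0-9.-]*@[A-Za-z0-9][A-Za-z0-9.-]*\.com\).*/\1/p'

# Two sample lines: one with a degenerate address, one with a real one.
sample() {
    printf '%s\n' 'mail to @.com' 'mail to abc.xyz@comp.com now'
}

loose_out=$(sample | sed -n "$loose")
strict_out=$(sample | sed -n "$strict")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose_out" "$strict_out"
```

The loose pattern reports both @.com and abc.xyz@comp.com, while the strict one reports only abc.xyz@comp.com.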

Just adjust it to your requirements

If you know that the mail address always appears at the end of the line, after a space, you can simply use:

Code:
echo "Email address is abc.xyz@comp.com" | sed 's/.* //g'

---------- Post updated at 08:25 PM ---------- Previous update was at 07:55 PM ----------

Code:
# cat tst
first-name1.last-name@xyz.com
bla bla first-name2.last-name@xyz.com blebla blealdsfl
first-name3.last-name@xyz.com is a valid mail address.com
This line with valid adress.com: first-name4.last-name@xyz.com next nested address.com
last valid mail address.com first-name5.last-name@xyz.com
# sed 's/.*/ &/;s/.* \([^ @]*@[^ @]*\.com\).*/\1/' tst
first-name1.last-name@xyz.com
first-name2.last-name@xyz.com
first-name3.last-name@xyz.com
first-name4.last-name@xyz.com
first-name5.last-name@xyz.com

(OK, I didn't handle the case where the separator is a tab instead of a space, but it's easy to tweak the code, or to run tr -s '[:blank:]' ' ' on the input before applying the sed statement.)
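
A minimal sketch of that tr step, assuming a tab-separated sample line invented for the purpose (the variable names are illustrative too):

```shell
# Sample line with tabs, not spaces, around the address.
line=$(printf 'user:\tfirst-name.last-name@xyz.com\tactive')

# As is: [^ @]* happily crosses the tab, so the prefix is kept.
raw=$(printf '%s\n' "$line" |
    sed 's/.*/ &/;s/.* \([^ @]*@[^ @]*\.com\).*/\1/')

# Tabs turned into single spaces first: only the address remains.
squeezed=$(printf '%s\n' "$line" |
    tr -s '[:blank:]' ' ' |
    sed 's/.*/ &/;s/.* \([^ @]*@[^ @]*\.com\).*/\1/')

printf '%s\n' "$raw" "$squeezed"
```

Without tr the result still carries the user: prefix and its tab; with tr it is first-name.last-name@xyz.com.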

Last edited by ctsgnb; 04-04-2012 at 03:32 PM..
# 3  
Old 04-04-2012
For the sake of showing another way to do it: I was interested in this question and did a little searching. I found a perl script at this site: Extract email addresses from big file. - Unix / Linux / BSD

Now, I do not know perl, but it seems to work.
Code:
$ cat x
#!/bin/ksh
echo "Email address is ab-c.x-yz@comp.com" |perl -wne'while(/[\w\.\-]+@[\w\.\-]+\w+/g){print "$&\n"}'

$ ./x
ab-c.x-yz@comp.com
$

# 4  
Old 04-04-2012
I would be inclined to prefer [^ \t@] to \w for the bit before the @, since a surprising range of characters is allowed in the local part of email addresses:
Email address - Wikipedia, the free encyclopedia
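
The same idea can be sketched in sed. One assumption to flag: inside a sed bracket expression \t is not portably a tab, so a literal tab is passed in through a shell variable. The sample address with a + in the local part and the variable names are invented for the example.

```shell
# A literal tab character for use inside the bracket expressions.
tab=$(printf '\t')

# Anything that is not a space, tab or '@' counts as part of the address,
# so the '+' in the local part is kept.
plus_addr=$(printf 'contact\tjo+lists@xyz.com\tnow\n' |
    sed "s/.*/ &/;s/.*[ ${tab}]\([^ ${tab}@]*@[^ ${tab}@]*\.com\).*/\1/")

printf '%s\n' "$plus_addr"
```

This prints jo+lists@xyz.com; a class limited to letters, digits, dots and hyphens would not accept the + in that local part.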
# 5  
Old 04-04-2012
Search for 'RFC2822 regex': the regular expression for the official standard for addresses is, um, long.
Code:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

# 6  
Old 04-05-2012
Jeepers! Looks like a cat ran across the keyboard a few times!