Grep regex to ignore sequence only if surrounded by fwd-slashes


 
# 1  
Old 12-01-2013

Hi,

I've got a regex match to perform in a Bash script and can't quite get it right.

Basically, I want to match every IP-address-like sequence in a file (which may or may not contain an IP address), with one extra qualification: ignore any IP-like sequence which begins and ends with a forward slash. This is to weed out software version numbers when a line like the following is in the HTML.
Code:
<script... src="http://web.com/libs/1.6.1.0/file.js"></script>

Here are the matching requirements:
Code:
a) "11.11.11.11"               --> This should match "11.11.11.11"
b) "lots11.11.11.11lots"       --> This should match "11.11.11.11"
c) "lots1111.11.11.1111lots"   --> This should match "1111.11.11.1111" (see P.S. for why)
d) "lots/11.11.11.11/lots"     --> This should NOT match anything

Also:

e) "lots11.11.11.11lots then lots http://url/1.6.1.0/file.js"
which needs to match just the "11.11.11.11" part.

I have b), c), d), and e) working fine by using 2 piped grep expressions:
Code:
extractIpExpr1="[^/][0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+[^/]"
extractIpExpr2="[0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+"

# Pipe to sort and then to uniq so that all duplicates are removed.
# Note: -E is to use extended regex and -o is to print only the matching parts.

ipLikeAddressMatches=$(grep -Eo "$extractIpExpr1" < "$tempFileName" | \
                       grep -Eo "$extractIpExpr2" | sort -n | uniq)

The 1st call to grep matches an IP-like sequence plus one extra character at the start and one at the end, as long as neither is '/'. The 2nd call to grep then gets rid of the extra character at the start and end.

The problem is with a) "11.11.11.11": occasionally all that's there is an IP address with no character at all before or after it. The not-a-slash "[^/]" parts then fail, because there are no not-a-slash characters for them to match.

I've just spent ages trying to figure this out but can't. If it's better to use sed or awk instead of grep then that's not a problem.

Thanks all.

P.S. This is just to match IP-like number sequences. I'm deliberately not using bounded quantifiers, i.e. {1,3}, on the first and last number sequences of the expressions, so that grep does not examine something like '1111.222.222.1111' and match just the '111.222.222.111' section of it, causing something that is definitely not an IP address to be mistaken for one. I have a function which later examines the IP-like number sequences and makes sure the numbers are all in the range 0..255.
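
The range-checking function mentioned in the P.S. is not shown in the thread. A minimal sketch of what such a check could look like follows; the name is_valid_ipv4 and the way it splits on dots are assumptions made for illustration, not the poster's actual code.

```shell
# Hypothetical helper (not from the thread): succeeds only when its argument
# is four dot-separated decimal numbers, each in the range 0..255.
is_valid_ipv4() {
    local candidate="$1" old_ifs="$IFS" octet

    # Reject empty input, anything other than digits and dots, and empty fields.
    case $candidate in
        ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;
    esac

    # Split on dots into the positional parameters.
    IFS=.
    set -- $candidate
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1

    for octet in "$@"; do
        # At most three digits, and numerically no greater than 255.
        [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
    done
    return 0
}
```

With a helper like this, each line of ipLikeAddressMatches could be passed through is_valid_ipv4 and kept only when the function succeeds, so a candidate such as 1111.11.11.1111 would be dropped at that stage.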

Last edited by gencon; 12-01-2013 at 02:44 PM..
# 2  
Old 12-01-2013
I can't see what's going wrong. Did you try alternation of [^/] with the begin-of-line (^)?
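
One way the alternation suggested here could be written is sketched below. The variable and function names are made up for this example. The guard class is assumed to need digits excluded as well as the slash: a plain "[^/]" can itself match a digit, so in "/11.11.11.11/" it would consume the first and last "1" and the sequence would still be reported.

```shell
# Stage 1 keeps one guard character (or a line boundary) on each side;
# stage 2 strips the guards again. Digits are excluded from the guard class
# so a match cannot start or stop part-way through a number.
guardedIpExpr='(^|[^/0-9])[0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+([^/0-9]|$)'
bareIpExpr='[0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+'

# Reads text on stdin and prints the unique IP-like sequences found.
extract_ip_like() {
    grep -Eo "$guardedIpExpr" | grep -Eo "$bareIpExpr" | sort -u
}
```

A known limitation of this sketch: the guard characters are consumed by the match, so two sequences separated by a single character (for example "1.2.3.4 5.6.7.8") would only yield the first one.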
# 3  
Old 12-01-2013
As long as you don't have input lines that contain both an ip address and an ip-like address between slash characters, the following seems to do what you want:
Code:
skipIpExpr1="[/][0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+[/]"
extractIpExpr2="[0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+"

# Pipe to sort -u so duplicates are removed.
# Note: -E is to use EREs and -o is to print only the matching parts, and -v
# discards matching lines.

ipLikeAddressMatches=$(grep -Ev "$skipIpExpr1" "$tempFileName" | \
                       grep -Eo "$extractIpExpr2" | sort -u)

When given the 1st message in this thread as input, ipLikeAddressMatches is set to:
Code:
11.11.11.11
111.222.222.111
1111.11.11.1111
1111.222.222.1111
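
The caveat stated at the top of this post can be seen with a small experiment. The sample input below is invented for illustration; the two expressions are the ones from the post, wrapped in a function that reads stdin.

```shell
skipIpExpr1='[/][0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+[/]'
extractIpExpr2='[0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+'

# Drop every line containing a slash-delimited sequence, then extract.
extract_by_line_filter() {
    grep -Ev "$skipIpExpr1" | grep -Eo "$extractIpExpr2" | sort -u
}

# Made-up sample: the second line carries a real address and a
# slash-delimited version number together.
sample='host 10.0.0.1 is up
host 10.0.0.2 loads http://web.com/libs/1.6.1.0/file.js'

# Prints only 10.0.0.1, because the whole second line is discarded by -v.
printf '%s\n' "$sample" | extract_by_line_filter
```

This is the trade-off of filtering whole lines: it is simple and needs no lookaround support, but any address sharing a line with a version-like path is lost.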

# 4  
Old 12-01-2013
Hi,
Or, if your grep has the -P option, you can reduce it to one regex:
Code:
$ cat file
a) "11.11.11.11"               --> This should match "11.11.11.11"
b) "lots12.11.11.11lots"       --> This should match "12.11.11.11"
c) "lots1311.11.11.1111lots"   --> This should match "1311.11.11.1111" (see P.S. for why)
d) "lots/14.11.11.11/lots"     --> This should NOT match anything

e) "lots15.11.11.11lots then lots http://url/1.6.1.0/file.js"
16.11.11.11a
a17.11.11.11
18.11.11.11

Code:
$ grep -Po "(^|[^/0-9])\K[0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+(?=([^/0-9]|$))" file
11.11.11.11
11.11.11.11
12.11.11.11
12.11.11.11
1311.11.11.1111
1311.11.11.1111
15.11.11.11
16.11.11.11
17.11.11.11
18.11.11.11

Regards.
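
In the expression above, \K discards the guard character from the reported match, and (?=...) checks the following character without consuming it. The same constraints can probably also be written with negative lookarounds, as sketched below. This assumes a grep built with PCRE support; the variable and function names are invented for the example.

```shell
# Not preceded by a slash or digit, and not followed by a slash or digit.
# Start and end of line need no special case: the negative lookarounds
# succeed there because there is no character to object to.
lookaroundIpExpr='(?<![/0-9])[0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+(?![/0-9])'

# Reads text on stdin and prints the unique IP-like sequences found.
extract_with_lookaround() {
    grep -Po "$lookaroundIpExpr" | sort -u
}
```

Because nothing outside the sequence is consumed, two sequences separated by a single space are both reported.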
# 5  
Old 12-01-2013
Quote:
Originally Posted by Don Cragun
As long as you don't have input lines that contain both an ip address and an ip-like address between slash characters, the following seems to do what you want:
Thanks Don, that is a perfect solution. [No, they will never be on the same line.]

I should have been looking at the grep man page instead of scratching my head trying to work out a regex solution.

Thanks also for including "sort -u"; I didn't know about that, and it saves a process. ["He who saves a process, saves the World", ummm, as a proverb it needs some work.]

Cheers.

---------- Post updated at 08:59 PM ---------- Previous update was at 08:56 PM ----------

Quote:
Originally Posted by disedorgue
Hi,
Or, if your grep has the -P option, you can reduce it to one regex:
In theory I have the -P Perl regex available, but the "unimplemented features" factor has bitten me more than once so I avoid using that.

Thanks anyway.
# 6  
Old 12-01-2013
If you ever did have both IP and IP-like addresses, sed could just blank out the IP-like ones in this manner:

Code:
skipIpExpr1="[/][0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+[/]"
extractIpExpr2="[0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+"

# Pipe to sort -u so duplicates are removed.
# Note: -E is to use EREs and -o is to print only the matching parts.

ipLikeAddressMatches=$(sed -E "s:$skipIpExpr1::g" | \
                       grep -Eo "$extractIpExpr2" | sort -u)

# 7  
Old 12-02-2013
Quote:
Originally Posted by Chubler_XL
If you ever did have both IP and IP-like addresses [gencon insert: On One Line] sed could just blank out the IP-like ones in this manner:
Thanks. That is an ever so slightly more elegant solution and the one which I shall in fact use. Don will get over it eventually; I'm sure it's nothing that some intensive therapy won't cure.

Of course Don might point out that yours will not actually work at the moment due to the mysterious disappearance of any actual input.

Thanks all.
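
As noted above, the sed command as posted is given no input. A sketch with the input file supplied could look like this; wrapping it in a function that takes the file name as its argument is a choice made for this example, and in the original script the argument would be "$tempFileName".

```shell
skipIpExpr1='[/][0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+[/]'
extractIpExpr2='[0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+'

# $1 is the file to scan. sed deletes every slash-delimited IP-like
# sequence, then grep extracts whatever IP-like sequences remain.
extract_after_blanking() {
    sed -E "s:$skipIpExpr1::g" "$1" | grep -Eo "$extractIpExpr2" | sort -u
}

# Example call, assuming tempFileName is set as in the first post:
# ipLikeAddressMatches=$(extract_after_blanking "$tempFileName")
```

Unlike the line-filtering approach, this keeps an address that shares a line with a version-like path.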