Grep regex to ignore sequence only if surrounded by fwd-slashes

12-01-2013

Hi,

I've got a regex match to perform in a Bash script and can't quite get it right.

Basically, I want to match all IP-address-like sequences in a file (which may or may not contain an IP address), with the extra qualification that any IP-like sequence which begins and ends with a forward slash is ignored. This is to weed out software version numbers when a line like the following is in the HTML:

Code:

<script... src="http://web.com/libs/1.6.1.0/file.js"></script>

Here are the matching requirements:

Code:

a) "11.11.11.11"               --> This should match "11.11.11.11"
b) "lots11.11.11.11lots"       --> This should match "11.11.11.11"
c) "lots1111.11.11.1111lots"   --> This should match "1111.11.11.1111" (see P.S. for why)
d) "lots/11.11.11.11/lots"     --> This should NOT match anything

Also:

e) "lots11.11.11.11lots then lots http://url/1.6.1.0/file.js"
which needs to match just the "11.11.11.11" part.

I have b), c), d) and e) working fine by piping two grep commands together:

Code:

extractIpExpr1="[^/][0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+[^/]"
extractIpExpr2="[0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+"

# Pipe to sort and then to uniq so that all duplicates are removed.
# Note: -E is to use extended regex and -o is to print only the matching parts.

ipLikeAddressMatches=$(grep -Eo "$extractIpExpr1" < "$tempFileName" | \
                       grep -Eo "$extractIpExpr2" | sort -n | uniq)

The first grep matches an IP-like sequence plus one extra character at the start and one at the end, as long as that character is not '/'. The second grep then strips those extra characters off again.

The problem is with a) "11.11.11.11": occasionally all that's there is an IP address with no character at all before or after it. The "[^/]" (not-a-slash) parts then fail, because there is no character there for them to match.
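To see the failure concretely, the first grep of the pipeline can be run on its own against single lines. This is a minimal sketch that assumes a POSIX-style grep -E such as GNU grep; the helper name first_pass is hypothetical.

Code:

```shell
# The first expression from the pipeline above: one non-slash character,
# an IP-like sequence, then one more non-slash character.
extractIpExpr1="[^/][0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+[^/]"

# Hypothetical helper: run only the first grep on one line of text.
# -o prints the matched text with its two context characters included;
# empty output means the line produced no match at all.
first_pass() {
  printf '%s\n' "$1" | grep -Eo "$extractIpExpr1"
}
```

With this definition, first_pass 'lots192.168.1.1lots' prints s192.168.1.1l, while first_pass '192.168.1.1' prints nothing, because there is no character after the final 1 for the trailing [^/] to consume. Note that [^/] also matches digits, so first_pass '11.11.11.11' still prints 11.11.11.11 by using the outer 1s as context. A bare address is only missed when its first or last number is a single digit.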

I've just spent ages trying to figure this out but can't. If it's better to use sed or awk instead of grep then that's not a problem.

Thanks all.

P.S. This is just to match IP-like number sequences. I'm deliberately not using bounded quantifiers (i.e. {1,3}) on the first and last numbers of the expressions, so that grep does not look at something like '1111.222.222.1111' and match just the '111.222.222.111' part of it, which would cause something that is definitely not an IP address to be mistaken for one. I have a function which later examines the IP-like number sequences and makes sure the numbers are all in the range 0..255.

Last edited by gencon; 12-01-2013 at 02:44 PM.

gencon

12-01-2013

I can't see what's going wrong. Did you try alternating [^/] with the beginning-of-line anchor (^)?
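A minimal sketch of that alternation, dropped into the two-grep pipeline from the first post. It assumes a grep -E that accepts ^ and $ inside alternation groups (GNU grep does). The function name extract_ip_like is hypothetical, and excluding digits from the context characters is an extra assumption, not part of the suggestion itself.

Code:

```shell
# Context before the number: start of line (^) or one character that is
# neither '/' nor a digit. Context after it: one such character or end of
# line ($). Excluding digits stops the context class from "borrowing" a
# digit of the number itself, which would otherwise let a slash-wrapped
# sequence such as /11.11.11.11/ through as 11.11.11.11.
extractIpExpr1='(^|[^/0-9])[0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+([^/0-9]|$)'
extractIpExpr2='[0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+'

# Hypothetical helper: reads text on stdin and prints each IP-like sequence
# that is not wrapped in slashes, with duplicates removed.
# Caveat: the context characters are consumed, so two addresses separated
# by a single character (e.g. "1.2.3.4 5.6.7.8") may yield only the first.
extract_ip_like() {
  grep -Eo "$extractIpExpr1" | grep -Eo "$extractIpExpr2" | sort -u
}
```

For example, printf '%s\n' '1.2.3.4' | extract_ip_like prints 1.2.3.4, and the line lots/11.11.11.11/lots produces no output.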

RudiC

12-01-2013

As long as you don't have input lines that contain both an IP address and an IP-like address between slash characters, the following seems to do what you want:

Code:

skipIpExpr1="[/][0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+[/]"
extractIpExpr2="[0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+"

# Pipe to sort -u so duplicates are removed.
# Note: -E is to use EREs and -o is to print only the matching parts, and -v
# discards matching lines.

ipLikeAddressMatches=$(grep -Ev "$skipIpExpr1" "$tempFileName" | \
                       grep -Eo "$extractIpExpr2" | sort -u)

When given the 1st message in this thread as input, ipLikeAddressMatches is set to:

Code:

11.11.11.11
111.222.222.111
1111.11.11.1111
1111.222.222.1111
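The caveat matters for requirement e) in the first post: that line has both a real address and a slash-wrapped version number, so grep -v discards the whole line. A sketch of the same pipeline reading stdin instead of "$tempFileName" (the function name is hypothetical) makes this easy to check:

Code:

```shell
# The expressions from the post above.
skipIpExpr1='[/][0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+[/]'
extractIpExpr2='[0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+'

# Hypothetical helper: same pipeline, but reading stdin.
# -v drops every line containing a slash-wrapped IP-like sequence, so a
# real address on that same line is dropped with it.
extract_skipping_lines() {
  grep -Ev "$skipIpExpr1" | grep -Eo "$extractIpExpr2" | sort -u
}
```

Here printf '%s\n' 'lots11.11.11.11lots' | extract_skipping_lines prints 11.11.11.11, whereas feeding it only line e) prints nothing. In the output above, 11.11.11.11 still appears because other lines of the message supply it.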

Don Cragun

12-01-2013

Hi,
Or, if your grep has the -P option, you can reduce it to one regex:

Code:

$ cat file
a) "11.11.11.11"               --> This should match "11.11.11.11"
b) "lots12.11.11.11lots"       --> This should match "12.11.11.11"
c) "lots1311.11.11.1111lots"   --> This should match "1311.11.11.1111" (see P.S. for why)
d) "lots/14.11.11.11/lots"     --> This should NOT match anything

e) "lots15.11.11.11lots then lots http://url/1.6.1.0/file.js"
16.11.11.11a
a17.11.11.11
18.11.11.11

Code:

$ grep -Po "(^|[^/0-9])\K[0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+(?=([^/0-9]|$))" file
11.11.11.11
11.11.11.11
12.11.11.11
12.11.11.11
1311.11.11.1111
1311.11.11.1111
15.11.11.11
16.11.11.11
17.11.11.11
18.11.11.11

Regards.

disedorgue

12-01-2013

Quote:

Originally Posted by Don Cragun

As long as you don't have input lines that contain both an ip address and an ip-like address between slash characters, the following seems to do what you want:

Thanks Don, that is a perfect solution. [No, they will never be on the same line.]

I should have been looking at the grep man page instead of scratching my head trying to work out a regex solution.

Thanks also for including "sort -u"; I didn't know about that, and it saves a process. ["He who saves a process, saves the World", ummm, as a proverb it needs some work.]

Cheers.

---------- Post updated at 08:59 PM ---------- Previous update was at 08:56 PM ----------

Quote:

Originally Posted by disedorgue

Hi,
Or, if your grep has the -P option, you can reduce it to one regex:

In theory I have grep's -P (Perl-compatible regex) option available, but the "unimplemented features" factor has bitten me more than once, so I avoid using it.

Thanks anyway.
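If a script has to run on machines where -P may or may not be usable, one option is to probe for it and fall back to an ERE-only pipeline. This is a sketch that assumes a grep without PCRE support exits with an error when given -P (typically the case both for GNU grep built without PCRE and for greps lacking the option). The helper names are hypothetical, and a successful probe does not guarantee that every Perl regex feature is implemented.

Code:

```shell
# Hypothetical helper: succeeds only if this grep accepts -P at all.
# Any error message from a grep without -P support is discarded.
grep_has_pcre() {
  printf 'x\n' | grep -Pq 'x' 2>/dev/null
}

# Hypothetical helper: reads stdin; uses disedorgue's single -P expression
# when available, otherwise Don Cragun's grep -v pipeline (which inherits
# its same-line caveat). Both branches are de-duplicated by sort -u.
extract_ip_like_auto() {
  if grep_has_pcre; then
    grep -Po '(^|[^/0-9])\K[0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+(?=([^/0-9]|$))'
  else
    grep -Ev '[/][0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+[/]' |
      grep -Eo '[0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+'
  fi | sort -u
}
```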

gencon

12-01-2013

If you ever did have IP and IP-like addresses, sed could just blank out the IP-like ones in this manner:

Code:

skipIpExpr1="[/][0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+[/]"
extractIpExpr2="[0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+"

# Pipe to sort -u so duplicates are removed.
# Note: -E is to use EREs and -o is to print only the matching parts.

ipLikeAddressMatches=$(sed -E "s:$skipIpExpr1::g" | \
                       grep -Eo "$extractIpExpr2" | sort -u)

Chubler_XL

12-02-2013

Quote:

Originally Posted by Chubler_XL

If you ever did have IP and IP-like addresses [gencon insert: On One Line] sed could just blank out the IP-like ones in this manner:

Thanks. That is an ever so slightly more elegant solution, and the one I shall in fact use. Don will get over it eventually; I'm sure it's nothing that some intensive therapy won't cure.

Of course Don might point out that yours will not actually work at the moment due to the mysterious disappearance of any actual input.

Thanks all.
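For reference, here is a sketch of Chubler_XL's pipeline with the input restored. The file is passed as an argument; the function name is hypothetical, and in the original script the argument would be "$tempFileName".

Code:

```shell
# The expressions from Chubler_XL's post.
skipIpExpr1='[/][0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+[/]'
extractIpExpr2='[0-9]+\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+'

# Hypothetical helper: $1 is the input file.
# ':' is the delimiter of the s command, so the '/' in the pattern needs no
# escaping. Each slash-wrapped sequence (slashes included) is deleted, and
# whatever remains on the line is scanned for IP-like sequences.
extract_with_sed() {
  sed -E "s:$skipIpExpr1::g" "$1" | grep -Eo "$extractIpExpr2" | sort -u
}
```

One behaviour worth knowing: when two slash-wrapped sequences share a slash, as in a/1.2.3.4/5.6.7.8/b, the first substitution consumes the shared slash, so 5.6.7.8 is not removed and is still reported.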

gencon
