Regex for plucking out IPs and CIDRs from text file?


 
# 1  
Old 07-24-2010

Hello to the unix.com community.

I have a mess of text. What I would like to do is pluck out only the IP addresses and CIDR notations.

I thought I would try something like this

Code:
/usr/bin/grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' /Path/To/File

But there are a few problems with this.


This regex will not print out CIDRs. It isn't designed to, as I have no idea how to write a regex that matches CIDR notation.


Also, this regex prints information I don't want printed. For example, somewhere in the mess of text there are two strings I want printed when searching for IPs and CIDR notations. Let's say those two strings are "1.1.1.1/8" and "2.2.2.2". When running the above command, the output should look like this
Code:
"
2.2.2.2
"

However, the output is
Code:
"
1.1.1.1
2.2.2.2
"

What regex would I need to print out only IPs and CIDRs?
The output should look like this
Code:
"
1.1.1.1/8
2.2.2.2
"

---------- Post updated 07-24-10 at 10:29 AM ---------- Previous update was 07-23-10 at 05:16 PM ----------

Here's where I'm at now

Code:
/usr/bin/grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\/[0-9]\{1,2\}\|[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}'

This matches valid IPs and CIDRs such as
Code:
1.1.1.1/8
2.2.2.2

However, this also matches invalid IPs and CIDRs such as
Code:
1.1.1.1/76
2.2.2.999


Last edited by TroubleNow345; 07-24-2010 at 11:25 AM.. Reason: Code tags, please!
# 2  
Old 07-24-2010
Could you please post a small representative sample of your input file and an example of the desired output?
# 3  
Old 07-24-2010
Sample Input file.
Code:
hptt://113.11.194.174
hptt://114.207.112.169
hptt://1140.co
hptt://116.125.127.169
hptt://116.127.121.27:80
hptt://116.212.119.161
Reported URLs,IP Address,AS Number,52wk High,52wk Low,% Daily Change
5696,94.23.94.22,16276,5751,0,+0.35
3993,72.14.204.191,15169,11475,0,-0.37
3882,174.120.120.151,21844,3919,0,-0.49
3454,61.4.190.206,9809,3681,26,0.00
2336,194.63.250.11,12996,2373,145,0.00
2308,64.34.175.158,30099,2477,1,-0.09
1869,68.178.232.99,26496,1898,226,-0.74
1358,68.178.232.100,26496,1370,462,+0.22
 Sat, 24 Jul 2010 15:55 GMT hphostsk (MysteryFCM) secure1.homelinux hptt://hosts-file/?s=secure1.homelinux64.76.3.2http/
85.17.162.0/24
200.74.244.0/2448tyurkhkdjhg
190.80.216.0/24foo
61.8.221.0/24
192.160.106.0/24
58.53.128.0/24
#Known RBN Nets and IPs

109.232.225.0/24
109.70.26.36
109.95.112.0/22
111.111.111.1
111.111.111.111
href='hptt://www.threatexpert/report.aspx?md5=6f1b0b67ca34184130c3bf533f595173'>6f1b0b67ca34184130c3bf533f595173</a></td> 
<tr class='class2'><td><nobr>2010-07-24</nobr></td><td>www.scarpxe</td><td><a class='modalBox' href='hptt://www.malwaregroup/ipaddresses/details/212.112.230.130'>212.112.230.130</a></t

##These IPs and CIDRs should NOT match regex.
1.1.1.1/99
999.999.999.999
5.6.7.999/8
0.0.0.257

Desired output should be
Code:
113.11.194.174
114.207.112.169
116.125.127.169
116.127.121.27
116.212.119.161
94.23.94.22
72.14.204.191
174.120.120.151
61.4.190.206
194.63.250.11
64.34.175.158
68.178.232.99
68.178.232.100
64.76.3.2
85.17.162.0/24
200.74.244.0/24
190.80.216.0/24
61.8.221.0/24
192.160.106.0/24
58.53.128.0/24
109.232.225.0/24
109.70.26.36
109.95.112.0/22
111.111.111.1
111.111.111.111
212.112.230.130
212.112.230.130

However, the output is the same as above, plus the following invalid IPs and CIDRs
Code:
1.1.1.1/99
999.999.999.999
5.6.7.999/8
0.0.0.257

# 4  
Old 07-26-2010
I think you'll need more than a regex for this ...
Using the module Net::CIDR from CPAN:

Code:
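perl -MNet::CIDR -nle'
  # Test every IP-like token on the line, not only the first one
  while (/((?:\d+\.){3}\d+(?:\/\d+)?)/g) {
    my $ip = $1;  # copy the capture before calling into the module
    print $ip if Net::CIDR::cidrvalidate($ip);
  }
  ' infile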
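
If installing a CPAN module is not an option, a two-stage grep may get close. This is only a sketch under some assumptions: your grep supports -o and -E (GNU and BSD grep do), and the names octet, prefix and extract_ips are made up for this example. The first grep pulls out every IP-like or CIDR-like token regardless of validity, and the second keeps only tokens whose octets are 0-255 and whose prefix length is 0-32. It does not check that the address sits on a proper network boundary for its prefix.

```shell
#!/bin/sh
# Hypothetical helper names (octet, prefix, extract_ips) are for illustration only.

# One octet, 0-255, without leading zeros.
octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
# Prefix length, 0-32.
prefix='(3[0-2]|[12]?[0-9])'

extract_ips() {
    # Stage 1: grab every dotted quad with an optional /digits suffix,
    #          taking the whole run of digits so 999 or /99 stays intact.
    # Stage 2: -x keeps a token only if the strict pattern matches all of it.
    grep -oE '[0-9]+(\.[0-9]+){3}(/[0-9]+)?' "$1" |
        grep -xE "${octet}(\.${octet}){3}(/${prefix})?"
}

# Usage: extract_ips /Path/To/File
```

On the sample input this drops 1.1.1.1/99, 999.999.999.999, 5.6.7.999/8 and 0.0.0.257. One difference from the desired output: the line 200.74.244.0/2448tyurkhkdjhg is also dropped, because the token is read as prefix 2448 and not as /24 followed by junk.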
