Convert ip ranges to CIDR netblocks

#1  ripat, 08-23-2013

Hi,

Recently I had to convert a file of 280K lines of IP ranges to CIDR notation and generate a file to be used by ipset (netfilter) for IP filtering.

Input file:

Code:
000.000.000.000 - 000.255.255.255 , 000 , invalid ip
001.000.064.000 - 001.000.127.255 , 000 , XXXXX
001.000.245.123 - 001.000.245.123 , 000 , YYYYYY YYYYY
001.002.002.000 - 001.002.002.255 , 000 , ZZZ ZZZ ZZ
001.002.004.000 - 001.002.004.255 , 000 , AAAA AA

Some of the ranges contain a single IP.

Required output:

Code:
-N cidr nethash --maxelem 260000
-N single iphash --maxelem 60000
-A cidr 0.0.0.0/8
-A cidr 1.0.64.0/18
-A single 1.0.245.123
-A cidr 1.2.2.0/24
-A cidr 1.2.4.0/24
COMMIT
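
To see why the second range becomes a /18: a range is exactly one CIDR block when its size is a power of two and its first address is a multiple of that size. Here is a minimal sketch of that check using Python's standard ipaddress module. The helper name single_block_prefix is made up for this illustration and is not part of any script in this thread.

```python
import ipaddress

def single_block_prefix(first, last):
    """Return the prefix length if first..last is exactly one CIDR block, else None."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    size = end - start + 1
    # The size must be a power of two and the start must be aligned on it.
    if size <= 0 or size & (size - 1) or start % size:
        return None
    # A block of 2^n addresses leaves 32 - n bits for the network prefix.
    return 32 - (size.bit_length() - 1)

print(single_block_prefix("1.0.64.0", "1.0.127.255"))     # 18
print(single_block_prefix("1.0.245.123", "1.0.245.123"))  # 32
```

Ranges that fail this check have to be split into several blocks, which is the hard part of the conversion.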

As I got nowhere with awk (the CIDR conversion being the culprit), I found a solution with Python and its netaddr module:

Code:
#!/usr/bin/python3

"""
Usage: ip2cidr.py input_file
"""

import sys, re, netaddr

def sanitize (ip):
	seg = ip.split('.')
	return '.'.join([ str(int(v)) for v in seg ])

# pointer to input file
fp_source = open(sys.argv[1], "r")

# pointer to outfile
fp_outfile = open('ip.ipset', "w")

ptrnSplit = re.compile(' - | , ')

# Write ipset header to outfile
fp_outfile.write('-N cidr nethash --maxelem 260000\n-N single iphash --maxelem 60000\n',)

for line in fp_source:
	
	# parse on ' - ' and ' , '
	s = re.split(ptrnSplit, line)
	
	# sanitize ip: 001.004.000.107 --> 1.4.0.107 to avoid netaddr err.
	ip = [ sanitize(v) for v in s[:2] ]
	
	# conversion ip range to CIDR netblocks
	# single ip in range
	if ip[0] == ip[1]:
		fp_outfile.write('-A single %s\n' % ip[0])
		
	# multiple ip's in range
	else:
		ipCidr = netaddr.IPRange(ip[0], ip[1])
		for cidr in ipCidr.cidrs():
			fp_outfile.write('-A cidr %s\n' % cidr)

fp_outfile.write('COMMIT\n')

# flush and release both files
fp_outfile.close()
fp_source.close()

Time to process the 280K ip ranges: 4 minutes.
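
For readers without netaddr installed, the same conversion can probably be done with the standard library alone. The sketch below uses ipaddress.summarize_address_range() in place of netaddr.IPRange; the helper name range_to_lines is an assumption made for this example, and it has not been timed against the 280K-line file.

```python
import ipaddress

def sanitize(ip):
    # Strip zero padding: 001.000.064.000 -> 1.0.64.0
    return ".".join(str(int(part)) for part in ip.split("."))

def range_to_lines(first, last):
    """Return the ipset lines for one range (hypothetical helper)."""
    first, last = sanitize(first), sanitize(last)
    if first == last:
        return ["-A single %s" % first]
    nets = ipaddress.summarize_address_range(
        ipaddress.IPv4Address(first), ipaddress.IPv4Address(last))
    return ["-A cidr %s" % net for net in nets]

print("\n".join(range_to_lines("001.000.064.000", "001.000.127.255")))
```

The sanitize step is still needed here, since recent Python versions reject zero-padded octets in ipaddress.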



Since 4 minutes seemed on the high side and I had a couple of days off, I decided to give awk another try:

Code:
@include "lib_netaddr.awk"

function sanitize(ip,    slice) {
	split(ip, slice, ".")
	# dividing by 1 forces a numeric value, which drops the zero padding
	return slice[1]/1 "." slice[2]/1 "." slice[3]/1 "." slice[4]/1
}

BEGIN{
	FS=" , | - "
	# print appends its own newline, so the last header line needs none
	print "-N cidr nethash --maxelem 260000\n-N single iphash --maxelem 60000"
}

# sanitize ip's
{$1 = sanitize($1); $2 = sanitize($2)}

# range with a single IP
$1==$2 {printf "-A single %s\n", $1} 

# ranges with multiple IP's
$1!=$2{print range2cidr(ip2dec($1), ip2dec($2))}

# footer
END {print "COMMIT"}

lib_netaddr.awk

Code:
#
#    Library with various ip manipulation functions
#

# convert ip ranges to CIDR notation
# str range2cidr(ip2dec("192.168.0.15"), ip2dec("192.168.5.115"))
#
# Credit to Chubler_XL for this brilliant function. (see his post below for non GNU awk)
#
function range2cidr(ipStart, ipEnd,  bits, mask, newip, result) {
    bits = 1
    mask = 1
    result = "-A cidr "
    # grow the block one bit at a time until it overshoots ipEnd,
    # loses its alignment on ipStart, or exceeds the 32-bit address space
    while (1) {
        newip = or(ipStart, mask)
        if ((bits > 32) || (newip>ipEnd) || ((lshift(rshift(ipStart,bits),bits)) != ipStart)) {
           bits--
           mask = rshift(mask,1)
           break
        }
        bits++
        mask = lshift(mask,1)+1
    }
    newip = or(ipStart, mask)
    bits = 32 - bits
    result = result dec2ip(ipStart) "/" bits
    # recurse on whatever is left of the range after this block
    if (newip < ipEnd) result = result "\n" range2cidr(newip + 1, ipEnd)
    return result
}

# convert dotted quads to long decimal ip
#	int ip2dec("192.168.0.15")
#
function ip2dec(ip,   slice) {
	split(ip, slice, ".")
	return (slice[1] * 2^24) + (slice[2] * 2^16) + (slice[3] * 2^8) + slice[4]
}

# convert decimal long ip to dotted quads
#	str dec2ip(1171259392)
#
function dec2ip(dec,    ip, quad, i) {
	for (i=3; i>=1; i--) {
		quad = 256^i
		ip = ip int(dec/quad) "."
		dec = dec%quad
	}
	return ip dec
}


# convert decimal ip to binary
#	str dec2binary(1171259392)
#
function dec2binary(dec,    bin) {
	while (dec>0) {
		bin = dec%2 bin
		dec = int(dec/2)
	}
	return bin
}

# Convert binary ip to decimal
#	int binary2dec("1000101110100000000010011001000")
#
function binary2dec(bin,   slice, l, dec, i) {
	split(bin, slice, "")
	l = length(bin)
	for (i=l; i>0; i--) {
		dec += slice[i] * 2^(l-i)
	}
	return dec
}

# convert dotted quad ip to binary
#	str ip2binary("192.168.0.15")
#
function ip2binary(ip) {
	return dec2binary(ip2dec(ip))
}


# count the number of ip's in a dotted quad ip range (both ends included)
#	int countQuadIp("192.168.0.0", "192.168.1.255")
#
function countQuadIp(ipStart, ipEnd) {
	return (ip2dec(ipEnd) - ip2dec(ipStart)) + 1
}


# count the number of ip's in a CIDR block
#	int countCidrIp ("192.168.0.0/12")
#
function countCidrIp (cidr) {
	sub(/.+\//, "", cidr)
	return 2^(32-cidr)
}

Time to process: 16 sec. A whopping 15 times faster! Not bad for a language that dates back to 1977! And it's even faster with mawk: 7 sec.

Please note that the @include only works with gawk. If you are using the original awk or the lightning fast mawk, you will have to copy/paste the functions library into your main script.
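
For anyone who wants to follow the logic of range2cidr() without the bit masks, here is a rough Python sketch of the same greedy idea: at each step take the largest block that is aligned on the start address and still fits in the range, then continue after it. It is an iterative restatement written for illustration, not a line-by-line port of the awk function, and it returns bare CIDR strings without the "-A cidr " prefix.

```python
def ip2dec(ip):
    # Dotted quad -> integer
    a, b, c, d = (int(x) for x in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def dec2ip(dec):
    # Integer -> dotted quad
    return ".".join(str((dec >> shift) & 255) for shift in (24, 16, 8, 0))

def range2cidr(start, end):
    blocks = []
    while start <= end:
        bits = 0
        # Grow the block while the next size up stays aligned and in range.
        while bits < 32:
            size = 1 << (bits + 1)
            if start % size or start + size - 1 > end:
                break
            bits += 1
        blocks.append("%s/%d" % (dec2ip(start), 32 - bits))
        start += 1 << bits
    return blocks

print(range2cidr(ip2dec("192.168.0.15"), ip2dec("192.168.0.20")))
# ['192.168.0.15/32', '192.168.0.16/30', '192.168.0.20/32']
```

The odd start address .15 can only form a /32, .16 is aligned on 4 but a block of 8 would overshoot .20, and the last address is again a /32.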

If you find this awk library useful or if it needs to be optimized, let me know before I submit it in Tips & Tutorials section.

Last edited by ripat; 09-04-2013 at 07:46 AM. Reason: Inclusion of Chubler_XL's range2cidr() function
#2  Chubler_XL, 08-25-2013
How about this for range2cidr? Call it like this: range2cidr(ip2dec($1), ip2dec($2))


Code:
function range2cidr(ipStart, ipEnd,  bits, mask, newip, result) {
    bits = 1
    mask = 1
    # grow the block one bit at a time until it overshoots ipEnd,
    # loses its alignment on ipStart, or exceeds the 32-bit address space
    while (1) {
        newip = or(ipStart, mask)
        if ((bits > 32) || (newip>ipEnd) || ((lshift(rshift(ipStart,bits),bits)) != ipStart)) {
           bits--
           mask = rshift(mask,1)
           break
        }
        bits++
        mask = lshift(mask,1)+1
    }
    newip = or(ipStart, mask)
    bits = 32 - bits
    result = dec2ip(ipStart) "/" bits
    # recurse on whatever is left of the range after this block
    if (newip < ipEnd) result = result "\n" range2cidr(newip + 1, ipEnd)
    return result
}

---------- Post updated at 10:24 AM ---------- Previous update was at 08:31 AM ----------

Of course this does require the following gawk bitwise functions: or(), lshift() and rshift().

We could replace these with local (bit_) variants for more portability.


Code:
# Bitwise OR of a and b (32-bit values); r, i and c are locals
function bit_or(a, b, r, i, c) {
    for (r=i=0;i<32;i++) {
        c = 2 ^ i
        if ((int(a/c) % 2) || (int(b/c) % 2)) r += c
    }
    return r
}


# Shift var left by x bits (multiply by 2, x times)
function bit_lshift(var, x) {
  while(x--) var*=2;
  return var;
}

# Shift var right by x bits (integer-divide by 2, x times)
function bit_rshift(var, x) {
  while(x--) var=int(var/2);
  return var;
}
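
A quick way to convince yourself that these arithmetic-only variants behave like real bitwise operators is to mirror them in Python and compare against the native |, << and >> on random 32-bit values. This is only a sanity-check sketch; the Python functions reuse the awk names for readability and are not part of the library.

```python
import random

def bit_or(a, b):
    # Test each of the 32 bit positions with division and modulo only.
    r = 0
    for i in range(32):
        c = 2 ** i
        if (a // c) % 2 or (b // c) % 2:
            r += c
    return r

def bit_lshift(var, x):
    for _ in range(x):
        var *= 2
    return var

def bit_rshift(var, x):
    for _ in range(x):
        var //= 2
    return var

random.seed(0)
for _ in range(1000):
    a, b = random.getrandbits(32), random.getrandbits(32)
    s = random.randrange(32)
    assert bit_or(a, b) == a | b
    assert bit_lshift(a, s) == a << s
    assert bit_rshift(a, s) == a >> s
print("ok")
```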

#3  ripat, 09-03-2013
Brilliant. Works much better than my original range2cidr() function. I just edited my post above to include your function.

Well done!