Convert ip ranges to CIDR netblock


 
# 29  
Old 12-25-2018
Don Cragun, I tried all 4 variants that you posted. Each one produced a segmentation fault at the exact same place as my previous code when run with awk. gawk is able to process the entire file without any issues. awk produced 141K lines and then had a segmentation fault. gawk produced 257K lines, which covers the entire downloaded file that needed to be processed. I compared the output from awk and gawk, and both produced the same results, except of course that awk produced nothing past the 141K mark.
# 30  
Old 12-25-2018
Please show us the last two lines in your input file that were successfully processed by awk and the next three lines in your input file. Also, please show us the output you get from gawk for those five input lines and the exact text of the diagnostic message that is printed when awk fails.

Does awk still fail if you only feed it the above five lines; or does it only fail if it has already processed the first 141K lines?

With the above requested information, maybe we can trace through what is happening with the line that is failing in awk.
# 31  
Old 12-25-2018
Quote:
Originally Posted by Don Cragun
Please show us the last two lines in your input file that were successfully processed by awk and the next three lines in your input file.
Code:
Quanzhou Broadcasting & TV Transmit Center:125.254.192.0-125.254.255.255
SoftbankBB gnufakes:126.72.94.40-126.72.94.40  <-- last success
BBN Communications:128.1.0.0-128.1.255.255
Lawrence Berkeley National Laboratory:128.3.0.0-128.3.255.255
University of Maryland :128.8.61.128-128.8.61.255

Quote:
Also, please show us the output you get from gawk for those five input lines and the exact text of the diagnostic message that is printed when awk fails.

gawk output for just the five lines:
Code:
125.254.192.0/18
126.72.94.40/32
128.1.0.0/16
128.3.0.0/16
128.8.61.128/25

awk output for just the five lines:
Code:
125.254.192.0/18
126.72.94.40/32
Segmentation fault (core dumped)

Quote:
Does awk still fail if you only feed it the above five lines; or does it only fail if it has already processed the first 141K lines?
Yes, awk fails with just the above five lines.

Quote:
With the above requested information, maybe we can trace through what is happening with the line that is failing in awk.
awk seems to be struggling with several lines after the last successful line.

I put this line all by itself in a file and got a segmentation fault:
BBN Communications:128.1.0.0-128.1.255.255

I also put this one in a file all by itself and got a segmentation fault:
Lawrence Berkeley National Laboratory:128.3.0.0-128.3.255.255

The same happened with this one:
University of Maryland:128.8.61.128-128.8.61.255

Again, this is the file I've been downloading and using:
iblocklist

Thank you for looking into this.

Last edited by azdps; 12-26-2018 at 01:49 AM..
# 32  
Old 12-26-2018
Are you using awk's native lshift/rshift/or functions OR did you provide your own implementations (as previously discussed)?
Are there any other ip ranges with the 128.x.x.x addresses that don't dump cores?
One way to start debugging this is to put print "in function <foo>" statements for each function of the script and see which function/place dumps core.
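
A minimal sketch of that kind of tracing, assuming a hypothetical `trace` helper (not part of the script in this thread) and an awk that accepts "/dev/stderr" as an output target. The awk program is wrapped in a shell function, `trace_demo`, only so it can be run as-is:

```shell
trace_demo() {
    awk '
    # Hypothetical helper: announce each function entry on stderr, so the
    # last line printed before a crash names the function that was running.
    function trace(name) {
        print "in function " name > "/dev/stderr"
    }

    # ip2dec as used by the script in this thread, with one trace() call added.
    function ip2dec(ip, slice) {
        trace("ip2dec")
        split(ip, slice, /[.]/)
        return (slice[1] * 2^24) + (slice[2] * 2^16) + (slice[3] * 2^8) + slice[4]
    }

    BEGIN {
        # %.0f avoids exponent notation for large values in some awks.
        printf "%.0f\n", ip2dec("128.1.0.0")
    }'
}

trace_demo
```

Adding the same `trace("...")` call as the first statement of range2cidr, dec2ip and sanitize should narrow down where the core dump happens.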
# 33  
Old 12-26-2018
Quote:
Originally Posted by vgersh99
Are you using awk's native lshift/rshift/or functions OR did you provide your own implementations (as previously discussed)?
Yes, I was using awk's native functions. Here is the script:
Code:
function range2cidr(ipStart, ipEnd, result, bits, mask, newip) {
    bits = 1
    mask = 1
    while (bits < 32) {
        newip = or(ipStart, mask)
        if ((newip>ipEnd) || ((lshift(rshift(ipStart,bits),bits)) != ipStart)) {
           bits--
           mask = rshift(mask,1)
           break
        }
        bits++
        mask = lshift(mask,1)+1
    }
    newip = or(ipStart, mask)
    bits = 32 - bits
    result = (result)?result ORS dec2ip(ipStart) "/" bits : dec2ip(ipStart) "/" bits
    if (newip < ipEnd) result = range2cidr(newip + 1, ipEnd,result)
    return result
}

# convert dotted quads to long decimal ip
#       int ip2dec("192.168.0.15")
#
function ip2dec(ip, slice) {
        split(ip, slice, /[.]/)
        return (slice[1] * 2^24) + (slice[2] * 2^16) + (slice[3] * 2^8) + slice[4]
}

# convert decimal long ip to dotted quads
#       str dec2ip(1171259392)
#
function dec2ip(dec, ip, quad, i) {
        for (i=3; i>=1; i--) {
                quad = 256^i
                ip = ip int(dec/quad) "."
                dec = dec%quad
        }
        return ip dec
}

function sanitize(ip, slice) {
        split(ip, slice, /[.]/)
        return slice[1]/1 "." slice[2]/1 "." slice[3]/1 "." slice[4]/1
}

BEGIN{
        FS=" - |-|:"
}

# sanitize ip's
!/^#/ && NF {
  f1= sanitize($(NF-1))
  f2= sanitize($NF)
  print range2cidr(ip2dec(f1), ip2dec(f2))
}

END {print ""}

After you mentioned the native functions lshift/rshift/or, I decided to try the previously discussed implementations again (see post #7): bit_lshift, bit_rshift and bit_or. That earlier script produced wrong output because it didn't have the changes you made to correct the code. I replaced lshift/rshift/or with bit_lshift/bit_rshift/bit_or in the script above, and awk now works. No segmentation fault. See the new script below, which produces accurate results:

Code:
function range2cidr(ipStart, ipEnd, result, bits, mask, newip) {
    bits = 1
    mask = 1
    while (bits < 32) {
        newip = bit_or(ipStart, mask)
        if ((newip > ipEnd) || ((bit_lshift(bit_rshift(ipStart,bits),bits)) != ipStart)) {
            bits--
            mask = bit_rshift(mask,1)
            break
        }
        bits++
        mask = bit_lshift(mask,1)+1
    }
    newip = bit_or(ipStart, mask)
    bits = 32 - bits
    result = (result)?result ORS dec2ip(ipStart) "/" bits : dec2ip(ipStart) "/" bits
    if (newip < ipEnd) result = range2cidr(newip + 1, ipEnd,result)
    return result
}

# convert dotted quads to long decimal ip
#	int ip2dec("192.168.0.15")
#
function ip2dec(ip, slice) {
    split(ip, slice, /[.]/)
    return (slice[1] * 2^24) + (slice[2] * 2^16) + (slice[3] * 2^8) + slice[4]
}

# convert decimal long ip to dotted quads
#	str dec2ip(1171259392)
#
function dec2ip(dec, ip, quad, i) {
    for (i=3; i>=1; i--) {
        quad = 256^i
        ip = ip int(dec/quad) "."
        dec = dec%quad
    }
    return ip dec
}

# Bitwise OR of a and b (32-bit values)
function bit_or(a, b, r, i, c) {
    for (r=i=0;i<32;i++) {
        c = 2 ^ i
        if ((int(a/c) % 2) || (int(b/c) % 2)) r += c
    }
    return r
}

# Shift var left by x bits (multiply by 2^x)
function bit_lshift(var, x) {
    while(x--) var*=2;
    return var;
}

# Shift var right by x bits (integer-divide by 2^x)
function bit_rshift(var, x) {
    while(x--) var=int(var/2);
    return var;
}

function sanitize(ip, slice) {
    split(ip, slice, /[.]/)
    return slice[1]/1 "." slice[2]/1 "." slice[3]/1 "." slice[4]/1
}

BEGIN{
    FS=" - |-|:"
}

# sanitize ip's
!/^#/ && NF {
    f1= sanitize($(NF-1))
    f2= sanitize($NF)
    print range2cidr(ip2dec(f1), ip2dec(f2))
}

END {print ""}

Benchmarks for the new script processing a file containing approximately 236K IP address ranges (same benchmarks as before; see post #7):
  • ipcalc 15 min
  • mawk 59 sec
  • gawk 1 min 45 sec
  • awk 2 min 46 sec

It appears the issue may be the implementation of the native lshift/rshift/or functions in the version of awk that OpenBSD uses. I would recommend that anyone who uses gawk use the first script with the native functions, since that script is over twice as fast.
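
As a rough spot check (a sketch, not a full test), the three arithmetic helpers from the new script can be run on the start of one of the failing ranges, 128.1.0.0. The shell wrapper name `bit_demo` and the chosen values are only for illustration. It relies on awk doing ordinary arithmetic in floating point, which in practice holds integers of this size exactly:

```shell
bit_demo() {
    awk '
    # The three helpers from the new script (whitespace aside).
    function bit_or(a, b, r, i, c) {
        for (r = i = 0; i < 32; i++) {
            c = 2 ^ i
            if ((int(a / c) % 2) || (int(b / c) % 2)) r += c
        }
        return r
    }
    function bit_lshift(var, x) { while (x--) var *= 2; return var }
    function bit_rshift(var, x) { while (x--) var = int(var / 2); return var }

    BEGIN {
        start = 128 * 2^24 + 1 * 2^16      # 128.1.0.0 = 2147549184

        # OR with a 16-bit mask gives the end of the /16: 128.1.255.255
        printf "%.0f\n", bit_or(start, 65535)

        # Shifting right then left by 16 must give start back (aligned on /16)
        printf "%.0f\n", bit_lshift(bit_rshift(start, 16), 16)
    }'
}

bit_demo
# 2147614719
# 2147549184
```

Both results are above 2147483647, and no native bitwise function is involved at any point.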

vgersh99, when I get the chance, I'll try the debugging you suggested.

Last edited by azdps; 12-26-2018 at 02:01 PM..
# 34  
Old 12-26-2018
Glad it worked out for you!
Cheers!
# 35  
Old 12-28-2018
Quote:
Originally Posted by vgersh99
Are there any other ip ranges with the 128.x.x.x addresses that don't dump cores?
vgersh99, I did some more testing with the awk script that segmentation faults. I've tested numerous IP ranges, and awk will fault on anything that starts with 128 or higher. The results are very strange. Anything less than 128, such as 127.x.x.x, will not fault. 128.x.x.x and greater, such as 129.x.x.x, 221.x.x.x, 222.x.x.x, etc., will produce a segmentation fault.

I messed around with the other parts of the IP address ranges and these work fine:
127.128.128.0 - 127.128.128.255
124.11.58.0 - 127.11.58.7

These do not work and produce a segmentation fault (IP starting with 128 or higher):
129.128.128.0 - 129.128.128.255

Although the first IP starts with 124, the second IP in the range starts with 128, and this one produces a segmentation fault:
124.11.58.0 - 128.11.58.7

EDIT: Reason for segmentation fault solved

Okay, I found this information regarding OpenBSD awk vs. gawk. It states "Gawk uses 53-bit unsigned integers, but OpenBSD awk uses 32-bit signed integers." This applies to the bitwise operations.

If I convert 128.0.0.0 to decimal, the result is 2,147,483,648, which exceeds 2,147,483,647, the maximum value of a 32-bit signed integer. So it's clear now why the script that uses the native lshift, rshift and or bitwise operations causes an awk segmentation fault with IP addresses of 128.0.0.0 and above, and the script that uses the custom bit_lshift, bit_rshift and bit_or bitwise operations doesn't.
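
The arithmetic can be checked with the script's own ip2dec. This is only a sketch: the shell wrapper `classify_ips` and the "fits"/"exceeds" labels are made up for illustration, and it shows just the size comparison, not the crash itself:

```shell
classify_ips() {
    printf '%s\n' "$@" | awk '
    # ip2dec as in the scripts above.
    function ip2dec(ip, slice) {
        split(ip, slice, /[.]/)
        return (slice[1] * 2^24) + (slice[2] * 2^16) + (slice[3] * 2^8) + slice[4]
    }

    # Largest value a 32-bit signed integer can hold.
    BEGIN { max = 2^31 - 1 }

    {
        dec = ip2dec($1)
        # %.0f avoids exponent notation for large values in some awks.
        printf "%s %.0f %s\n", $1, dec, (dec > max ? "exceeds" : "fits")
    }'
}

classify_ips 127.255.255.255 128.0.0.0 129.128.128.0
# 127.255.255.255 2147483647 fits
# 128.0.0.0 2147483648 exceeds
# 129.128.128.0 2172682240 exceeds
```

This matches the tests above: a range faults as soon as either end of it is 128.0.0.0 or higher.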

End of a long story =(

Last edited by azdps; 12-28-2018 at 12:04 PM..