http://www.perlmonks.org?node_id=118596

This is code I wrote to test an idea to solve the problem Dominus posted in Challenge Problem: Merging Network Addresses. The benchmark shows an order of magnitude speed increase over using Net::CIDR, as I had originally suggested. And it's an unusual (ab-)use of the regex engine. Enjoy.

(Yes, I'm writing a column about my findings. {grin})

#!/usr/bin/perl -w use strict; $|++; use Socket qw(inet_aton inet_ntoa); sub cidr2bits { my $cidr = shift; my ($addr, $maskbits) = $cidr =~ /^([\d.]+)\/(\d+)$/ or die "bad format for cidr: $cidr"; substr(unpack("B*", inet_aton($addr)), 0, $maskbits); } sub bits2cidr { my $bits = shift; inet_ntoa(pack "B*", substr("${bits}00000000000000000000000000000000", 0, 32)) . "/" . length($bits); } sub mergecidr { local $_ = join "", sort map { cidr2bits($_)."\n" } @_; 1 while s/^(\d+)0\n\1[1]\n/$1\n/m or s/^(\d+)\n\1\d+\n/$1\n/m; map bits2cidr($_), split /\n/; } my @first = qw( 209.152.214.112/30 209.152.214.116/31 209.152.214.118/31 ); my @second = qw( 209.152.214.112/30 209.152.214.116/32 209.152.214.118/31 ); my @third = qw( 209.152.214.112/31 209.152.214.116/31 209.152.214.118/31 ); if (1) { print join "----\n", map join("", "from:\n", map(" $_\n", @$_), "to:\n", map(" $_\n", mergecidr(@$_))), \@first, \@second, \@third; }

Replies are listed 'Best First'.
Re: Merge CIDRs
by gbarr (Monk) on Oct 13, 2001 at 04:31 UTC
    Looking at this and considering my own post, I thought I would try to improve it. The following code allows trailing zeros to be left off the input CIDR and it will leave off the trailing zeros in the output. It also handles 0/0, which neither of our posts did

    use strict; sub cidr2bits { my $cidr = shift; my $n = $cidr =~ s,/(\d+)$, ? $1 : 32; my @n = $cidr =~ m,\d+,g; substr(unpack("B*",pack("C4", @n,0,0,0,0)),0,$n); } sub bits2cidr { my $bits = shift; my $n = length $bits; $bits .= "0" x 8; join(".", unpack("C*", pack("B*",($bits =~ /^((?:.{8})+?)0*$/)[0]))) +."/$n"; } sub mergecidr { local $_ = join("\n", sort map { cidr2bits($_) } @_); 1 while s/^(\d*)\n\1.*$/$1/mg || s/^(\d*)0\n\1.$/$1/mg; map { bits2cidr($_) } (!@_ || length($_)) ? split : ''; }

    Update: Swapped the two s/// in the while to prevent the problem merlyn described. Now in the case of

    10010 100100 100101

    the first two lines will be merged first and yeild the correct result

      1 while s/^(\d*)0\n\1.$/$1/mg || s/^(\d*)\n\1.*$/$1/mg;
      There's a reason I put [1] there instead of your ".". Consider what happens when you have the following:
      10010 100100 100101
      The last two lines will be merged, creating
      10010 10010
      And your code will strip that final 0 from both lines, creating "1001". Wrong. You must ensure that it's a 1. Can't be "any".

      -- Randal L. Schwartz, Perl hacker

Re: Merge CIDRs
by Dominus (Parson) on Oct 13, 2001 at 20:50 UTC
    I think your original suggestion, of using Net::CIDR::cidr2range and merging the ranges end-to-end, was better. Your new program seems to be slower than the one I wrote following your Net::CIDR suggestion. Here's my program for comparison:
    #!/usr/bin/perl use Net::CIDR 'cidr2range', 'range2cidr'; my @ranges; my ($cur_start, $cur_end, $cs, $ce); while (<>) { chomp; my @r = cidr2range($_); my ($r_start, $r_end) = split /-/, $r[0]; my ($rs, $re) = map {inet_to_n($_)} $r_start, $r_end; if (! defined $cur_start) { ($cur_start, $cur_end) = ($r_start, $r_end); ($cs, $ce) = ($rs, $re); } else { if ($rs == $ce + 1) { $cur_end = $r_end; $ce = $re; } else { print join "\n", range2cidr("$cur_start-$cur_end"), ""; ($cur_start, $cur_end) = ($r_start, $r_end); ($cs, $ce) = ($rs, $re); } } } continue { print STDERR "$. records processed\n" if $. % 1000 == 0; } print join "\n", range2cidr("$cur_start-$cur_end"), "" if defined $cur_start; sub inet_to_n { unpack "N", pack "C4", split /\./, shift(); }
    I ran your program on nm5.in and waited three minutes. Then I started my program. My program finished first.

    Of course, I might have made a mistake in the benchmarking somewhere.

    --
    Mark Dominus
    Perl Paraphernalia

      You need to sort by start address. You're lucky that your input data was already sorted by such.

      Also, you have to look for $rs <= $ce + 1 rather than $rs == $ce + 1, or you end up not coalescing things like 0-7 (0.0/3), 4-7 (4.0/2), which should just get swallowed.

      -- Randal L. Schwartz, Perl hacker

        Says merlyn:
        You need to sort by start address. You're lucky that your input data was already sorted by such.
        No, I'm not. My original problem specification guaranteed sorted input.

        Anyway, those changes wouldn't slow down the program so much that I'd prefer to use the code you posted above. But you said the code above was an order of magnitude faster, not an order of magnitude slower, so I wonder what's up? Someone made a mistake, and I'm less than 50% confident that it was you.

        --
        Mark Dominus
        Perl Paraphernalia