Sorting ip addresses quickly

I recently set out to sort some ip addresses. Here's what I came up with:

First the straightforward method:

sub by_ip
{
        # Split the two ip addresses up into octets
        my ($a1, $a2, $a3, $a4) = split /\./, $a;
        my ($b1, $b2, $b3, $b4) = split /\./, $b;

        # Check to see if the first octets are the same
        if ($a1 == $b1) {
                # If the first octets are the same, check
                # the second octets
                if ($a2 == $b2)
                {
                        # Ditto for the third octets
                        if ($a3 == $b3)
                        {
                                # If the first 3 octets
                                # of each address are 
                                # the same, return the
                                # comparison of the last
                                # octet
                                $a4 =~ s/^([0-9]+)/$1/;
                                $b4 =~ s/^([0-9]+)/$1/;

                                return $a4 <=> $b4;
                        } else {
                                # 3rd octets were different
                                # so return their comparison
                                return $a3 <=> $b3;
                        }
                } else {
            # 2nd octets were different so
                        # return their comparison
                        return $a2 <=> $b2;
                }
        } else {
                # Best case: The first octets were
                # different so we can return their
                # comparison immediately
                return $a1 <=> $b1;
        }
}

my @sorted = sort by_ip @unsorted;
[download]

This works, but it's slow because we have to split both ip addresses and probably make several equality checks for each address comparison. So can we get away with only mangling each ip address once during the whole operation and only making one numeric comparison? Yes, Schwartzian Transform to the rescue:

my @sorted = map  { $_->[0] }
             sort { $a->[1] cmp $b->[1] }
             map  { [$_, sprintf("%03.f%03.f%03.f%03.f", split(/\./, $
+_))] }
             @unsorted;
[download]

Now what's this doing? Let's split it up into three separate statements:

@mapped = map { [$_, sprintf("%03.f%03.f%03.f%03.f", split(/\./, $_))]
+ } @unsorted;
@sorted = sort { $a->[1] cmp $b->[1] } @mapped;
@sorted = map { $_->[0] } @sorted;
[download]

The first map command takes the list of ip addresses as input and gives us back a new array of arrayrefs. Now each array element in @mapped looks like [ "1.2.3.4", "001002003004" ]. Now we can use the numeric conversion of each ip address to compare it directly with another.
The sort command sorts the members of @mapped by comparing the second element of each (e.g. 001002003004), and leaves us with a sorted array of arrayrefs in @sorted.
The final map command extracts the first element of each member of the sorted array (e.g. "1.2.3.4"), and leaves us with a sorted list of ip addresses.

So why is it faster to use the transform? Well, even in the best case when the first octet of each ip address to be sorted is different, &by_ip() has to make two split()s and two numeric comparisons every time it's called. Since Perl's sort() uses a quicksort algorithm, it runs in O(N*log(N)) time. That makes 2N*log(N) split()s and 2N*log(N) numeric comparisons, where N is the number of addresses in the unsorted list. The transform, on the other hand, only split()s each address once and only makes one numeric comparison for each address comparison. N split()s and N*log(N) comparisons is a big win over 2N*log(N) split()s and 2N*log(N) comparisons.

- Matt

Comment on Sorting ip addresses quickly Select or Download Code

Replies are listed 'Best First'.
Sorting ip addresses quicker by gryng (Hermit) on Jul 14, 2000 at 05:47 UTC
You are indeed correct DrManhattan however a few small things that I'd like to add to make this code even faster. (Though, I'll readily admit, probably not the fastest!) The first thing I noticed was that you said: "Now we can use the numeric conversion of each ip address to compare it ...". This is in regards to your transformation from 1.2.3.4 form to 001002003004 form. However, you use cmp not <=> to compare your ips after this point. cmp is for strings of course, and <=> for numbers. I thought that maybe by simply changing this we could get a little speed up: New code: `my @sorted = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [$_, int sprintf("%03.f%03.f%03.f%03.f", split(/\. +/, $_))] } @unsorted;` [download] The two differences are that we are now forcing perl to compare the two ips as numbers not strings -- which should be faster, as a 12 digit binary number will only have 5 bytes to compare, whereas a 12 digit character string would have 12 bytes to compare. Also, concerning space, I added an int infront of the sprintf, in order to force perl to store the ip as a number, and not as a string. The result? A slight speed down (6%), I didn't bother to measure the space taken, it should have simply been proportional. However the fact that it got slower shows that the cost difference between 5 comparisons and 12 was not worth the additional cost of converting the 12 character string into a 5 byte number. However it is good to know that it should use less space, which may be more important than 6% speed difference. Results: `Loading ips Loaded 1000 ips Benchmark: running Fastnum, Faststr, each for at least 10 CPU seconds. +.. Fastnum: 11 wallclock secs (11.28 usr + 0.00 sys = 11.28 CPU) @ 7 +.98/s (n=90) Faststr: 12 wallclock secs (11.34 usr + 0.00 sys = 11.34 CPU) @ 8 +.29/s (n=94)` [download] Now, the real kicker is, why use sprintf? The first warning is that you are using a number as a string for longer than you need to. The modified code below treats the octets as numbers as soon as they shoot out of the split statement. On top of this sprintf isn't going to be the fastest subroutine to call. The second thing is, why multiply by 1000? Ip's can only range up to 255, not 999. So just multiply by 256 (which can be optimized away as a bit-shift instead of a full blown integer). Is the cost of string conversion worth the speed increase of removing the sprintf? Remember we are also substituting the sprintf with 4 multiplies. Of course we are also getting one byte back too, 4 bytes to store the ip instead of 5, so slightly faster to compare too (though as we can see the cost of converting can outway the time sorting). So here is the newer code, and benchmark below it. `my @sorted = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { my ($x,$y)=(0,$_); $x=$_ + $x * 256 for split(/\./, $y); [$y,$x]} @unsorted;` [download] `Loading ips Loaded 1000 ips Benchmark: running Faster, Faststr, each for at least 10 CPU seconds.. +. Faster: 11 wallclock secs (11.09 usr + 0.00 sys = 11.09 CPU) @ 12 +.98/s (n=144) Faststr: 11 wallclock secs (11.24 usr + 0.00 sys = 11.24 CPU) @ 8 +.27/s (n=93)` [download] Tada, a 57% increase! Much much better. As you can see a subroutine call can make a big difference, they cost alot, plus you often don't get to know all the stuff that goes on inside. Well here is one final benchmark, all three running on a larger file. The main idea here is to put more weight on sorting side of the process than the transformation: `Loading ips Loaded 100000 ips Benchmark: timing 5 iterations of Faster, Fastnum, Faststr... Faster: 58 wallclock secs (57.90 usr + 0.21 sys = 58.11 CPU) Fastnum: 83 wallclock secs (82.57 usr + 0.38 sys = 82.95 CPU) Faststr: 83 wallclock secs (82.74 usr + 0.00 sys = 82.74 CPU)` [download] Aha! As you can see, now fastnum is now nearly the same speed as faststr. So the difference of saving 7 comps for the cost of string conversion eventually pays off. This is, as DrManhattan mentioned earlier, because you are paying the cost of your comps Nlog(N) times, not just N. Also notice that our speedup with the new code is also less drastic now that we have 100 times the ips to sort. Anyway, it's been fun, Gryn Updated:* Fixed a small typo (thanks splinky)	[reply] [d/l] [select]
RE: Sorting ip addresses quicker by splinky (Hermit) on Jul 14, 2000 at 07:49 UTC
I think you meant `$x=$_ + $x << 8 for split(/\./, $y);` [download] rather than `$x=$_ + $x * 8 for split(/\./, $y);` [download] Woof	[reply] [d/l] [select]
RE: Sorting ip addresses quicker by DrManhattan (Chaplain) on Jul 21, 2000 at 23:27 UTC
Neat! See, this is exactly why open source works. -Matt	[reply]
Opensource (Re: Sorting ip addresses quicker) by gryng (Hermit) on Jul 23, 2000 at 01:37 UTC
Yeah I agree, I think open source is great for learning, and for programming in general. Open forums like this are great places for learning essential techniques as well as those things that aren't taught in (most) books and classes. I wonder if anyone can come up with a (significantly) faster perl version of this code. I would be quite surprised if that was the fastest way. Ciao, Gryn	[reply]
RE: Sorting ip addresses quickly by davorg (Chancellor) on Jul 14, 2000 at 11:43 UTC
Sorting IP addresses is the canonical example of the Guttman-Rosler Transform (aka the packed default sort) given in A Fresh Look at Efficient Perl Sorting. The paper is well-worth reading in detail, but here is the IP sorting code. `@out = map substr($_, 4) => sort map pack('C4' => /(\d+)\.(\d+)\.(\d+)\.(\d+)/) . $_ => @in;` [download] Notice that by careful choice of a 'pack' function they can use the default `sort` behaviour, rather than writing a custom sort routine. The benchmarks in the paper give this version as being about twice as fast as the Schwartzian Transform. Update: URL replaced with one that works. Thanks to grinder for pointing it out. -- <http://www.dave.org.uk> European Perl Conference - Sept 22/24 2000, ICA, London <http://www.yapc.org/Europe/>	[reply] [d/l]
RE: RE: Sorting ip addresses quickly by arturo (Vicar) on Oct 05, 2000 at 23:13 UTC
I found that split() benchmarks a little faster than the regex in the following code #!/usr/bin/perl -w use strict; use Benchmark; use vars qw/@ip_strings/; # these are all made up I hope @ip_strings = qw(192.168.1.1. 192.168.1.2 152.2.100.2 204.165.43.1 1.2 +.3.4 152.2.1.23 112.145.165.205 2.2.2.2. 4.5.6.7 8.9.10.12 2.4.6.8 10 +.12.14.16); timethese (10000, { 'split-em', q{ my @packed_ips; foreach (@ip_strings) { push @packed_ips, pack 'C4', split /\./, $_; } }, 'regex-em', q{ my @packed_ips; foreach (@ip_strings) { push @packed_ips, pack 'C4', /(\d+)\.(\d+)\.(\d+)\.(\d+) +/; } } } ); [download] Some typical results: `Benchmark: timing 10000 iterations of regex-em, split-em... regex-em: 2 wallclock secs ( 3.64 usr + 0.00 sys = 3.64 CPU) split-em: 3 wallclock secs ( 3.13 usr + 0.00 sys = 3.13 CPU) bash-2.04$ perl pack_ip.pl Benchmark: timing 10000 iterations of regex-em, split-em... regex-em: 3 wallclock secs ( 4.03 usr + 0.00 sys = 4.03 CPU) split-em: 3 wallclock secs ( 3.76 usr + 0.00 sys = 3.76 CPU)` [download] Philosophy can be made out of anything -- or less	[reply] [d/l] [select]
RE: Sorting ip addresses quickly by turnstep (Parson) on Jul 13, 2000 at 23:55 UTC
Nice ST, but the non-ST example could be shorter: `@sorted = sort { @a = split /\./, $a; @b = split /\./, $b; $a[0] <=> $b[0] or $a[1] <=> $b[1] or $a[2] <=> $b[2] or $a[3] <=> $b[3]; } @unsorted;` [download]	[reply] [d/l]
Re: Sorting ip addresses quickly by salva (Canon) on May 28, 2007 at 17:23 UTC
probably, the fastest way to sort IPv4 addresses in Perl is to use Sort::Key::IPv4: `use Sort::Key::IPv4 qw(ipv4sort); my @ip = qw(192.168.1.1 10.0.0.1 127.0.0.1 ...); my @sorted = ipv4sort @ip;` [download] See also this node: Sorting IP addresses, lots of them, quickly.	[reply] [d/l]
Re^2: Sorting ip addresses quickly by Anonymous Monk on Nov 24, 2010 at 13:43 UTC
`print sort map { s/(\d+)/$1/eg ; $_ } <DATA>; __DATA__ 192.168.7.10 192.168.1.2 192.168.6.10 10.10.254.253` [download]	[reply] [d/l]
Re^3: Sorting ip addresses quickly by salva (Canon) on Nov 24, 2010 at 15:19 UTC
obviously that doesn't work: `print sort map { s/(\d+)/$1/eg ; $_ } <DATA>; __DATA__ 192.168.6.10 10.10.254.253 10.10.254.3` [download]	[reply] [d/l]
Re^4: Sorting ip addresses quickly by Anonymous Monk on Nov 24, 2010 at 15:40 UTC

Back to Cool Uses for Perl