First the straightforward method:
sub by_ip { # Split the two ip addresses up into octets my ($a1, $a2, $a3, $a4) = split /\./, $a; my ($b1, $b2, $b3, $b4) = split /\./, $b; # Check to see if the first octets are the same if ($a1 == $b1) { # If the first octets are the same, check # the second octets if ($a2 == $b2) { # Ditto for the third octets if ($a3 == $b3) { # If the first 3 octets # of each address are # the same, return the # comparison of the last # octet $a4 =~ s/^([0-9]+)/$1/; $b4 =~ s/^([0-9]+)/$1/; return $a4 <=> $b4; } else { # 3rd octets were different # so return their comparison return $a3 <=> $b3; } } else { # 2nd octets were different so # return their comparison return $a2 <=> $b2; } } else { # Best case: The first octets were # different so we can return their # comparison immediately return $a1 <=> $b1; } } my @sorted = sort by_ip @unsorted;
This works, but it's slow because we have to split both ip addresses and probably make several equality checks for each address comparison. So can we get away with only mangling each ip address once during the whole operation and only making one numeric comparison? Yes, Schwartzian Transform to the rescue:
my @sorted = map { $_->[0] } sort { $a->[1] cmp $b->[1] } map { [$_, sprintf("%03.f%03.f%03.f%03.f", split(/\./, $ +_))] } @unsorted;
Now what's this doing? Let's split it up into three separate statements:
@mapped = map { [$_, sprintf("%03.f%03.f%03.f%03.f", split(/\./, $_))] + } @unsorted; @sorted = sort { $a->[1] cmp $b->[1] } @mapped; @sorted = map { $_->[0] } @sorted;
- The first map command takes the list of ip addresses as input and gives us back a new array of arrayrefs. Now each array element in @mapped looks like [ "1.2.3.4", "001002003004" ]. Now we can use the numeric conversion of each ip address to compare it directly with another.
- The sort command sorts the members of @mapped by comparing the second element of each (e.g. 001002003004), and leaves us with a sorted array of arrayrefs in @sorted.
- The final map command extracts the first element of each member of the sorted array (e.g. "1.2.3.4"), and leaves us with a sorted list of ip addresses.
So why is it faster to use the transform? Well, even in the best case when the first octet of each ip address to be sorted is different, &by_ip() has to make two split()s and two numeric comparisons every time it's called. Since Perl's sort() uses a quicksort algorithm, it runs in O(N*log(N)) time. That makes 2N*log(N) split()s and 2N*log(N) numeric comparisons, where N is the number of addresses in the unsorted list. The transform, on the other hand, only split()s each address once and only makes one numeric comparison for each address comparison. N split()s and N*log(N) comparisons is a big win over 2N*log(N) split()s and 2N*log(N) comparisons.
- Matt
|
---|
Replies are listed 'Best First'. | |
---|---|
Sorting ip addresses quicker
by gryng (Hermit) on Jul 14, 2000 at 05:47 UTC | |
by splinky (Hermit) on Jul 14, 2000 at 07:49 UTC | |
by DrManhattan (Chaplain) on Jul 21, 2000 at 23:27 UTC | |
by gryng (Hermit) on Jul 23, 2000 at 01:37 UTC | |
RE: Sorting ip addresses quickly
by davorg (Chancellor) on Jul 14, 2000 at 11:43 UTC | |
by arturo (Vicar) on Oct 05, 2000 at 23:13 UTC | |
RE: Sorting ip addresses quickly
by turnstep (Parson) on Jul 13, 2000 at 23:55 UTC | |
Re: Sorting ip addresses quickly
by salva (Canon) on May 28, 2007 at 17:23 UTC | |
by Anonymous Monk on Nov 24, 2010 at 13:43 UTC | |
by salva (Canon) on Nov 24, 2010 at 15:19 UTC | |
by Anonymous Monk on Nov 24, 2010 at 15:40 UTC |