<?xml version="1.0" encoding="windows-1252"?>
<node id="25833" title="Sorting a list of IP addresses (aka Why I hate Big O)" created="2000-08-02 20:15:12" updated="2005-08-13 03:25:08">
<type id="120">
perlmeditation</type>
<author id="18800">
jeffa</author>
<data>
<field name="doctext">
I was recently enjoying(sic) seeing 'script kiddies' trying
to scan the ports on my home box.  I wrote a script to
parse out the packet DENY entries from my &lt;i&gt;messages&lt;/i&gt; 
log file and realized that I should sort the list of 
IP's to make them easier to read.
&lt;p&gt;
Thoughts of complex sorting started to form in my head: 
split up the quad's and start comparing.  I started to look
for some elegant ways of doing this, when I realized how
much more simple (and potentially more efficient) 
it would be to pack each IP into it's hex form.  "This has
to be a much, much faster way", I thought to myself.
&lt;p&gt;
So I did a search at Google and found
&lt;a href="http://www.hpl.hp.com/personal/Larry_Rosler/sort/sorting.html"&gt;this page&lt;/a&gt;
written by Uri Guttman and Larry Rosler.  They discuss this
very thing - giving one example that sorts by breaking up the
IP bytes and comparing them, and another example that packs
the IP's into hex.
&lt;p&gt;
Here are their two examples, slightly paraphrased:
&lt;code&gt;
#naive: O(N*logN)
my @sorted = sort {

	my @a = $a =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/;
	my @b = $b =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/;
	
	$a[0] &lt;=&gt; $b[0] || 
	$a[1] &lt;=&gt; $b[1] || 
	$a[2] &lt;=&gt; $b[2] || 
	$a[3] &lt;=&gt; $b[3]

} @unsorted;

#packed: O(N*logN)
my @sorted = sort {

	pack('C4' =&gt; $a =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/)
	cmp
	pack('C4' =&gt; $b =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/)

} @unsorted;
&lt;/code&gt;
What gets me is the fact that they have the same Big(O), but
when benchmarked:
&lt;code&gt;
Benchmark: timing 5000 iterations of naive, packed...
     naive: 11 wallclock secs ( 9.91 usr +  0.32 sys = 10.23 CPU)
    packed:  8 wallclock secs ( 7.95 usr +  0.08 sys =  8.03 CPU)
&lt;/code&gt;
there is an obvious difference.
&lt;P&gt;
To be honest, I really hated Big(O)analysis in college, 
and I think this is the very reason why.  There is an
obvious amount of better efficiency in the &lt;i&gt;packed&lt;/i&gt;
version, but in a constant way - this of course is eliminated
when doing Big(O), making both algorithms 'equal'.
&lt;P&gt;
I am posting this mainly because I couldn't find any good
IP sorting algorithms on PM, but I was also
wondering how other Monks out there feel about the
Big(O).
&lt;P&gt;
Jeff Anderson&lt;br&gt;
tyring to be a better programmer every day . . .
&lt;P&gt;
&lt;b&gt;&lt;font color="#AA0000"&gt;UPDATE: Fri Aug  4 09:31:30 CDT 2000&lt;/font&gt;&lt;/b&gt;&lt;BR&gt;
Thanks to everyone for helping me understand the error
in my ways - this whole time I was trying to compare
apples to oranges, no wonder I had such a hard time
understanding O(n).  Big thanks to [gryng] for such a
wonderful and heart-felt explaination. You rule. </field>
</data>
</node>
