Big-O notation doesn't work in such a simple way that you can
take the complexity of two algorithms and compare them
directly like that. Big-O describes how running time grows
with N, the size of the input; it hides how expensive each
individual step is. Just because two algorithms are both O(N)
or both O(N^2) doesn't mean they run equally fast, because in
one algorithm each step could be a large, time-consuming
mathematical calculation, and in the other each step could be
a simple regular expression match.
The power of Big-O notation comes from being able to roughly
predict how an algorithm will behave on different sizes of
dataset, and then choosing the algorithm that performs best
on the size of data you typically expect. If one algorithm is
O(N) and the other is O(N^2), the latter may be the better
choice in some cases, if the former has a much larger cost
per step and you can ensure that there will never be enough
data for the N^2 growth to outweigh the cheaper steps.
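To make that trade-off concrete, here is a toy cost model rather than a real measurement. The per-step costs (1000 units for the linear algorithm, 1 unit for the quadratic one) and all the function names are made up for illustration; the point is only that the O(N^2) algorithm stays cheaper until the input passes a crossover size.

```python
def linear_cost(n, per_item=1000):
    """Modelled cost of an O(N) algorithm with an expensive step per item.

    per_item is an assumed, made-up cost in arbitrary units.
    """
    return per_item * n


def quadratic_cost(n, per_pair=1):
    """Modelled cost of an O(N^2) algorithm with a cheap step per pair.

    per_pair is an assumed, made-up cost in arbitrary units.
    """
    return per_pair * n * n


def crossover(max_n=10_000_000):
    """Smallest n at which the quadratic model costs more than the linear one.

    Returns None if no such n exists up to max_n.
    """
    for n in range(1, max_n + 1):
        if quadratic_cost(n) > linear_cost(n):
            return n
    return None


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, linear_cost(n), quadratic_cost(n))
    print("quadratic becomes more expensive at n =", crossover())
```

With these assumed costs the quadratic algorithm wins for every input of up to 1000 items, so if you know your data never gets that large, it is the better pick despite the worse Big-O.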
Basically, it all boils down to needing much more testing
than you did. Reaching the best efficiency takes many steps.
If you ran more benchmarks with different numbers of IP
addresses, algorithms of different complexity, and so on,
you'd start to see how Big-O notation helps you predict the
results of future tests.
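A rough sketch of what such a benchmark could look like follows. The task (checking a list of IP addresses for duplicates), the generated test data, and every function name here are assumptions chosen for illustration, not your actual code; swap in your own algorithms and real address lists.

```python
import timeit


def make_ips(count):
    """Build `count` distinct dotted-quad strings (made-up test data)."""
    return [
        f"10.{(i >> 16) & 255}.{(i >> 8) & 255}.{i & 255}"
        for i in range(count)
    ]


def has_duplicates_quadratic(ips):
    """O(N^2): compare every pair of addresses."""
    for i in range(len(ips)):
        for j in range(i + 1, len(ips)):
            if ips[i] == ips[j]:
                return True
    return False


def has_duplicates_linear(ips):
    """O(N) on average: a single pass that remembers addresses in a set."""
    seen = set()
    for ip in ips:
        if ip in seen:
            return True
        seen.add(ip)
    return False


def benchmark(sizes=(10, 100, 1000), repeats=3):
    """Return {size: (quadratic_seconds, linear_seconds)}, best of `repeats`."""
    results = {}
    for size in sizes:
        ips = make_ips(size)
        quad = min(timeit.repeat(lambda: has_duplicates_quadratic(ips),
                                 number=1, repeat=repeats))
        lin = min(timeit.repeat(lambda: has_duplicates_linear(ips),
                                number=1, repeat=repeats))
        results[size] = (quad, lin)
    return results


if __name__ == "__main__":
    for size, (quad, lin) in benchmark().items():
        print(f"{size:>6} addresses: quadratic {quad:.6f}s, linear {lin:.6f}s")
```

Running it over several sizes is what matters: a single data point tells you which version was faster that one time, while the trend across sizes shows whether the timings actually grow the way the Big-O analysis says they should.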