Re: Code Interpretation

by ikegami (Pope)
on Jul 28, 2014 at 21:29 UTC (node #1095432)

in reply to Code Interpretation

Assuming every value of %uni_refs is a unique index of @allrefs (which I judge to be quite likely), the following is a solution that scales better (O(N) instead of O(N log N)).

my %keep = map { $_ => 1 } values %uni_refs;
my @refs = @allrefs[ grep $keep{$_}, 0..$#allrefs ];
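Here is a minimal runnable sketch of the snippet above. It uses small made-up data: the keys x/y/z and the strings are hypothetical, and the OP's actual data is not shown here. For comparison it also includes a sort-based slice. That part is an assumption about the general shape of the code being replaced, not a quote of it. Both forms yield the same list:

```perl
use strict;
use warnings;

# Hypothetical sample data: %uni_refs maps arbitrary keys to
# indices of @allrefs, each index assumed to be unique.
my @allrefs  = qw(alpha beta gamma delta epsilon);
my %uni_refs = (x => 3, y => 0, z => 2);

# Lookup-hash approach: one pass to build %keep, one pass over the
# indices 0..$#allrefs, so O(N) overall and array order is preserved.
my %keep = map { $_ => 1 } values %uni_refs;
my @refs = @allrefs[ grep $keep{$_}, 0..$#allrefs ];

# Sort-based alternative (hypothetical, for comparison only):
# sort the wanted indices numerically, then take one slice.
my @sorted = @allrefs[ sort { $a <=> $b } values %uni_refs ];

print "@refs\n";    # alpha gamma delta
print "@sorted\n";  # alpha gamma delta
```

The two differ only in how the indices end up in ascending order. The first walks every index of @allrefs once; the second sorts just the wanted indices.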

Re^2: Code Interpretation
by wanna_code_perl (Pilgrim) on Jul 29, 2014 at 23:33 UTC

    Edit: Apologies to ikegami for misreading his code. While the benchmark still more or less stands, my assertion that the output would differ was incorrect.

    Your code will not produce the same output as the OP's. You indeed removed an O(N log N) loop, but at the cost of the sort. Even then, thanks to the grep and the extra hash loop versus a single slice, the OP's (sorted) performance is 300% better than your unsorted code. With both unsorted, the gap widens to nearly 1000% with N = 10^5.

    With N = 10^6, the gap shrinks a bit, to 214% and 738%, respectively.
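    The benchmark code itself isn't included in the post. Below is a minimal sketch of how such a comparison could be set up with Perl's core Benchmark module. The data shape (every second index kept), the sub names hash_grep and sort_slice, and N are all assumptions, so it will not reproduce the exact percentages quoted above:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data: N items, with every second index listed in %uni_refs.
# The figures above were for N = 10^5 and 10^6; a smaller N keeps this quick.
my $N = 10_000;
my @allrefs = map { sprintf('item%d', $_) } 0 .. $N - 1;
my %uni_refs;
$uni_refs{"key$_"} = $_ for grep { $_ % 2 == 0 } 0 .. $N - 1;

my %subs = (
    # Build a lookup hash, then grep over every index: O(N).
    hash_grep => sub {
        my %keep = map { $_ => 1 } values %uni_refs;
        return [ @allrefs[ grep $keep{$_}, 0 .. $#allrefs ] ];
    },
    # Sort the wanted indices numerically, then take one slice.
    sort_slice => sub {
        return [ @allrefs[ sort { $a <=> $b } values %uni_refs ] ];
    },
);

# Sanity check: both approaches must return the same list before timing.
my $r1 = $subs{hash_grep}->();
my $r2 = $subs{sort_slice}->();
die "results differ\n" unless "@$r1" eq "@$r2";

# Run each sub for at least 1 CPU second and print a relative-rate table.
cmpthese(-1, \%subs);
```

    Varying N and the fraction of indices kept is the easiest way to see how the ratio shifts.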

    Perhaps I'm missing your point, though? Edit: Yup!

      Your code will not produce the same output as the OP's.

      As long as grep doesn't change the order of 0..$#allrefs, I'd expect the result to be in the same order as by the OP.

      About the performance point: this depends on whether %uni_refs has the same size as @allrefs or not.
      If %uni_refs is just a small part of @allrefs, ikegami's solution is faster even with the additional sort...

      Update: Oops, I should take a course in reading benchmarks ... map+grep is slower, anyway.
      Yet I'm not completely convinced. There must be an edge case where it is faster :-)
        As long as grep doesn't change the order of 0..$#allrefs, I'd expect the result to be in the same order as by the OP.

        It wouldn't, of course. You are right; I grossly misread ikegami's code, thinking that he was iterating over the unsorted result of keys(%keep), when in fact he was iterating over the indices of the original array. (Sorry, ikegami, I should've known better!) And the rest of this node has nothing to do with that.
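        To make the distinction concrete: slicing by keys %keep returns elements in hash order. Perl leaves that order unspecified, and since Perl 5.18 it can differ between runs. Filtering 0..$#allrefs keeps array order. A small sketch with hypothetical data:

```perl
use strict;
use warnings;

# Hypothetical sample data: indices 4, 0 and 2 are the ones to keep.
my @allrefs = qw(alpha beta gamma delta epsilon);
my %keep    = map { $_ => 1 } 4, 0, 2;

# The misread version: slice by hash keys. Hash order is unspecified,
# so the same elements can come out in any order.
my @by_keys = @allrefs[ keys %keep ];

# The actual version: walk the indices in ascending order and filter,
# so the original array order is preserved.
my @by_grep = @allrefs[ grep $keep{$_}, 0 .. $#allrefs ];

print "keys: @by_keys\n";   # order may vary from run to run
print "grep: @by_grep\n";   # alpha gamma epsilon
```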

        At least my own mistake did encourage me to do some potentially enlightening benchmarking. It always interests me that, even though O(N) beats O(N log N) on a graph once N is large, how large N needs to be depends on the constant factors and lower-order terms of the functions (never shown in big-O notation, big-Theta, yes). The bigger the constant, the bigger N must be to overcome it for the "faster" algorithm to win out. This is especially true in Perl, which, as I found out years ago, has some pretty huge constants in even the simplest operations (but much smaller constants for many of the more complex operations like sort, which are heavily optimized in C).
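        As a rough illustration of that point, suppose, hypothetically, that the linear algorithm costs c_1 per element and the sorting one costs c_2 per element per factor of log_2 N. All other terms are ignored. The O(N) version then wins only when:

```latex
c_1 N < c_2 N \log_2 N
\iff \log_2 N > \frac{c_1}{c_2}
\iff N > 2^{c_1 / c_2}
\qquad (N > 1,\; c_1, c_2 > 0)
```

        So if the O(N) version's per-element constant were 10 times larger, it would only win for N > 2^10 = 1024. At a ratio of 20, N would have to exceed 2^20, roughly 10^6. Under this simplified model, the crossover point grows exponentially with the ratio of the constants. These ratios are made-up numbers, not measurements.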

        What I'm saying is, performance is cute, performance is fun, and performance is often irrelevant (within a few orders of magnitude, anyway), but for me personally, obtaining surprising results about performance is one thing that played a significant part in pushing me to learn how to write better Perl code, and not just more C code translated to run in the perl interpreter.