PerlMonks  

Re^2: Best method to diff very large array efficiently

by LanX (Canon)
on Nov 25, 2013 at 21:20 UTC (node #1064302)


in reply to Re: Best method to diff very large array efficiently
in thread Best method to diff very large array efficiently

As already explained, if the keys alone are sufficient, then setting values makes no sense (although the OP was updated without mentioning it...).

Changing @diff3{@arr_1} = @arr_1; to @diff3{@arr_1} = (); makes some difference.
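A minimal sketch of the keys-only approach, assuming the goal is "elements of the second array that are missing from the first". The sub name array_diff and the sample arrays are hypothetical; the hash slice is assigned an empty list, so only keys are created and every value stays undef.

```perl
use strict;
use warnings;

# Hypothetical helper: return the elements of @$b_ref that do not occur in @$a_ref.
# Only the hash keys are needed for the lookup, so no values are stored.
sub array_diff {
    my ( $a_ref, $b_ref ) = @_;
    my %seen;
    @seen{@$a_ref} = ();    # create the keys only; every value is undef
    return grep { !exists $seen{$_} } @$b_ref;
}

my @arr_1 = ( 1, 2, 3, 5, 8 );
my @arr_2 = ( 1, 4, 5, 6, 8, 9 );
my @only_in_2 = array_diff( \@arr_1, \@arr_2 );
print "@only_in_2\n";    # prints "4 6 9"
```

Using exists rather than testing the value matters here: since every value is undef, a plain truth test would report every element as missing.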

Cheers Rolf

( addicted to the Perl Programming Language)


Re^3: Best method to diff very large array efficiently
by Kenosis (Priest) on Nov 25, 2013 at 22:43 UTC

    I used undef @arr_2_hash{@arr_2}; in sub hash_grep(), noting that it was faster than the OP's original.

    Changing this @diff3{@arr_1} = @arr_1; to @diff3{@arr_1} = () makes some difference.

    No, it makes a huge difference: it blows everything else away by far. I'll make that change in a new sub and re-benchmark. Glad you mentioned it!

      Well, I think it depends on the test case. I tried random numbers in the interval 1..1e6, as BUK did.

      See my benchmark here: RFC extending Benchmark.pm to facilitate CODEHASHREF

      Maybe I did something wrong ...

      ... but I'm not too keen to continue; IMHO all approaches are already fast enough.

      Cheers Rolf

      ( addicted to the Perl Programming Language)

      update

      Oops, undef @hash{@arr} is significantly faster than @hash{@arr} = ().
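A minimal Benchmark sketch comparing the three ways of filling the lookup hash discussed in this thread. The array size (100_000 elements), the 1..1e6 range and the sub names are assumptions for illustration; timings depend on the Perl version and machine, so no results are shown here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test data: 100_000 random integers in the interval 1..1e6.
my @arr = map { 1 + int rand 1e6 } 1 .. 100_000;

# Three ways to build a lookup hash from @arr via a hash slice.
# Each sub returns the number of distinct keys it created.
my %subs = (
    copy_values => sub { my %h; @h{@arr} = @arr; return scalar keys %h },
    empty_list  => sub { my %h; @h{@arr} = ();   return scalar keys %h },
    undef_slice => sub { my %h; undef @h{@arr};  return scalar keys %h },
);

# Sanity check: every variant should create the same number of keys.
my %counts = map { $_ => $subs{$_}->() } keys %subs;
my %distinct = map { $_ => 1 } values %counts;
die "variants disagree on key count\n" if keys %distinct != 1;

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese( -1, \%subs );
```

The undef slice works because the slice is evaluated in lvalue context, which creates every listed key before undef is applied.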

        Well I think it depends on the testcase...

        Indeed, I think you're correct. Since the OP had very specific requirements, I used the qualified wording "based upon benchmarking for this task" when characterizing the results.

        It was surprising to see how 'slow' Set::Scalar was in this case. This may be partly because it maintains an object-accessible universe of the numbers from which the difference is calculated, which is also accessible via the returned object.


        update

        Yes, undef @diff3{@arr_1} makes the OP's original solution faster than @diff3{@arr_1} = (). I've updated the benchmarks. Thank you again.
