Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

Given the OP's two arrays containing approximately 8k unique integer elements, it's evident from benchmarking (Perl v5.14.2 64-bit) that using a hash and grep to find the elements in @arr_1 that are not in @arr_2 may be a good choice:

use strict; use warnings; use Set::Scalar; use List::Compare; use Benchmark qw/cmpthese/; my @arr_1 = 0 .. 8e3; my @arr_2 = 2e3 .. 1e4; sub setScalar { my $s1 = Set::Scalar->new(@arr_1); my $s2 = Set::Scalar->new(@arr_2); my $diff = $s1->difference($s2); } sub listCompare { my $lc = List::Compare->new( \@arr_1, \@arr_2 ); my @diff = $lc->get_Lonly; } sub OPdiff { my %diff3; @diff3{@arr_1} = @arr_1; delete @diff3{@arr_2}; my @diff = ( keys %diff3 ); } sub OPdiffModified { my %diff3; @diff3{@arr_1} = (); delete @diff3{@arr_2}; my @diff = ( keys %diff3 ); } sub OPdiff_undef { my %diff3; undef @diff3{@arr_1}; delete @diff3{@arr_2}; my @diff = ( keys %diff3 ); } sub using_vec { my $vec = ''; vec( $vec, $_, 1 ) = 1 for @arr_2; my @diff = grep !vec( $vec, $_, 1 ), @arr_1; } sub hash_grep { my %arr_2_hash; undef @arr_2_hash{@arr_2}; my @diff = grep !exists $arr_2_hash{$_}, @arr_1; } cmpthese( -5, { setScalar => sub { setScalar() }, listCompare => sub { listCompare() }, OPdiff => sub { OPdiff() }, OPdiffModified => sub { OPdiffModified() }, OPdiff_undef => sub { OPdiff_undef() }, using_vec => sub { using_vec() }, hash_grep => sub { hash_grep() } } );

Output:

Rate setScalar listCompare using_vec OPdiff OPdiffMod +ified hash_grep OPdiff_undef setScalar 7.94/s -- -72% -98% -98% + -98% -98% -99% listCompare 28.1/s 254% -- -92% -93% + -94% -94% -96% using_vec 349/s 4289% 1139% -- -7% + -27% -32% -47% OPdiff 375/s 4623% 1233% 8% -- + -22% -26% -42% OPdiffModified 478/s 5919% 1599% 37% 27% + -- -6% -27% hash_grep 510/s 6317% 1712% 46% 36% + 7% -- -22% OPdiff_undef 652/s 8105% 2217% 87% 74% + 36% 28% -- --

Edit I: My thanks to LanX for redirecting my attention back to the suggested @diff3{@arr_1} = (), as it makes a significant difference in performance, as shown in the OPdiffModified() benchmark results. Thus, based upon benchmarking for this task, OPdiffModified() is the best of this group of diff solutions for the OP.

Edit II: Thanks again to LanX (OK, I'll set up a new node for thanking LanX :), substituted @diff3{@arr_1} = () with undef @diff3{@arr_1} in OPdiff_undef(), and it's now the fastest.


In reply to Re: Best method to diff very large array efficiently by Kenosis
in thread Best method to diff very large array efficiently by newbieperlperson

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others meditating upon the Monastery: (2)
As of 2024-03-19 06:31 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found