Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Re: Fast Way to find one array in second array (Update: < 35 seconds.)

by BrowserUk (Pope)
on Aug 14, 2016 at 07:35 UTC ( #1169739=note: print w/replies, xml ) Need Help??


in reply to Fast Way to find one array in second array

Running the inner loop forward ensure earliest short-circuit and cut time to 1/4.

This will work regardless of the data and takes just under 2 minutes 35 seconds for a maximum sized AoA.

#! perl -slw use strict; use Time::HiRes qw[ time ]; use Data::Dump qw[ pp ]; ## Generate test data my @paths = map[ map chr(65+rand 26), 2 .. 2+rand(11) ], 1 .. 32448; my $start = time; ## Sort by length of subarray; shortest last to facilitate deletion in + a loop @paths = sort{ $#{ $b } <=> $#{ $a } } @paths; #pp \@paths; ## Build an index to the elements in all the subarrays my( $i, %index ) = 0; for my $ref ( @paths ){ $index{ $_ } //= ++$i for @$ref; } #pp \%index; ## Use the index to build bitmaps representing the subarrays my @bitmaps = ( '' ) x @paths; for my $n ( 0 .. $#paths ) { vec( $bitmaps[ $n ], $index{ $_ }, 1 ) = 1 for @{ $paths[ $n ] }; } #pp \@bitmaps; ## bitwise OR the bitmaps to discover wholly contained subarrays and r +emove them OUTER:for my $i ( reverse 0 .. $#paths ) { for my $j ( 0 .. $i-1 ) { ## Remove rev +erse; short-circuit earlier. if( ( $bitmaps[ $j ] | $bitmaps[ $i ] ) eq $bitmaps[ $j ] ) { # warn "deleting $i"; delete $paths[ $i ]; next OUTER; } } } ## remove undefs @paths = grep defined, @paths; printf "Deleted %u subarrays\n", 32448 - @paths; printf "Took %.6f seconds\nEnter to see results:", time() - $start; <STDIN>; pp \@paths; __END__ C:\test>1169736.pl Deleted 22228 subarrays Took 107.267549 seconds Enter to see results:Terminating on signal SIGINT(2) C:\test>1169736.pl Deleted 22024 subarrays Took 108.820184 seconds Enter to see results:Terminating on signal SIGINT(2)

It works by building an index to the items in the subarrays; uses that index to build bitmaps representing the subarrays; the cross compares the bitmaps using bitwise-OR to determine wholly contained subarrays and deletes them.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
In the absence of evidence, opinion is indistinguishable from prejudice.
  • Comment on Re: Fast Way to find one array in second array (Update: < 35 seconds.)
  • Download Code

Replies are listed 'Best First'.
Re^2: Fast Way to find one array in second array (Update: < 35 seconds.)
by Marshall (Abbot) on Aug 14, 2016 at 23:08 UTC
    Interesting.
    I ran this on my ancient Win32 XP laptop and got:
    C:\Projects_Perl\testing>perl browseruk.pl Deleted 22240 subarrays Took 39.828125 seconds Enter to see results:
    this code is "fast".
Re^2: Fast Way to find one array in second array (Update: < 35 seconds.)
by melmoth (Acolyte) on Oct 12, 2016 at 14:48 UTC

    this solution is definitely one that works as some of the other solutions do not preserve the order of the input arrays. this solution both removes the arrays whose set of elemetns are continaed within a second array, and does so without changing the order the elements in the filtered arrays.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1169739]
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (2)
As of 2017-11-19 07:41 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    In order to be able to say "I know Perl", you must have:













    Results (278 votes). Check out past polls.

    Notices?