Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

Re: CPAN Module to determing overlap of 2 lists?

by kcott (Archbishop)
on Aug 11, 2020 at 07:58 UTC ( [id://11120598]=note: print w/replies, xml ) Need Help??


in reply to CPAN Module to determing overlap of 2 lists?

G'day wazat,

Here's an implementation of &list_overlap. You could add it to a module if that would be useful to you.

#!/usr/bin/env perl use strict; use warnings; use constant { REF => 0, DATA => 1, EXP => 2, NAME => 3, JOIN => $;, }; use Test::More; my @ref = qw{a b c d c}; my @same = qw{a b c d c}; my @long = qw{c d c x y z}; my @short = qw{c x y z}; my @none = qw{x y z}; my @tests = ( [ [@ref], [@same], 5, 'Same' ], [ [@ref], [@long], 3, 'Long' ], [ [@ref], [@short], 1, 'Short' ], [ [@ref], [@none], 0, 'None' ], ); plan tests => 0+@tests; for my $test (@tests) { my $got = list_overlap($test->[REF], $test->[DATA]); is($got, $test->[EXP], $test->[NAME]); } sub list_overlap { my ($ref, $data) = @_; my $got = 0; my ($ref_len, $data_len) = (0+@$ref, 0+@$data); my $start = $ref_len > $data_len ? $ref_len - $data_len : 0; for my $i ($start .. $ref_len - 1) { if (join(JOIN, @{$ref}[$i .. $#$ref]) eq join(JOIN, @{$data}[0 .. $#$ref - $i]) ) { $got = 1 + $#$ref - $i; last; } } return $got; }

I find $; useful for this type of join because it's rarely used elsewhere. Pick something else if that's not appropriate.

Output:

1..4 ok 1 - Same ok 2 - Long ok 3 - Short ok 4 - None

— Ken

Replies are listed 'Best First'.
Re^2: CPAN Module to determing overlap of 2 lists?
by wazat (Monk) on Aug 11, 2020 at 19:27 UTC

    That works.

    It feels a bit resource intensive given the repeated temporary strings created in the joins.

      I've generally found that Perl's functions that operate on strings (join, index, etc.) are typically very fast. You can compare with other solutions using Benchmark; you can profile to find hotspots (e.g. Devel::NYTProf).

      I can see a number of micro-optimisations you could make: replace $ref_len - 1 with $#$ref and replace 1 + $#$ref with $ref_len. That may gain you absolutely nothing but, I suppose, the code would be a little neater.

      You could do a single join. Then repeatedly chop the front off (index to locate JOIN then substr); again, that may gain you nothing but perhaps worth a try.

      I, and no doubt others, would be interested in the results if you do decide to benchmark this against your current "brute force solution", a regex solution, and anything else you may have come up with.

      — Ken

        Your approach has some similarities to my brute force approach, but is more optimistic.

        Maybe I will do some benchmarking

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11120598]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (6)
As of 2024-04-19 17:57 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found