Re: CPAN Module to determing overlap of 2 lists?

Here's an implementation of &list_overlap. You could add it to a module if that would be useful to you.

#!/usr/bin/env perl

use strict;
use warnings;

use constant {
    REF  => 0,
    DATA => 1,
    EXP  => 2,
    NAME => 3,
    JOIN => $;,
};

use Test::More;

my @ref = qw{a b c d c};
my @same = qw{a b c d c};
my @long = qw{c d c x y z};
my @short = qw{c x y z};
my @none = qw{x y z};

my @tests = (
    [ [@ref], [@same], 5, 'Same' ],
    [ [@ref], [@long], 3, 'Long' ],
    [ [@ref], [@short], 1, 'Short' ],
    [ [@ref], [@none], 0, 'None' ],
);

plan tests => 0+@tests;

for my $test (@tests) {
    my $got = list_overlap($test->[REF], $test->[DATA]);
    is($got, $test->[EXP], $test->[NAME]);
}

sub list_overlap {
    my ($ref, $data) = @_;

    my $got = 0;
    my ($ref_len, $data_len) = (0+@$ref, 0+@$data);
    my $start = $ref_len > $data_len ? $ref_len - $data_len : 0;

    for my $i ($start .. $ref_len - 1) {
        if (join(JOIN, @{$ref}[$i .. $#$ref])
            eq join(JOIN, @{$data}[0 .. $#$ref - $i])
        ) {
            $got = 1 + $#$ref - $i;
            last;
        }
    }

    return $got;
}
[download]

I find $; useful for this type of join because it's rarely used elsewhere. Pick something else if that's not appropriate.

Output:

1..4
ok 1 - Same
ok 2 - Long
ok 3 - Short
ok 4 - None
[download]

— Ken

Comment on Re: CPAN Module to determing overlap of 2 lists? Select or Download Code

Replies are listed 'Best First'.
Re^2: CPAN Module to determing overlap of 2 lists? by wazat (Monk) on Aug 11, 2020 at 19:27 UTC
That works. It feels a bit resource intensive given the repeated temporary strings created in the joins.	[reply]
Re^3: CPAN Module to determing overlap of 2 lists? by kcott (Archbishop) on Aug 12, 2020 at 08:09 UTC
I've generally found that Perl's functions that operate on strings (`join`, `index`, etc.) are typically very fast. You can compare with other solutions using Benchmark; you can profile to find hotspots (e.g. Devel::NYTProf). I can see a number of micro-optimisations you could make: replace `$ref_len - 1` with `$#$ref` and replace `1 + $#$ref` with `$ref_len`. That may gain you absolutely nothing but, I suppose, the code would be a little neater. You could do a single `join`. Then repeatedly chop the front off (`index` to locate `JOIN` then `substr`); again, that may gain you nothing but perhaps worth a try. I, and no doubt others, would be interested in the results if you do decide to benchmark this against your current "brute force solution", a regex solution, and anything else you may have come up with. — Ken	[reply] [d/l] [select]
Re^4: CPAN Module to determing overlap of 2 lists? by wazat (Monk) on Aug 13, 2020 at 01:03 UTC
Your approach has some similarities to my brute force approach, but is more optimistic. Maybe I will do some benchmarking	[reply]


Perl Monk, Perl Meditation
	PerlMonks