http://www.perlmonks.org?node_id=392861

revdiablo has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks. I have a pretty simple problem, which I was able to solve without much difficulty. The algorithm I'm using seems a bit clunky, though. I wonder if there might be a cleaner way to do this, or perhaps just a few tweaks to what I've got.

Update: Oops! Apparently I forgot to actually explain the problem. Sorry, guys. Jasper's guess was correct: the code tries to find all pairs of lines that have two or more words in common. The goal is to find duplicates in a list of names where the order may be swapped (First/Last versus Last/First) and where a name may also include a middle name or any other mess of additional stuff.
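The matching rule described above (two lines match when they share at least two words) can be written as a small predicate. This is a minimal sketch, assuming words are separated by underscores as in the sample data and compared case-sensitively; `common_word_count` and `is_probable_duplicate` are hypothetical names. Building a hash from one line's words turns each comparison into a hash lookup and counts each distinct shared word once, so a name with a repeated word is not double-counted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the distinct words that two
# underscore-separated names have in common.
sub common_word_count {
    my ($x, $y) = @_;
    my %in_x = map { $_ => 1 } split /_/, $x;
    my %seen;
    # Keep each word of $y that also occurs in $x, counting it only once.
    my @shared = grep { $in_x{$_} && !$seen{$_}++ } split /_/, $y;
    return scalar @shared;
}

# Hypothetical predicate: a probable duplicate shares two or more words.
sub is_probable_duplicate {
    my ($x, $y) = @_;
    return common_word_count($x, $y) >= 2;
}

print is_probable_duplicate('john_smith', 'smith_john_q') ? "match\n" : "no match\n";
```

Inside the original main loop, `$matches{"$i.$j"} = common_word_count($lines[$i], $lines[$j])` could stand in for the two inner `for` loops.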

Without further ado, here it is:

```perl
#!/usr/bin/perl
use strict;
use warnings;

chomp(my @lines = <DATA>);
my @words = map [ split /_/ ], @lines;    # one array ref of words per line

my %matches;                              # "$i.$j" => number of matching words
my $comb = combinations( 0 .. $#words );
while (my @comb = $comb->()) {
    next unless @comb == 2;               # only pairs of lines are of interest
    my ($i, $j) = @comb;
    for my $wi (@{$words[$i]}) {
        for my $wj (@{$words[$j]}) {
            $matches{"$i.$j"}++ if $wi eq $wj;
        }
    }
}

for (grep $matches{$_} > 1, keys %matches) {
    my ($i, $j) = split /\./;
    print "$lines[$i] and $lines[$j]\n";
}

# from 197008
# Each call returns the next subset of @list (a binary counter over @pick).
sub combinations {
    my @list = @_;
    my @pick = (0) x @list;
    return sub {
        my $i = 0;
        while (1 < ++$pick[$i]) {
            $pick[$i] = 0;
            return if $#pick < ++$i;
        }
        return @list[ grep $pick[$_], 0..$#pick ];
    };
}

__DATA__
one_two
one_three_two
three_one
one_four
four_three_one
```
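tye's `combinations` iterator walks every non-empty subset of the indices (2**n - 1 of them for n lines), so the loop above discards all but the n*(n-1)/2 two-element picks. As a sketch, assuming only pairs are ever needed, a closure that yields index pairs directly avoids that waste; `pairs_iterator` is a hypothetical name and not part of the original node.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical replacement for combinations(): returns a closure that
# yields each index pair ($i, $j) with $i < $j exactly once, then ().
sub pairs_iterator {
    my $n = shift;               # number of items
    my ($i, $j) = (0, 0);
    return sub {
        $j++;
        if ($j >= $n) {          # inner index ran off the end
            $i++;                # so advance the outer index
            $j = $i + 1;
        }
        return if $j >= $n;      # no pairs left
        return ($i, $j);
    };
}

my $next = pairs_iterator(3);
while (my ($i, $j) = $next->()) {
    print "$i.$j\n";             # prints 0.1, 0.2, 1.2
}
```

With this, the main loop would become `my $pairs = pairs_iterator(scalar @words); while (my ($i, $j) = $pairs->()) { ... }`, and the `next unless @comb == 2` check is no longer needed.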

My initial revision had four nested for loops, but I was able to use tye's simple combinations subroutine from (tye)Re2: Finding all Combinations to reduce that to three. I couldn't think of a nice way to use combinations on the second set of loops (the two that compare the words of each pair of lines), though. Perhaps I'm missing something obvious?
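One way to drop the word-against-word comparison entirely is to turn the problem around: index each word to the lines containing it, then credit every pair of lines in that word's list with one shared word. This is a sketch of that swapped-in inverted-index technique, not the original approach; it assumes the same underscore-separated input (inlined here instead of read from `__DATA__`) and counts each distinct word once per line. Pairs of lines that share no words are never examined.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = qw(one_two one_three_two three_one one_four four_three_one);

# Inverted index: word => list of line indices containing it (each line once).
my %lines_with;
for my $i (0 .. $#lines) {
    my %seen;
    push @{ $lines_with{$_} }, $i for grep { !$seen{$_}++ } split /_/, $lines[$i];
}

# Every pair of lines listed under the same word shares that word.
# Index lists are built in ascending order, so the first index is smaller.
my %shared;    # $shared{$i}{$j} = number of distinct words lines $i and $j share
for my $idx (values %lines_with) {
    for my $x (0 .. $#$idx - 1) {
        for my $y ($x + 1 .. $#$idx) {
            $shared{ $idx->[$x] }{ $idx->[$y] }++;
        }
    }
}

# Collect pairs with two or more words in common, in index order.
my @matches;
for my $i (sort { $a <=> $b } keys %shared) {
    for my $j (sort { $a <=> $b } keys %{ $shared{$i} }) {
        push @matches, "$lines[$i] and $lines[$j]" if $shared{$i}{$j} >= 2;
    }
}
print "$_\n" for @matches;
```

Because each line's words are deduplicated before indexing, a name like `john_john` shares only one word with `john_smith`.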

Any ideas will be greatly appreciated.