http://www.perlmonks.org?node_id=1205094

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
Suppose you build an array, say with the following snippet:

$file = $ARGV[0];
@all_IDS = ();
open IN, $file;
while (<IN>) {
    chomp $_;
    push @all_IDS, $_;
}
close IN;

What I want to do is to calculate the Levenshtein distance between each element and all the rest in the array.
If I have only 2 IDS, it is easy:
use Text::Levenshtein qw(distance);
$distance = distance($id1, $id2);

What must I write in order to take each of the IDs in turn and compare it to the rest? With splice I guess I would remove it from the array, and I do not want that. Maybe splice, and then push it back once I am done? Something smarter?

Replies are listed 'Best First'.
Re: Compare each array element to the rest, sequentially
by Dallaylaen (Chaplain) on Dec 07, 2017 at 15:32 UTC

    It's not clear what you want to achieve here... Did I understand it right that you want to build a matrix of distances? Then you can just use two nested foreach loops:

    use strict;
    use warnings;
    use Data::Dumper;
    use Text::Levenshtein qw(distance);

    my $file = $ARGV[0];
    my @all_ids = ();
    open my $in, "<", $file or die "Failed to open $file: $!";
    while (<$in>) {
        chomp $_;
        push @all_ids, $_;
    }
    close $in;

    my @matrix;
    foreach my $first (@all_ids) {
        my @row;
        foreach my $second (@all_ids) {
            push @row, distance($first, $second);
        }
        push @matrix, \@row;
    }
    print Dumper(\@matrix);

    This leaves room for optimization, of course.
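
One possible optimization, sketched here as an illustration rather than as the poster's intended one: the distance is symmetric and is zero on the diagonal, so the metric only needs to be called once per unordered pair. The names distance_matrix and lev are hypothetical. lev is a small pure-Perl stand-in for Text::Levenshtein's distance, included only so the sketch runs without the CPAN module; passing \&distance instead should work the same way.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical stand-in for Text::Levenshtein::distance (two-argument form).
sub lev {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));    # distances from the empty prefix of $s
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j] + 1, $cur[$j - 1] + 1, $prev[$j - 1] + $cost);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Fill a symmetric matrix, calling the distance function once per unordered pair.
sub distance_matrix {
    my ($dist, @ids) = @_;
    my @matrix;
    for my $i (0 .. $#ids) {
        $matrix[$i][$i] = 0;                 # an ID is at distance 0 from itself
        for my $j ($i + 1 .. $#ids) {
            my $d = $dist->($ids[$i], $ids[$j]);
            $matrix[$i][$j] = $d;
            $matrix[$j][$i] = $d;            # symmetry: d(a,b) == d(b,a)
        }
    }
    return \@matrix;
}

my $m = distance_matrix(\&lev, qw(four foo bar));
print join(" ", @$_), "\n" for @$m;
```

For the three sample words this prints the rows "0 2 3", "2 0 3" and "3 3 0", with n*(n-1)/2 distance calls instead of n*n.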

    P.S. use strict; use warnings; It's 2017 already!

Re: Compare each array element to the rest, sequentially
by Laurent_R (Canon) on Dec 07, 2017 at 23:11 UTC
    Since the distance between A and B is the same as the distance between B and A, you are looking for unordered combinations of two words. You presumably want to remove a word once it has been combined with all the other words, so that you don't find the same pair again. Here is a quick, simple example that builds such combinations:

    use strict;
    use warnings;

    my @array = sort {$a <=> $b} qw/ 23 45 12 4 8 7 5/;
    while (my $item = shift @array) {
        print "$item $_\n" for @array;
    }
    This will print all combinations of the numbers in the list:
    4 5
    4 7
    4 8
    4 12
    4 23
    4 45
    5 7
    5 8
    5 12
    5 23
    5 45
    7 8
    7 12
    7 23
    7 45
    8 12
    8 23
    8 45
    12 23
    12 45
    23 45
    Just replace the print statement with the distance calculation (and the numbers in the original list with your words), and you're done. Just three lines of code.
      Added a unique filter, in case the data contains duplicates:
      use strict;
      use warnings;

      my %uniq;
      my @array = sort {$a <=> $b} grep {$uniq{$_}++==0} qw/ 23 45 12 4 45 8 7 5/;
      while (my $item = shift @array) {
          print "$item $_\n" for @array;
      }
      (the number 45 is duplicated here, but only one appears in the output).
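
A hedged sketch of how the shift loop above might be adapted from numbers to words. Two details change: the numeric sort `{$a <=> $b}` would warn on non-numeric strings under `use warnings`, so a string sort is used, and the loop tests `defined` rather than truth so that an ID such as "0" does not end it early. The name unique_pairs is hypothetical; each returned pair can then be passed to distance.

```perl
use strict;
use warnings;

# Return every unordered pair of distinct strings as [first, second].
sub unique_pairs {
    my %seen;
    # Drop duplicates, then sort as strings (not numbers).
    my @items = sort { $a cmp $b } grep { !$seen{$_}++ } @_;
    my @pairs;
    # Test definedness, not truth, so a false value like "0" is still processed.
    while (defined(my $item = shift @items)) {
        push @pairs, [$item, $_] for @items;
    }
    return @pairs;
}

print "$_->[0] $_->[1]\n" for unique_pairs(qw(four foo bar foo 0));
```

With this input the sketch yields six pairs, starting with "0 bar" and ending with "foo four"; the duplicated "foo" is only used once.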

                      All power corrupts, but we need electricity.

Re: Compare each array element to the rest, sequentially
by thanos1983 (Parson) on Dec 07, 2017 at 18:16 UTC

    Hello Anonymous Monk

    Another possible solution to your problem.

    The DATA are taken from the Text::Levenshtein module.

    #!/usr/bin/perl
    use strict;
    use IO::All;
    use warnings;
    use Data::Dumper;
    use Text::Levenshtein qw(distance);

    my @words = io($ARGV[0])->chomp->slurp;

    my %hash;
    for (0..$#words) {
        my $word = shift @words;
        $hash{$word} = [distance($word, @words)];
        push @words, $word;
    }
    print Dumper \%hash;

    __END__
    $ perl test.pl words.txt
    $VAR1 = {
              'bar' => [
                         3,
                         3
                       ],
              'four' => [
                          2,
                          3
                        ],
              'foo' => [
                         3,
                         2
                       ]
            };

    __DATA__
    $ cat words.txt
    four
    foo
    bar

    Update: Maybe this alternative solution is better:

    #!/usr/bin/perl
    use strict;
    use IO::All;
    use warnings;
    use Data::Dumper;
    use Text::Levenshtein qw(distance);

    my @words = io($ARGV[0])->chomp->slurp;

    my %HoH;
    for (0..$#words) {
        my $word = shift @words;
        my @distances = distance($word, @words);
        my %hash;
        @hash{@words} = @distances;
        $HoH{$word} = \%hash;
        push @words, $word;
    }
    print Dumper \%HoH;

    __END__
    $ perl test.pl words.txt
    $VAR1 = {
              'foo' => {
                         'bar' => 3,
                         'four' => 2
                       },
              'bar' => {
                         'four' => 3,
                         'foo' => 3
                       },
              'four' => {
                          'bar' => 3,
                          'foo' => 2
                        }
            };

    Hope this helps, BR.

    Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: Compare each array element to the rest, sequentially
by Anonymous Monk on Dec 07, 2017 at 15:17 UTC
    Ok, I actually do it like this:
    for ($i=0; $i<=$#all_IDS; $i++) {
        $current_ID = $all_IDS[$i];
        splice(@all_IDS,$i,1);
        print $current_ID,"\n######\n";
        print "@all_IDS\n";
        print "@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n";
        push @all_IDS, $current_ID;
    }

    and it works.
      and it works.

      Are you sure? Try:

      #!/usr/bin/perl
      use strict;
      use Text::Levenshtein qw(distance);

      my $file = $ARGV[0];
      open IN, '<',$file or die "$!";
      chomp(my @all_IDS=<IN>);
      close IN;

      for (0..$#all_IDS){
          my $id = shift @all_IDS;
          my @ld = distance ($id, @all_IDS);
          print join "\t",$id,@all_IDS,@ld,"\n";
          push @all_IDS,$id;
      }

      __DATA__
      A
      BB
      CCC
      DDDD
      Result
      A       BB      CCC     DDDD    2       3       4
      BB      CCC     DDDD    A       3       4       2
      CCC     DDDD    A       BB      4       3       3
      DDDD    A       BB      CCC     4       4       4
      
      poj
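
For comparison, a small sketch that replays the earlier splice-at-$i / push loop on the same four IDs and records which element each pass picks. The helper name visited_by_splice_loop is hypothetical. Because removing index $i shifts the later elements down by one while $i still increments, some IDs are skipped and others are picked more than once, which may be what the "Are you sure?" above is hinting at.

```perl
use strict;
use warnings;

# Replay the splice-at-$i / push loop and return the element picked on each pass.
sub visited_by_splice_loop {
    my @ids = @_;
    my @visited;
    for (my $i = 0; $i <= $#ids; $i++) {
        my $current = $ids[$i];
        splice(@ids, $i, 1);       # remove the current element ...
        push @visited, $current;
        push @ids, $current;       # ... and append it at the end
    }
    return @visited;
}

print join(" ", visited_by_splice_loop(qw(A BB CCC DDDD))), "\n";    # prints "A CCC A A"
```

BB and DDDD are never picked here, whereas the shift/push rotation in the reply above visits each ID exactly once.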