PerlMonks  

Compare each array element to the rest, sequentially

by Anonymous Monk
on Dec 07, 2017 at 14:54 UTC ( #1205094=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks
if you build an array, say with the following snippet:
    $file = $ARGV[0];
    @all_IDS = ();
    open IN, $file;
    while (<IN>) {
        chomp $_;
        push @all_IDS, $_;
    }
    close IN;

What I want to do is to calculate the Levenshtein distance between each element and all the rest in the array.
If I have only 2 IDS, it is easy:
    use Text::Levenshtein qw(distance);
    $distance = distance($id1, $id2);

What must I write in order to pick each of the IDs in turn and compare it to the rest? With splice I guess I would remove it from the array, and I do not want that; maybe splice and then push it back once I am done? Or is there something smarter?

Replies are listed 'Best First'.
Re: Compare each array element to the rest, sequentially
by Dallaylaen (Hermit) on Dec 07, 2017 at 15:32 UTC

    It's not clear what you want to achieve here... Did I understand it right that you want to build a matrix of distances? Then you can just use two nested foreach loops:

    use strict;
    use warnings;
    use Data::Dumper;
    use Text::Levenshtein qw(distance);

    my $file = $ARGV[0];
    my @all_ids = ();
    open my $in, "<", $file or die "Failed to open $file: $!";
    while (<$in>) {
        chomp $_;
        push @all_ids, $_;
    }
    close $in;

    my @matrix;
    foreach my $first (@all_ids) {
        my @row;
        foreach my $second (@all_ids) {
            push @row, distance($first, $second);
        }
        push @matrix, \@row;
    }
    print Dumper(\@matrix);

    This leaves room for optimization, of course.

    P.S. use strict; use warnings; It's 2017 already!
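
One possible optimization, as a minimal sketch: the distance is symmetric and every string is at distance 0 from itself, so only the pairs above the diagonal need to be computed, and each result can be mirrored. To keep the example runnable without CPAN, it uses a small dynamic-programming `lev()` as a stand-in for `Text::Levenshtein::distance`. Both `lev()` and `distance_matrix()` are hypothetical names introduced here, not part of the module.

```perl
use strict;
use warnings;

# lev(): stand-in for Text::Levenshtein's distance(), so that this sketch
# has no CPAN dependency. Classic two-row dynamic-programming edit distance.
sub lev {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            my $best = $prev[$j - 1] + $cost;                       # substitution
            $best = $prev[$j] + 1    if $prev[$j] + 1    < $best;   # deletion
            $best = $cur[$j - 1] + 1 if $cur[$j - 1] + 1 < $best;   # insertion
            push @cur, $best;
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# distance_matrix(): hypothetical helper. Computes each unordered pair once
# and fills both [i][j] and [j][i] with the same value.
sub distance_matrix {
    my @ids = @_;
    my @matrix;
    for my $i (0 .. $#ids) {
        $matrix[$i][$i] = 0;
        for my $j ($i + 1 .. $#ids) {
            my $d = lev($ids[$i], $ids[$j]);
            $matrix[$i][$j] = $d;
            $matrix[$j][$i] = $d;
        }
    }
    return \@matrix;
}

my $matrix = distance_matrix(qw(four foo bar));
print join("\t", @$_), "\n" for @$matrix;
```

With real data, `lev()` could be swapped for `distance()` from Text::Levenshtein; the loop structure stays the same and the number of distance calls drops from n*n to n*(n-1)/2.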

Re: Compare each array element to the rest, sequentially
by Laurent_R (Canon) on Dec 07, 2017 at 23:11 UTC
    Since the distance between A and B is the same as the distance between B and A, you are looking for unordered combinations of two words, so you presumably want to remove a word once it has been combined with all other words, so that you don't find the same pair again. Here is a quick, simple example that generates such combinations:
        use strict;
        use warnings;
        my @array = sort {$a <=> $b} qw/ 23 45 12 4 8 7 5/;
        while (my $item = shift @array) {
            print "$item $_\n" for @array;
        }
    This will print all combinations of the numbers in the list:
        4 5
        4 7
        4 8
        4 12
        4 23
        4 45
        5 7
        5 8
        5 12
        5 23
        5 45
        7 8
        7 12
        7 23
        7 45
        8 12
        8 23
        8 45
        12 23
        12 45
        23 45
    Just replace the print statement with the distance calculation (and the numbers in the original list with your words), and you're done. Just three lines of code.
      Added a unique filter, in case the data contains duplicates:
        use strict;
        use warnings;
        my %uniq;
        my @array = sort {$a <=> $b} grep {$uniq{$_}++==0} qw/ 23 45 12 4 45 8 7 5/;
        while (my $item = shift @array) {
            print "$item $_\n" for @array;
        }
      (the number 45 is duplicated here, but only one appears in the output).

                      All power corrupts, but we need electricity.
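
A minimal sketch of the shift-based pairing applied to words instead of numbers, under a few assumptions. The numeric sort is dropped because it would warn on non-numeric strings, and the loop tests `@words` rather than the shifted value so that an ID such as "0" does not end the loop early. `lev()` is a stand-in for `Text::Levenshtein::distance` to keep the example free of CPAN dependencies, and `pairwise_distances()` is a hypothetical helper name.

```perl
use strict;
use warnings;

# lev(): stand-in for Text::Levenshtein's distance(); plain
# dynamic-programming edit distance between two strings.
sub lev {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            my $best = $prev[$j - 1] + $cost;                       # substitution
            $best = $prev[$j] + 1    if $prev[$j] + 1    < $best;   # deletion
            $best = $cur[$j - 1] + 1 if $cur[$j - 1] + 1 < $best;   # insertion
            push @cur, $best;
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# pairwise_distances(): hypothetical helper. Returns one
# [first, second, distance] triple per unordered pair of distinct words.
sub pairwise_distances {
    my %seen;
    my @words = grep { !$seen{$_}++ } @_;   # drop duplicates, keep first occurrence
    my @pairs;
    while (@words) {
        my $item = shift @words;            # $item is never paired with itself
        push @pairs, [ $item, $_, lev($item, $_) ] for @words;
    }
    return @pairs;
}

my @pairs = pairwise_distances(qw(four foo bar foo));
printf "%s\t%s\t%d\n", @$_ for @pairs;
```

The helper works on a copy of its arguments, so the caller's array is left intact even though the loop consumes its own list.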

Re: Compare each array element to the rest, sequentially
by thanos1983 (Vicar) on Dec 07, 2017 at 18:16 UTC

    Hello Anonymous Monk

    Another possible solution to your problem.

    The DATA are taken from the Text::Levenshtein module.

    #!/usr/bin/perl
    use strict;
    use IO::All;
    use warnings;
    use Data::Dumper;
    use Text::Levenshtein qw(distance);

    my @words = io($ARGV[0])->chomp->slurp;
    my %hash;
    for (0..$#words) {
        my $word = shift @words;
        $hash{$word} = [distance($word, @words)];
        push @words, $word;
    }
    print Dumper \%hash;

    __END__
    $ perl test.pl words.txt
    $VAR1 = {
        'bar' => [
            3,
            3
        ],
        'four' => [
            2,
            3
        ],
        'foo' => [
            3,
            2
        ]
    };

    __DATA__
    $ cat words.txt
    four
    foo
    bar

    Update: Maybe this alternative solution is better:

    #!/usr/bin/perl
    use strict;
    use IO::All;
    use warnings;
    use Data::Dumper;
    use Text::Levenshtein qw(distance);

    my @words = io($ARGV[0])->chomp->slurp;
    my %HoH;
    for (0..$#words) {
        my $word = shift @words;
        my @distances = distance($word, @words);
        my %hash;
        @hash{@words} = @distances;
        $HoH{$word} = \%hash;
        push @words, $word;
    }
    print Dumper \%HoH;

    __END__
    $ perl test.pl words.txt
    $VAR1 = {
        'foo' => {
            'bar' => 3,
            'four' => 2
        },
        'bar' => {
            'four' => 3,
            'foo' => 3
        },
        'four' => {
            'bar' => 3,
            'foo' => 2
        }
    };

    Hope this helps, BR.

    Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: Compare each array element to the rest, sequentially
by Anonymous Monk on Dec 07, 2017 at 15:17 UTC
    Ok, I actually do it like this:
    for ($i=0; $i<=$#all_IDS; $i++) {
        $current_ID = $all_IDS[$i];
        splice(@all_IDS,$i,1);
        print $current_ID,"\n######\n";
        print "@all_IDS\n";
        print "@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n";
        push @all_IDS, $current_ID;
    }

    and it works.
      and it works.

      Are you sure? Try:

      #!/usr/bin/perl
      use strict;
      use Text::Levenshtein qw(distance);

      my $file = $ARGV[0];
      open IN, '<',$file or die "$!";
      chomp(my @all_IDS=<IN>);
      close IN;

      for (0..$#all_IDS){
        my $id = shift @all_IDS;
        my @ld = distance ($id, @all_IDS);
        print join "\t",$id,@all_IDS,@ld,"\n";
        push @all_IDS,$id;
      }

      __DATA__
      A
      BB
      CCC
      DDDD
      Result
      A       BB      CCC     DDDD    2       3       4
      BB      CCC     DDDD    A       3       4       2
      CCC     DDDD    A       BB      4       3       3
      DDDD    A       BB      CCC     4       4       4
      
      poj
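
A small sketch of why the splice/push loop deserves a second look, using the same four sample IDs. It replays both loops and records which element is picked as "current" on each pass. The sub names `visited_by_splice()` and `visited_by_rotation()` are hypothetical, introduced only for this illustration.

```perl
use strict;
use warnings;

# visited_by_splice(): replays the splice-at-$i, push-to-end loop and
# records the element chosen on each pass. Because the array shifts left
# after every splice while $i keeps growing, some elements are skipped
# and others are picked more than once.
sub visited_by_splice {
    my @ids = @_;
    my @visited;
    for (my $i = 0; $i <= $#ids; $i++) {
        my $current = $ids[$i];
        splice(@ids, $i, 1);
        push @visited, $current;
        push @ids, $current;
    }
    return @visited;
}

# visited_by_rotation(): replays the shift/push loop. Each pass takes the
# front element and appends it to the back, so after n passes every
# element has been picked exactly once and the order is restored.
sub visited_by_rotation {
    my @ids = @_;
    my @visited;
    for (0 .. $#ids) {
        my $current = shift @ids;
        push @visited, $current;
        push @ids, $current;
    }
    return @visited;
}

my @input    = qw(A BB CCC DDDD);
my @splice   = visited_by_splice(@input);
my @rotation = visited_by_rotation(@input);
print "splice:   @splice\n";
print "rotation: @rotation\n";
```

On this input the splice version picks A, CCC, A, A, so BB and DDDD are never compared against the rest, while the rotation picks A, BB, CCC, DDDD.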
