Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Re^2: Faster indexing an array

by wollmers (Scribe)
on Sep 19, 2014 at 23:42 UTC ( [id://1101314]=note: print w/replies, xml ) Need Help??


in reply to Re: Faster indexing an array
in thread Faster indexing an array

I need something similar as result as sdiff() of Algorithm::Diff provides.

$sequence1 = [ qw( a b ) ]; $sequence2 = [ qw( b c ) ]; # the longest common subsequence of it $LCS_index = [[ 1, 1 ]]; # aligned $result = [ [ 'a', '' ], [ 'b', 'b' ], [ '', 'c' ], ]; $stringified = [ 'ab_', '_bc', ];

The two most popular of the fastest algorithms for LCS are Hunt/McIllroy (used in Algorithm::Diff, Algorithm::LCS from BackPAN written in XS) and Meyers/Ukkonen (used in GNU-diff, String::Similarity).

What I implemented is an improved Hunt/McIllroy from AFROZA BEGUM, A GREEDY APPROACH FOR COMPUTING LONGEST COMMON SUBSEQUENCES, Journal of Prime Research in Mathematics Vol. 4(2008), 165-170. It beats A::D::sdiff(). To be fair A::D provides more functionality, which I also try to strip down for comparison. In the end I would like to modify the XS of A::LCS. A::LCS processes 0.8 Mio/s in comparison to 14 thousand/s A::D::sdiff() of length=10, edit-distance=4. But making the aligned hunks via perl from the LCS of A::LCS slows down to 35 thousand/s.

A::LCS-aligned         35714.29/s (n=50000)
lcs_greedy_aligned:  22831.05/s (n=50000)
A::D::sdiff:               14204.55/s (n=50000)

Here my code (dirty as it is work in progress):

sub lcs_greedy { my $self = shift; my $X = shift; my $Y = shift; my $YPos; my $index = 0; push @{ $YPos->{$_} },$index++ for @$Y; my $Xmatches; for ( $index = 0 ; $index <= $#$X ; $index++ ) { if ( exists( $YPos->{$X->[$index]} ) ) { push ( @$Xmatches , $index ); } } my $Xcurrent = -1; my $Ycurrent = -1; my $m = $#$Xmatches; my $n = $#$Y; my @L = (); # LCS my $R = 0; # records the position of last selected symbol my $i = 0; my $Pi; my $Pi1; my $hunk; for ($i = 0; $i <= $m; $i++) { $hunk = []; $Pi = $YPos->{$X->[$Xmatches->[$i]]}->[0] // $n+1; # Position in Y +of ith symbol $Pi1 = ($i < $m && defined $YPos->{$X->[$Xmatches->[$i+1]]}->[0]) ? $YPos->{$X->[$Xmatches->[$i+1]]}->[0] : -1; # Position in Y of + i + 1st symbol #print STDERR '$i: ',$i,' $Pi: ',$Pi,' $Pi1: ',$Pi1,' $R: ',$R,"\n"; while ($Pi1 < $R && $Pi1 > -1) { #print STDERR '$Pi1 < $R',"\n"; shift @{$YPos->{$X->[$Xmatches->[$i+1]]}}; $Pi1 = $YPos->{$X->[$Xmatches->[$i+1]]}->[0] // -1; } while ($Pi < $R && $Pi < $n+1) { #print STDERR '$Pi < $R',"\n"; shift @{$YPos->{$X->[$Xmatches->[$i]]}}; $Pi = $YPos->{$X->[$Xmatches->[$i]]}->[0] // $n+1; } if ($Pi > $Pi1 && $Pi1 > $R) { $hunk = [$Xmatches->[$i+1],$Pi1]; shift @{$YPos->{$X->[$Xmatches->[$i+1]]}}; $R = $Pi1; $i = $i+1; } elsif ($Pi < $n+1) { $hunk = [$Xmatches->[$i],$Pi]; shift @{$YPos->{$X->[$Xmatches->[$i]]}}; $R = $Pi; } if (scalar @$hunk) { while ($Xcurrent+1 < $hunk->[0] || $Ycurrent+1 < $hunk->[1] ) { my $Xtemp = ''; my $Ytemp = ''; if ($Xcurrent+1 < $hunk->[0]) { #$Xtemp = $Xcurrent+1; $Xtemp = $X->[$Xcurrent+1]; $Xcurrent++; } if ($Ycurrent+1 < $hunk->[1]) { #$Ytemp = $Ycurrent+1; $Ytemp = $Y->[$Ycurrent+1]; $Ycurrent++; } push @L,[$Xtemp,$Ytemp]; } $Xcurrent = $hunk->[0]; $Ycurrent = $hunk->[1]; #push @L,$hunk; # indices push @L,[$X->[$Xcurrent],$Y->[$Ycurrent]]; # elements } } while ($Xcurrent+1 <= $#$X || $Ycurrent+1 <= $#$Y ) { my $Xtemp = ''; my $Ytemp = ''; if ($Xcurrent+1 <= $#$X) { #$Xtemp = $Xcurrent+1; $Xtemp = $X->[$Xcurrent+1]; $Xcurrent++; } if ($Ycurrent+1 <= $#$Y) { #$Ytemp = $Ycurrent+1; $Ytemp = $Y->[$Ycurrent+1]; $Ycurrent++; } push @L,[$Xtemp,$Ytemp]; } return \@L; }

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1101314]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others taking refuge in the Monastery: (7)
As of 2024-04-19 08:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found