Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re^3: Match and Extract String with Regex

by lima1 (Curate)
on Nov 17, 2007 at 16:58 UTC ( #651423=note: print w/ replies, xml ) Need Help??


in reply to Re^2: Match and Extract String with Regex
in thread Match and Extract String with Regex

String::LCSS does exactly what you need.

I don't want to bash String::LCSS, but the implementation seems to be the naive O(n^3) algorithm instead of the O(mn) dynamic programming solution (http://en.wikipedia.org/wiki/Longest_common_substring_problem). A quick and dirty (and not thoroughly tested) implementation is much faster (although probably buggy).

sub lcss2 { my ($s, $t) = @_; my $z = 0; my $m = length $s; my $n = length $t; my @S = (undef, split(//, $s)); my @T = (undef, split(//, $t)); my @L; my @ret; for my $i ( 1 .. $m ) { for my $j ( 1 .. $n ) { if ($S[$i] eq $T[$j]) { $L[$i-1][$j-1] ||= 0; $L[$i][$j] = $L[$i-1][$j-1] + 1; if ($L[$i][$j] > $z) { $z = $L[$i][$j]; @ret = (); } if ($L[$i][$j] == $z) { push @ret,substr($s, ($i-$z), $z); } } } } # warn Dumper \@L; return join '*', @ret; }
my $s1 = '6'x 200 . 'zyzxx'; my $s2 = '5'x 200 . 'abczyzefg'; my $count = 1; timethese($count, { 'String::LCSS' => sub { String::LCSS::lcss( $s1, $s2 ) }, 'dynprog' => sub { lcss2( $s1, $s2 )}, });
Update: Took the opportunity to learn XS and wrote String::LCSS_XS.


Comment on Re^3: Match and Extract String with Regex
Select or Download Code
Replies are listed 'Oldest First'.
Re^4: Match and Extract String with Regex
by erroneousBollock (Curate) on Nov 18, 2007 at 04:10 UTC
    Ah, well spotted.

    I don't suppose you'd like to file a bug? ;)

    -David

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://651423]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (14)
As of 2015-07-07 16:29 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (91 votes), past polls